Filtering tasks using patterns - 7.1

Talend Data Stewardship User Guide

Version: 7.1
Language: English (United States)
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Stewardship

The Pattern tab of the profiling area shows a graphical representation of the type and number of characters that make up your data. It shows how the records are structured, at either word or character granularity.

It is also a quick and easy way to filter your data.

When you select the content of a column, a horizontal bar chart displays the distribution of the different patterns, which represent the type and number of characters or words that make up the data.
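To make the idea of a pattern distribution concrete, the sketch below maps every value of a column to a character pattern and counts how many values share each pattern, which is what the bars of the chart represent. It is a minimal illustration, not Talend's implementation: the symbols used (`A` for an uppercase letter, `a` for a lowercase letter, `9` for a digit, other characters kept as-is) and the function names are assumptions chosen for the example.

```python
from collections import Counter


def char_pattern(value: str) -> str:
    """Map each character to a class symbol (hypothetical notation):
    uppercase -> 'A', lowercase -> 'a', digit -> '9', anything else kept as-is."""
    out = []
    for ch in value:
        if ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def pattern_distribution(values):
    """Return (pattern, count) pairs, most frequent pattern first."""
    return Counter(char_pattern(v) for v in values).most_common()


def render_bars(distribution, width=20):
    """Render the distribution as a text-mode horizontal bar chart."""
    top = distribution[0][1] if distribution else 0
    lines = []
    for pattern, count in distribution:
        bar = "#" * max(1, round(width * count / top))
        lines.append(f"{pattern:<12} {bar} {count}")
    return lines


ids = ["AC-1024", "AC-2048", "AC-512", "ac-4096"]
for line in render_bars(pattern_distribution(ids)):
    print(line)
```

Here the two well-formed identifiers share one long bar, while the identifier with a missing digit and the one in lowercase each get a short bar of their own.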

From the Pattern tab, you can switch between character-based and word-based patterns, except for numeric data, for which only character patterns are computed.

Analyzing word-based patterns is an efficient way to detect data quality issues in first names or last names, for example: names that contain punctuation or digits instead of being made exclusively of words immediately stand out. Character-based patterns, on the other hand, are better suited to structured data such as client identifiers or account numbers: the chart shows at a glance when a value does not have the expected number of characters or digits.
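The two use cases above can be sketched as follows, assuming a simplified notation where a run of letters becomes `[word]` and a run of digits becomes `[digits]`. The notation, the expected name patterns and the `9999-9999` identifier format are all hypothetical; the point is only to show why word patterns expose odd names while character patterns expose identifiers of the wrong length.

```python
import re

# Hypothetical tokenizer: letter runs, digit runs, whitespace runs, or any other single character.
TOKEN = re.compile(r"(?P<word>[^\W\d_]+)|(?P<digits>\d+)|(?P<space>\s+)|(?P<other>.)")


def word_pattern(value: str) -> str:
    """Word-level pattern: letters -> [word], digits -> [digits], whitespace -> one space."""
    parts = []
    for m in TOKEN.finditer(value):
        if m.lastgroup == "word":
            parts.append("[word]")
        elif m.lastgroup == "digits":
            parts.append("[digits]")
        elif m.lastgroup == "space":
            parts.append(" ")
        else:
            parts.append(m.group())  # punctuation is kept so it stands out
    return "".join(parts)


# Assumed rule: a valid name is one word or two words separated by a space.
EXPECTED_NAME_PATTERNS = {"[word]", "[word] [word]"}


def suspicious_names(names):
    """Names whose word pattern is not one of the expected ones."""
    return [n for n in names if word_pattern(n) not in EXPECTED_NAME_PATTERNS]


def char_pattern(value: str) -> str:
    """Character-level pattern: digit -> 9, letter -> A, other characters kept."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)


def wrong_length_ids(ids, expected="9999-9999"):
    """Identifiers whose character pattern differs from the expected format."""
    return [i for i in ids if char_pattern(i) != expected]


print(suspicious_names(["Anna", "Mary Ann", "J0hn", "Smith."]))
print(wrong_length_ids(["1234-5678", "123-45678", "1234-567"]))
```

With word patterns, `J0hn` becomes `[word][digits][word]` and is flagged; with character patterns, `123-45678` becomes `999-99999` and is flagged because a digit sits on the wrong side of the hyphen.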

Procedure

  1. Open a task list in one of the campaigns defined in Talend Data Stewardship.
  2. Click the header of a column to select its content; in this example, the EMAIL column.
  3. Select PATTERN in the right-hand panel.
    All word patterns which represent the values in the EMAIL column are computed and displayed.
  4. Click the word pattern whose values you want to filter, or hold the SHIFT or Ctrl key and select multiple patterns to list the corresponding tasks.
    The filter details are added at the top of the list, and a switch to toggle the filter is displayed in the top left corner.

    All email addresses which have the [word].[word] and [word] formats are listed.

  5. To switch to the character patterns of the email addresses, click the A icon in the top right corner of the PATTERN view.
  6. To remove the filter(s) you defined, hover over the top right corner of the list and click the trash icon.
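The filtering behavior described in steps 4 to 6 can be modeled roughly as below. This is a minimal sketch under stated assumptions, not how Talend Data Stewardship works internally: `PatternFilter` and `word_pattern` are hypothetical names, the `[word]` notation is simplified, and selecting several patterns with SHIFT or Ctrl is assumed to keep the tasks that match any of the selected patterns.

```python
import re
from dataclasses import dataclass, field

# Hypothetical tokenizer: letter runs, digit runs, whitespace runs, or any other single character.
TOKEN = re.compile(r"(?P<word>[^\W\d_]+)|(?P<digits>\d+)|(?P<space>\s+)|(?P<other>.)")


def word_pattern(value: str) -> str:
    """Simplified word pattern: letters -> [word], digits -> [digits], punctuation kept."""
    names = {"word": "[word]", "digits": "[digits]", "space": " "}
    return "".join(names.get(m.lastgroup, m.group()) for m in TOKEN.finditer(value))


@dataclass
class PatternFilter:
    """Hypothetical model of a pattern filter on one column."""
    column: str
    selected: set = field(default_factory=set)
    enabled: bool = True

    def select(self, pattern: str) -> None:
        # Adding a pattern widens the filter (assumed OR semantics).
        self.selected.add(pattern)

    def toggle(self) -> None:
        # Mirrors the switch in the top left corner: the filter is kept but inactive.
        self.enabled = not self.enabled

    def clear(self) -> None:
        # Mirrors the trash icon: all selected patterns are removed.
        self.selected.clear()

    def apply(self, tasks):
        if not self.enabled or not self.selected:
            return list(tasks)
        return [t for t in tasks if word_pattern(str(t.get(self.column, ""))) in self.selected]


tasks = [
    {"EMAIL": "john.smith@example.com"},
    {"EMAIL": "jsmith@example.com"},
    {"EMAIL": "contact"},
]
f = PatternFilter("EMAIL")
f.select("[word].[word]@[word].[word]")
f.select("[word]")
print(f.apply(tasks))  # the first and third tasks
```

Keeping the toggle separate from clearing lets a filter be switched off and back on without losing the selected patterns, which matches the difference between the switch and the trash icon in the procedure.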