Recuperating matching and non-matching rows - 6.4

Talend Data Management Platform Studio User Guide


When you add patterns to an analysis of a set of columns (simple table analysis), the result chart shows the percentage of values across all the columns that match all the patterns used, not just one of them. After executing the analysis, you can generate ready-to-use Jobs that recuperate the matching and non-matching rows and write them to output files or databases.
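The "match all patterns" rule can be illustrated outside the studio with a minimal Python sketch. It is not the studio's implementation: the column names, the regular expressions in `PATTERNS`, and the sample rows are all hypothetical, and the sketch assumes that one pattern is attached to each column and that a value must match its pattern in full.

```python
import re

# Hypothetical patterns keyed by column name; these are illustrative
# assumptions, not patterns shipped with the studio.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}"),
    "zip": re.compile(r"\d{5}"),
}


def row_matches_all(row, patterns=PATTERNS):
    """Return True only if every patterned column value fully matches its pattern."""
    return all(p.fullmatch(row.get(col) or "") for col, p in patterns.items())


def all_match_percentage(rows, patterns=PATTERNS):
    """Percentage of rows that match all patterns (0.0 for an empty input)."""
    rows = list(rows)
    if not rows:
        return 0.0
    matching = sum(1 for row in rows if row_matches_all(row, patterns))
    return 100.0 * matching / len(rows)


SAMPLE_ROWS = [
    {"email": "ann@example.com", "zip": "75001"},
    {"email": "bob@example.com", "zip": "7500"},   # zip does not match
    {"email": "not-an-email", "zip": "69002"},     # email does not match
    {"email": "eve@example.org", "zip": "13001"},
]

print(all_match_percentage(SAMPLE_ROWS))  # 50.0
```

A row that satisfies only one of the two patterns counts as non-matching, which is why two of the four sample rows are excluded.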

Prerequisite(s): An analysis of a set of columns that uses patterns has been created and executed in the Profiling perspective of the studio. For further information, see How to create an analysis of a set of columns using patterns.

To generate a Job that recuperates matching and non-matching rows in the analyzed columns, do the following:

  1. Follow the steps outlined in How to create an analysis of a set of columns using patterns to create a simple table analysis that uses different patterns.

  2. Execute the column analysis.

  3. In the Analysis Results view, click All Match to open the corresponding view.

    The generated chart is a single bar chart covering all the patterns used. It shows the number of rows that match "all" the patterns and the number that do not, and is accompanied by a table that details the matching results.

  4. Right-click the pattern line in the All Match table and select Generate an ETL Job to handle rows. The Integration perspective opens on the generated Job.

    This Job uses the Extract, Transform, Load (ETL) process to write the rows of the analyzed columns to two separate output files: one for the valid rows, which match "all" the patterns, and one for the invalid rows, which do not.

  5. If required, use different output components to recuperate the valid or invalid rows in different types of files or databases.

  6. Save your Job and press F6 to execute it. The valid and invalid rows of the analyzed columns are written to the defined output files.
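For readers who want to see the logic of such a Job in isolation, the following Python sketch splits rows into valid and invalid groups and writes each group to its own file. It is only an illustration under stated assumptions: the Job generated by the studio is built from studio components, not from this code, and the names `PATTERNS`, `split_rows`, `write_csv` and `handle_rows`, the CSV output format, and the sample data are all hypothetical.

```python
import csv
import re
import tempfile
from pathlib import Path

# Hypothetical patterns keyed by column name (illustrative only).
PATTERNS = {
    "zip": re.compile(r"\d{5}"),
    "country": re.compile(r"[A-Z]{2}"),
}


def split_rows(rows, patterns):
    """Partition rows into (valid, invalid); valid rows match every pattern."""
    valid, invalid = [], []
    for row in rows:
        ok = all(p.fullmatch(row.get(col) or "") for col, p in patterns.items())
        (valid if ok else invalid).append(row)
    return valid, invalid


def write_csv(path, rows, fieldnames):
    """Write rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def handle_rows(rows, patterns, valid_path, invalid_path):
    """Split rows and write each group to its own file; return both counts."""
    rows = list(rows)
    fieldnames = list(rows[0]) if rows else list(patterns)
    valid, invalid = split_rows(rows, patterns)
    write_csv(valid_path, valid, fieldnames)
    write_csv(invalid_path, invalid, fieldnames)
    return len(valid), len(invalid)


SAMPLE_ROWS = [
    {"zip": "75001", "country": "FR"},
    {"zip": "7500", "country": "FR"},    # zip does not match
    {"zip": "10115", "country": "de"},   # country does not match
]

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    counts = handle_rows(SAMPLE_ROWS, PATTERNS,
                         Path(tmp) / "valid.csv", Path(tmp) / "invalid.csv")
    print(counts)  # (1, 2)
```

Replacing `write_csv` with a function that inserts into a database table would correspond to swapping the output components as described in step 5.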