Recuperating valid and invalid rows in a column analysis - Cloud - 7.3

Talend Studio User Guide

Version: Cloud 7.3
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-02-13
Available in:
Big Data Platform
Cloud API Services Platform
Cloud Big Data Platform
Cloud Data Fabric
Cloud Data Management Platform
Data Fabric
Data Management Platform
Data Services Platform
MDM Platform
Real-Time Big Data Platform

You can generate a ready-to-use Job from the results of a column analysis. This Job retrieves the valid rows, the invalid rows, or both, and writes them to output files or databases.
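
For reference, the generated ETL Job essentially splits the incoming rows according to whether the analyzed column value matches the pattern's regular expression. The following is a minimal sketch of that logic in plain Java; the email pattern, file names, and single-column layout are hypothetical stand-ins, not the code that Talend Studio actually generates.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Pattern;

    public class SplitRowsByPattern {
        public static void main(String[] args) throws IOException {
            // Hypothetical email pattern; the real Job uses the regular expression
            // defined in the pattern that the analysis applies to the column.
            Pattern emailPattern = Pattern.compile("^[\\w.+-]+@[\\w-]+\\.[\\w.]+$");

            List<String> validRows = new ArrayList<>();
            List<String> invalidRows = new ArrayList<>();

            // One line per row; here the analyzed column is the whole line.
            for (String row : Files.readAllLines(Path.of("input_rows.csv"), StandardCharsets.UTF_8)) {
                if (emailPattern.matcher(row).matches()) {
                    validRows.add(row);
                } else {
                    invalidRows.add(row);
                }
            }

            // Two separate output files, one for each type of row.
            Files.write(Path.of("valid_rows.csv"), validRows, StandardCharsets.UTF_8);
            Files.write(Path.of("invalid_rows.csv"), invalidRows, StandardCharsets.UTF_8);
        }
    }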

Before you begin

A column analysis that uses patterns has been created and executed.

Procedure

  1. Follow the steps outlined in Defining the columns to be analyzed and Adding a regular expression or an SQL pattern to a column analysis to create a column analysis that uses a pattern.
  2. Execute the column analysis.
  3. In the Analysis Results view, click Pattern Matching under the name of the analyzed column.

    The generated graphic for the pattern matching is displayed, accompanied by a table that details the matching results.

  4. Right-click the pattern line in the Pattern Matching table and select Generate Jobs.

    The Job Selector dialog box is displayed.

    When you analyze the column using a pattern defined for a specific database, you can generate ELT Jobs.
    When you analyze the column using a pattern defined for the Java or the default language, you can generate an ETL Job.
  5. In the dialog box, select one of the following options:
    - generate an ELT job to get only valid rows: generates a Job that uses the Extract Load Transform process to write the valid rows of the analyzed column to an output file. This option is not available for the Amazon Redshift database.
    - generate an ELT job to get only invalid rows: generates a Job that uses the Extract Load Transform process to write the invalid rows of the analyzed column to an output file. This option is not available for the Amazon Redshift database.
    - generate an ETL job to handle rows: generates a Job that uses the Extract Transform Load process to write the valid and invalid rows of the analyzed column to output files.

    In this example, we select the generate an ETL job to handle rows option to generate a Job that writes the valid and invalid email rows to two separate output files.
  6. In the dialog box, click Finish to proceed to the next step.
    The Integration perspective opens on the generated Job.
  7. Optional: Use different output components to retrieve the valid and invalid rows in different types of files or in databases.
  8. Save your Job and press F6 to execute it.
    The valid and invalid email rows of the analyzed column are written to the defined output files.
    The results in the retrieved files may depend on whether you use the ETL or the ELT mode. In ETL mode, the data is matched against Java regular expressions, while in ELT mode it is matched against the corresponding database regular expressions. Because the regular expression engines in Java and in the DBMS behave differently, the results may differ, especially if you defined different regular expressions in the pattern editor.
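
To illustrate the kind of discrepancy to expect between the two modes, the Java sketch below contrasts whole-string matching with substring matching on a single value; the pattern and value are hypothetical. Unanchored patterns used with database operators such as PostgreSQL's ~ or Oracle's REGEXP_LIKE match on substrings, whereas a pattern applied with whole-string semantics in Java rejects the same value. Which semantics apply in your generated Job depends on the pattern and the target database, so treat this only as an illustration, not as a description of the generated code.

    import java.util.regex.Pattern;

    public class RegexEngineDifference {
        public static void main(String[] args) {
            // Hypothetical, unanchored email-like pattern used only for illustration.
            Pattern pattern = Pattern.compile("[\\w.+-]+@[\\w-]+\\.[\\w.]+");
            String value = "contact: john.doe@example.com";

            // Whole-string semantics: the extra prefix makes the match fail.
            boolean wholeString = pattern.matcher(value).matches();   // false

            // Substring semantics: the embedded address is enough for a match,
            // which is how unanchored patterns behave with substring-matching
            // database operators.
            boolean substring = pattern.matcher(value).find();        // true

            System.out.println("matches() = " + wholeString + ", find() = " + substring);
        }
    }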