Recuperating valid and invalid rows in a column analysis - Cloud - 7.3

Talend Studio User Guide

Version: Cloud 7.3
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-02-13
Available in:
Big Data Platform
Cloud API Services Platform
Cloud Big Data Platform
Cloud Data Fabric
Cloud Data Management Platform
Data Fabric
Data Management Platform
Data Services Platform
MDM Platform
Real-Time Big Data Platform

You can generate a ready-to-use Job from the results of a column analysis. This Job retrieves the valid rows, the invalid rows, or both, and writes them to output files or databases.
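
For reference, the generated ETL Job essentially splits the incoming rows according to whether the analyzed column value matches the pattern's regular expression. The following is a minimal sketch of that logic in plain Java; the email pattern, file names, and single-column layout are hypothetical stand-ins, not the code that Talend Studio actually generates.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Pattern;

    public class SplitRowsByPattern {
        public static void main(String[] args) throws IOException {
            // Hypothetical email pattern; the real Job uses the regular expression
            // defined in the pattern that the analysis applies to the column.
            Pattern emailPattern = Pattern.compile("^[\\w.+-]+@[\\w-]+\\.[\\w.]+$");

            List<String> validRows = new ArrayList<>();
            List<String> invalidRows = new ArrayList<>();

            // One line per row; here the analyzed column is the whole line.
            for (String row : Files.readAllLines(Path.of("input_rows.csv"), StandardCharsets.UTF_8)) {
                if (emailPattern.matcher(row).matches()) {
                    validRows.add(row);
                } else {
                    invalidRows.add(row);
                }
            }

            // Two separate output files, one for each type of row.
            Files.write(Path.of("valid_rows.csv"), validRows, StandardCharsets.UTF_8);
            Files.write(Path.of("invalid_rows.csv"), invalidRows, StandardCharsets.UTF_8);
        }
    }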

Before you begin

A column analysis that uses patterns has been created and executed.

Procedure

  1. Follow the steps outlined in Defining the columns to be analyzed and Adding a regular expression or an SQL pattern to a column analysis to create a column analysis that uses a pattern.
  2. Execute the column analysis.
  3. In the Analysis Results view, click Pattern Matching under the name of the analyzed column.

    The generated graphic for the pattern matching is displayed, accompanied by a table that details the matching results.

  4. Right-click the pattern line in the Pattern Matching table and select Generate Jobs.

    The Job Selector dialog box is displayed.

    When you analyze the column using a pattern defined for a specific database, you can generate ELT Jobs.
    When you analyze the column using a pattern defined for the Java or the default language, you can generate an ETL Job.
  5. In the dialog box, select one of the following options:
    - generate an ELT job to get only valid rows: generates a Job that uses the Extract Load Transform process to write the valid rows of the analyzed column to an output file. This option is not available for the Amazon Redshift database.
    - generate an ELT job to get only invalid rows: generates a Job that uses the Extract Load Transform process to write the invalid rows of the analyzed column to an output file. This option is not available for the Amazon Redshift database.
    - generate an ETL job to handle rows: generates a Job that uses the Extract Transform Load process to write the valid and invalid rows of the analyzed column to output files.

    In this example, we select the generate an ETL job to handle rows option to generate a Job that writes the valid and invalid email rows to two separate output files.
  6. In the dialog box, click Finish to proceed to the next step.
    The Integration perspective opens on the generated Job.
  7. Optional: Use different output components to retrieve the valid and invalid rows in different types of files or in databases.
  8. Save your Job and press F6 to execute it.
    The valid and invalid email rows of the analyzed column are written to the defined output files.
    The results in the retrieved files may depend on whether you use the ETL or the ELT mode. In ETL mode, the data is matched against Java regular expressions, while in ELT mode it is matched against the corresponding database regular expressions. Because the regular expression engines in Java and in the DBMS behave differently, the results may differ, especially if you defined different regular expressions in the pattern editor.
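
To illustrate the kind of discrepancy to expect between the two modes, the Java sketch below contrasts whole-string matching with substring matching on a single value; the pattern and value are hypothetical. Unanchored patterns used with database operators such as PostgreSQL's ~ or Oracle's REGEXP_LIKE match on substrings, whereas a pattern applied with whole-string semantics in Java rejects the same value. Which semantics apply in your generated Job depends on the pattern and the target database, so treat this only as an illustration, not as a description of the generated code.

    import java.util.regex.Pattern;

    public class RegexEngineDifference {
        public static void main(String[] args) {
            // Hypothetical, unanchored email-like pattern used only for illustration.
            Pattern pattern = Pattern.compile("[\\w.+-]+@[\\w-]+\\.[\\w.]+");
            String value = "contact: john.doe@example.com";

            // Whole-string semantics: the extra prefix makes the match fail.
            boolean wholeString = pattern.matcher(value).matches();   // false

            // Substring semantics: the embedded address is enough for a match,
            // which is how unanchored patterns behave with substring-matching
            // database operators.
            boolean substring = pattern.matcher(value).find();        // true

            System.out.println("matches() = " + wholeString + ", find() = " + substring);
        }
    }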