tMultiPatternCheck properties - 6.1

Talend Components Reference Guide

Component family

Data Quality

Function

tMultiPatternCheck checks all existing data in multiple columns against a given Java regular expression.

Purpose

tMultiPatternCheck provides two output flows: Matching Data and Non-Matching Data. The first collects the data that match the given patterns, and the second collects the data that do not. You can then apply any required corrections to the non-matching data.
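The split into two flows can be pictured with a minimal Java sketch. It is not the code generated by the component: the class name, the two example patterns, and the choice of whole-value matching with `Matcher.matches()` are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PatternRoutingSketch {

    // Hypothetical patterns, one per checked column.
    static final Pattern[] COLUMN_PATTERNS = {
        Pattern.compile("[A-Za-z]+"), // column 0: letters only
        Pattern.compile("\\d{5}")     // column 1: exactly five digits
    };

    // Stand-ins for the Matching Data and Non-Matching Data flows.
    final List<String[]> matching = new ArrayList<>();
    final List<String[]> nonMatching = new ArrayList<>();

    // Routes a row to the matching list only when every checked column
    // matches its pattern; otherwise the row goes to the non-matching list.
    void route(String[] row) {
        boolean rowOk = true;
        for (int i = 0; i < COLUMN_PATTERNS.length; i++) {
            String value = row[i];
            if (value == null || !COLUMN_PATTERNS[i].matcher(value).matches()) {
                rowOk = false;
            }
        }
        if (rowOk) {
            matching.add(row);
        } else {
            nonMatching.add(row);
        }
    }

    public static void main(String[] args) {
        PatternRoutingSketch sketch = new PatternRoutingSketch();
        sketch.route(new String[] {"Paris", "75001"});
        sketch.route(new String[] {"Paris", "75A01"});
        System.out.println("matching rows: " + sketch.matching.size());
        System.out.println("non-matching rows: " + sketch.nonMatching.size());
    }
}
```

In this sketch a null value is treated as non-matching; how the component itself handles null values is not stated here.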

Basic settings

Schema and Edit schema

A schema is a row description. It defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Since version 5.6, both the Built-in mode and the Repository mode are available in any of the Talend solutions.

Built-in: You create the schema and store it locally for this component only. Related topic: see Talend Studio User Guide.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: see Talend Studio User Guide.

Logical operator used to combine check conditions

If you want to combine the conditions you set on the columns, select from this list the logical operator to use.
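As a rough illustration of what combining per-column conditions means, the following Java sketch folds the individual check results with a logical operator. The `CombineMode` enum and its `AND` and `OR` values are hypothetical names chosen for this example; the labels shown in the Studio list may differ.

```java
import java.util.regex.Pattern;

class CombineModeSketch {

    // Hypothetical combine modes, used only for this illustration.
    enum CombineMode { AND, OR }

    // Checks each column against its pattern, then combines the
    // per-column results with the selected logical operator.
    static boolean rowPasses(String[] row, Pattern[] patterns, CombineMode mode) {
        // Start from the neutral element: true for AND, false for OR.
        boolean result = (mode == CombineMode.AND);
        for (int i = 0; i < patterns.length; i++) {
            boolean columnOk = row[i] != null && patterns[i].matcher(row[i]).matches();
            if (mode == CombineMode.AND) {
                result = result && columnOk;
            } else {
                result = result || columnOk;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern[] patterns = { Pattern.compile("[A-Z]{2}"), Pattern.compile("\\d+") };
        String[] row = { "FR", "abc" }; // first column matches, second does not
        System.out.println("AND: " + rowPasses(row, patterns, CombineMode.AND));
        System.out.println("OR: " + rowPasses(row, patterns, CombineMode.OR));
    }
}
```

With an AND-style combination a single failing column rejects the row, whereas with an OR-style combination one matching column is enough to accept it.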

 

Columns to check

Set a regular expression for each of the analyzed columns.

-Column: lists the analyzed columns.

-Check pattern: select from the list the pattern against which you want to check the column data.

These patterns are retrieved from the DQ Repository of your Studio. The list includes both system and user-defined patterns.

If you want to check the column against your own pattern, select Custom from the pattern list.

-Custom Pattern: enter your own regular expression if you have selected Custom in the Check pattern column.

-Is Case sensitive: select the check box of each column for which the pattern check must distinguish between lower and upper case.

-Check: select the check boxes of the column(s) you want to check against the defined patterns.

-Message: leave this column empty to get automatic messages that state which pattern invalidated the data row and caused it to be rejected.

You can also enter your own message to enrich the Job result with information about the patterns that caused the row to be rejected.
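The effect of a custom pattern, the case-sensitivity option, and the message column can be approximated in plain Java as follows. This is a sketch under assumptions: the class and its fields are hypothetical, the wording of the automatic message is invented for the example, and case-insensitive matching is modeled with the standard `Pattern.CASE_INSENSITIVE` flag.

```java
import java.util.regex.Pattern;

class ColumnCheckSketch {
    final String column;
    final Pattern pattern;
    final String message;

    // regex plays the role of the Custom Pattern entry, caseSensitive
    // the Is Case sensitive check box, customMessage the Message column.
    ColumnCheckSketch(String column, String regex, boolean caseSensitive, String customMessage) {
        this.column = column;
        this.pattern = Pattern.compile(regex, caseSensitive ? 0 : Pattern.CASE_INSENSITIVE);
        if (customMessage == null || customMessage.isEmpty()) {
            // Hypothetical wording for the automatic message.
            this.message = "Column " + column + " does not match pattern " + regex;
        } else {
            this.message = customMessage;
        }
    }

    // Returns null when the value matches, otherwise the rejection message.
    String check(String value) {
        boolean ok = value != null && pattern.matcher(value).matches();
        return ok ? null : message;
    }

    public static void main(String[] args) {
        ColumnCheckSketch strict = new ColumnCheckSketch("code", "[A-Z]{3}", true, "");
        ColumnCheckSketch relaxed = new ColumnCheckSketch("code", "[A-Z]{3}", false, "Invalid code");
        System.out.println(strict.check("abc"));  // rejected: prints the automatic message
        System.out.println(relaxed.check("abc")); // accepted: prints null
    }
}
```

Remember that a regular expression typed into a Java string literal needs its backslashes doubled, for example `"\\d{5}"` for five digits.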

Advanced settings

tStatCatcher Statistics

Select this check box to collect log data at the component level.

Global Variables

NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.

NB_LINE_OK: the number of rows matching a given pattern. This is an After variable and it returns an integer.

NB_LINE_REJECT: the number of rows not matching a given pattern. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
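As an illustration of how these After variables are typically read, the sketch below builds a summary line from a map that stands in for the Job's `globalMap`. The component label `tMultiPatternCheck_1`, the helper class, and the hand-filled values are assumptions for this example; keys are composed as the component label followed by an underscore and the variable name.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalVariableSketch {

    // Builds a one-line summary from the After variables of a component.
    static String summarize(Map<String, Object> globalMap, String component) {
        Integer total = (Integer) globalMap.get(component + "_NB_LINE");
        Integer ok = (Integer) globalMap.get(component + "_NB_LINE_OK");
        Integer rejected = (Integer) globalMap.get(component + "_NB_LINE_REJECT");
        return "read=" + total + ", ok=" + ok + ", rejected=" + rejected;
    }

    public static void main(String[] args) {
        // Stand-in for the Job's globalMap, filled by hand for this sketch.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tMultiPatternCheck_1_NB_LINE", 10);
        globalMap.put("tMultiPatternCheck_1_NB_LINE_OK", 7);
        globalMap.put("tMultiPatternCheck_1_NB_LINE_REJECT", 3);
        System.out.println(summarize(globalMap, "tMultiPatternCheck_1"));
    }
}
```

Inside a Job, the equivalent lookup would usually be an expression such as `((Integer)globalMap.get("tMultiPatternCheck_1_NB_LINE"))`, placed in a component that runs after tMultiPatternCheck has finished, since After variables are only set at that point.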

Usage

This component is an intermediary step. It requires an input flow as well as an output.

Limitation

If you use a pattern in a Job, modifying that pattern in the Pattern Editor has no effect on the Job: changes made in the Pattern Editor are not propagated to the Job.