Creating the parsing rules - 6.5

Standardization

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio

Procedure

  1. Double-click the tStandardizeRow component to display its Basic settings view.
  2. From the Column to parse list, select product.
  3. In the Conversion rules table, define a basic rule and an advanced rule as follows:
    • Click the [+] button twice to add two rows to the table. Name the first rule "Amount" and the second "LiquidAmount".

    • Select Format as the type for the basic rule, and define it to read "INT WHITESPACE* WORD".

    • Select RegExp as the type for the advanced rule, and define it to read "\\d+\\s*(L|ML)\\b".

      The advanced rule is executed after the basic ANTLR rule. The "Amount" rule first tokenizes the amounts in the three input strings: it matches any word preceded by a number. The RegExp rule then checks each token created by ANTLR against the regular expression.

  4. Click the Generate parser code in Routines button to generate the parser code under the Routines folder in the DQ Repository tree view of the Profiling perspective.
    This step is mandatory; otherwise, the Job cannot be executed.
  5. In the Advanced settings view, leave the options selected by default in the Output format area as they are.
    The Max edits for fuzzy match option is set to 1 by default.
  6. Double-click the tLogRow component and select the Table (print values in cells of a table) option in the Mode area.
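
The two-stage behavior of the rules defined in step 3 can be approximated outside the Studio with plain Java regular expressions. The sketch below is not the parser code that the component generates: the class name, the method names and the sample strings are hypothetical, the pattern standing in for the Format rule "INT WHITESPACE* WORD" is only a rough regex equivalent, and requiring the whole token to match the RegExp rule is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not the routine generated by tStandardizeRow.
class ConversionRuleSketch {

    // Assumed regex stand-in for the Format rule "INT WHITESPACE* WORD":
    // one or more digits, optional whitespace, then a run of letters.
    private static final Pattern AMOUNT = Pattern.compile("\\d+\\s*[A-Za-z]+");

    // The RegExp rule exactly as entered in the Conversion rules table.
    private static final Pattern LIQUID_AMOUNT = Pattern.compile("\\d+\\s*(L|ML)\\b");

    // Stage 1: collect every "Amount" token found in the input string.
    static List<String> amounts(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = AMOUNT.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Stage 2: keep only the tokens that also satisfy the RegExp rule.
    // Assumption: the whole token must match the expression.
    static List<String> liquidAmounts(String input) {
        List<String> result = new ArrayList<>();
        for (String token : amounts(input)) {
            if (LIQUID_AMOUNT.matcher(token).matches()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not the data used in this scenario.
        String[] samples = {"Juice 2L pack", "Water 500 ML bottle", "Rice 3 kg bag"};
        for (String s : samples) {
            System.out.println(s + " | Amount=" + amounts(s)
                    + " | LiquidAmount=" + liquidAmounts(s));
        }
    }
}
```

In this sketch, every sample yields an "Amount" token, but only the tokens whose unit is L or ML are kept as "LiquidAmount", which mirrors the order in which the basic and advanced rules are applied.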