Extracting exact matches by using Index rules - 6.5

Standardization

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio

This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real-Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform, and Talend Data Fabric.

For more technologies supported by Talend, see Talend components.

In this scenario, you will standardize long descriptions of customer products by matching the input flow against the data contained in an index. The scenario explains how to use Index rules to tokenize product data and then check each token against an index to extract exact matches.
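The idea of tokenizing a description and checking each token against an index can be pictured with a short plain-Python sketch. It is only an illustration, not how Talend implements Index rules: the `INDEXES` dictionary, its sample entries, and the `tokenize` and `extract_exact_matches` functions are all hypothetical names invented for this example.

```python
import re

# Hypothetical in-memory stand-ins for the brand, range, color and unit indexes.
# All entries are invented sample values, not data from the scenario.
INDEXES = {
    "Brand": {"acme", "globex"},
    "Range": {"classic", "premium"},
    "Color": {"red", "blue"},
    "Unit": {"kg", "ml"},
}


def tokenize(description):
    """Split a description into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", description.lower())


def extract_exact_matches(description, indexes=INDEXES):
    """Return {index name: [tokens that exactly match an entry of that index]}."""
    matches = {}
    for token in tokenize(description):
        for name, entries in indexes.items():
            if token in entries:
                matches.setdefault(name, []).append(token)
    return matches


result = extract_exact_matches("ACME Premium paint, blue, 5 kg")
print(result)
```

Tokens that appear in no index (here `paint` and `5`) are simply left out of the result. The sketch matches single tokens only, whereas real index entries can be sequences of words.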

For this scenario, you must first create the indexes by using a Job with the tSynonymOutput component. You need one index each for the brand, range, color and unit of the customer products. Use the tSynonymOutput component to generate the indexes and feed them with entries and synonyms. The capture below shows an example Job:

Below is a sample of the generated indexes for this scenario:

Each of the generated indexes has strings (sequences of words) in one column and their corresponding synonyms in the second column. These strings are used as reference data against which the product data, generated by tFixedFlowInput, is matched. For further information about index creation, see tSynonymOutput.
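One way to picture the two-column layout described above is as a list of rows that pair a reference string with its synonyms. The sketch below is an assumption-laden illustration in plain Python, not the actual index format written by tSynonymOutput: the sample rows, the `|` separator, and the `build_lookup` function are hypothetical.

```python
# Hypothetical rows of a color index: (reference string, synonyms).
# The values and the "|" separator are assumptions made for this sketch.
COLOR_INDEX_ROWS = [
    ("blue", "navy|azure"),
    ("red", "crimson|scarlet"),
]


def build_lookup(rows, separator="|"):
    """Map each reference string and each of its synonyms to the reference string."""
    lookup = {}
    for word, synonyms in rows:
        lookup[word.lower()] = word
        for synonym in synonyms.split(separator):
            synonym = synonym.strip().lower()
            if synonym:  # skip empty fragments such as a trailing separator
                lookup[synonym] = word
    return lookup


color_lookup = build_lookup(COLOR_INDEX_ROWS)
print(color_lookup.get("navy"))   # a synonym resolves to its reference string
print(color_lookup.get("green"))  # an unknown value has no match
```

With a lookup of this kind, an input value that equals either a reference string or one of its synonyms can be replaced by the reference string, which is the sense in which the index serves as reference data.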

In this scenario, the generated indexes are defined as context variables. For further information about context variables, see Talend Studio User Guide.