In this scenario, you standardize long customer product descriptions by matching the input flow against the data contained in an index. The scenario shows how to use Index rules to tokenize product data and then check each token against an index to extract exact matches.
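To make the "tokenize, then look up each token" idea concrete, here is a minimal Python sketch. It is not Talend's implementation: a plain dictionary stands in for the generated indexes, and the index contents, the `tokenize` and `extract_exact_matches` helpers, and the longest-match-first strategy for multi-word entries are all hypothetical choices made for illustration.

```python
import re

# Hypothetical in-memory stand-in for the brand/color/unit indexes.
# Keys are lower-cased entries or synonyms; values are (index name, canonical entry).
INDEX = {
    "acme": ("brand", "ACME"),
    "acme corp": ("brand", "ACME"),
    "light blue": ("color", "LIGHT BLUE"),
    "blue": ("color", "BLUE"),
    "ml": ("unit", "ML"),
    "millilitre": ("unit", "ML"),
}
# Longest entry, in words, so multi-word strings can be matched too.
MAX_WORDS = max(len(key.split()) for key in INDEX)


def tokenize(text):
    """Split a description into lower-cased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_exact_matches(description):
    """Return (index name, canonical entry) pairs for every exact match,
    trying the longest multi-word candidate first at each token position."""
    tokens = tokenize(description)
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in INDEX:
                matches.append(INDEX[candidate])
                i += n  # consume all tokens of the matched entry
                break
        else:
            i += 1  # no index entry starts at this token; skip it
    return matches
```

For example, `extract_exact_matches("Acme Corp shampoo, light blue, 250 ml")` yields `[("brand", "ACME"), ("color", "LIGHT BLUE"), ("unit", "ML")]`. Tokens with no exact index entry, such as `shampoo` or `250`, are simply skipped, and preferring the longest candidate keeps "light blue" from being reduced to "blue".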
For this scenario, you must first create the indexes by running a Job that uses the tSynonymOutput component. Create an index each for the brand, range, color, and unit of the customer products, and use tSynonymOutput to generate these indexes and feed them with entries and synonyms. The following screenshot shows an example Job:
Below is a sample of the generated indexes for this scenario:
Each generated index holds strings (sequences of words) in one column and their corresponding synonyms in a second column. These strings serve as the reference data against which the product data generated by tFixedFlowInput is matched. For further information about index creation, see tSynonymOutput.
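The two-column layout can be pictured as a lookup in which both an entry and each of its synonyms resolve to the same reference string. The sketch below is an illustrative assumption, not how tSynonymOutput stores its indexes: the `COLOR_ROWS` data, the `|` separator between synonyms, and the `build_lookup` and `standardize` helpers are all hypothetical.

```python
# Hypothetical rows mirroring the two-column layout: an entry and its synonyms.
# The "|" separator between synonyms is an assumption made for this sketch.
COLOR_ROWS = [
    ("LIGHT BLUE", "lt blue|pale blue"),
    ("DARK GREEN", "dk green"),
    ("RED", ""),  # an entry with no synonyms
]


def build_lookup(rows, separator="|"):
    """Map each entry and each of its synonyms (lower-cased) to the entry."""
    lookup = {}
    for entry, synonyms in rows:
        lookup[entry.lower()] = entry
        for synonym in synonyms.split(separator):
            synonym = synonym.strip()
            if synonym:  # ignore empty synonym cells
                lookup[synonym.lower()] = entry
    return lookup


def standardize(value, lookup):
    """Return the reference entry for an exact, case-insensitive match, or None."""
    return lookup.get(value.strip().lower())
```

With `lookup = build_lookup(COLOR_ROWS)`, both `standardize("Pale Blue", lookup)` and `standardize("light blue", lookup)` return `"LIGHT BLUE"`, while a value absent from the index, such as `"purple"`, returns `None`.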
In this scenario, the generated indexes are defined as context variables. For further information about context variables, see the Talend Studio User Guide.