tSynonymOutput Properties - 6.1

Talend Components Reference Guide

Component family

Data Quality

Function

tSynonymOutput creates a Lucene index and feeds it with the entries and related synonyms it receives.

For further information about how to access and manage the words and the reference entries (documents) of an existing synonym index using the synonym index editor, see the Talend Studio User Guide.

For further information about available synonym indexes, see the appendix about data synonym dictionaries in the Talend Studio User Guide.

Note

The synonym similarity computation has been enhanced since Studio version 5.1. If your indexes were created with version 5.0 or earlier and you need to handle them with this enhanced computation method, update them by running the IndexMigrator.jar file, downloadable from: http://talendforge.org/svn/top/trunk/org.talend.dataquality.standardization.migration/dist/IndexMigrator.jar. Use the following command to run this jar file:

java -jar IndexMigrator.jar <inputPath> <outputPath(optional)> 

Purpose

tSynonymOutput creates synonym indexes that components such as tStandardizeRow or tSynonymSearch can refer to when processing data.

Basic settings

Schema and Edit schema

A schema is a row description; it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Built-in: The schema is created and stored locally for this component only. Related topic: see the Talend Studio User Guide.

Repository: The schema already exists and is stored in the Repository, so it can be reused in various projects and Job designs. Related topic: see the Talend Studio User Guide.

Index path

Type in or browse to the location where you want to create and store the synonym index. If the specified directory does not exist, the component creates it.

Operations

Select the index operation to be performed in the directory given in the Index path field.

(Delete and) initialize an index: creates a new index and then fills it with the entries and the corresponding synonyms; if an index already exists, deletes it before creating a new one.

Insert new documents: inserts new entries and synonyms into the given existing index. Duplicates are not inserted.

Update existing documents and insert if not existing: updates existing entries and synonyms, and adds new ones to the given index.

Delete existing documents: deletes the entries with their synonyms if the same entries are identified in the incoming data flow from the preceding component.
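
The four operations above can be pictured with a minimal in-memory sketch. This is not the component's Lucene code: the class SynonymIndexSketch and its methods are hypothetical names introduced only for illustration, and the sketch assumes that an update replaces the synonyms previously stored for an entry.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a synonym index (entry -> synonyms).
// It only illustrates the four Operations options; it does not use Lucene.
final class SynonymIndexSketch {
    private final Map<String, List<String>> docs = new LinkedHashMap<>();

    // (Delete and) initialize an index: drop any existing content, then fill.
    void initialize(Map<String, List<String>> incoming) {
        docs.clear();
        docs.putAll(incoming);
    }

    // Insert new documents: entries that already exist are left untouched.
    void insert(Map<String, List<String>> incoming) {
        incoming.forEach(docs::putIfAbsent);
    }

    // Update existing documents and insert if not existing.
    // Assumption: an update replaces the stored synonyms of an entry.
    void update(Map<String, List<String>> incoming) {
        docs.putAll(incoming);
    }

    // Delete existing documents: remove every entry found in the incoming flow.
    void delete(Map<String, List<String>> incoming) {
        docs.keySet().removeAll(incoming.keySet());
    }

    // Returns a copy of the current content for inspection.
    Map<String, List<String>> snapshot() {
        return new LinkedHashMap<>(docs);
    }
}
```

In this model, inserting an entry that is already indexed leaves the stored synonyms unchanged, whereas updating it overwrites them.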

Entry

Select the column whose values are used to create the entries of the given index. These entries serve as the reference for the associated synonyms inserted alongside them in the index.

Synonyms

Select the column whose values are used to create the synonyms corresponding to the index entries.

Synonym separator

Type in the separator to be used to separate the synonyms of each index entry. By default, this separator is |.
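
When preparing or checking input data, it can help to see how a Synonyms value breaks down with a given separator. The following is a minimal sketch in plain Java, not the component's own code: the class SynonymSplitSketch is a hypothetical name, and trimming values and dropping empty ones are choices made by this sketch only. Because String.split interprets its argument as a regular expression and | is a regex metacharacter, the separator is quoted first.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: splits one Synonyms value into individual synonyms.
final class SynonymSplitSketch {
    static List<String> split(String synonyms, String separator) {
        // Pattern.quote makes the separator literal, so "|" is not read as regex alternation.
        return Arrays.stream(synonyms.split(Pattern.quote(separator)))
                .map(String::trim)          // sketch-only choice: trim surrounding spaces
                .filter(s -> !s.isEmpty())  // sketch-only choice: ignore empty values
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample row: entry "IBM" with two synonyms separated by the default "|".
        System.out.println(split("International Business Machines|Big Blue", "|"));
    }
}
```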

Advanced settings

tStatCatcher Statistics

Select this check box to collect log data at the Job and the component levels.

Connections

Outgoing links (from this component to another):

Row: Main; Reject

Trigger: Run if; On Component Ok; On Component Error.

Incoming links (from one component to this one):

Row: Main; Reject

For further information regarding connections, see the Talend Studio User Guide.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. If the component has a Die on error check box, this variable functions only when that check box is cleared.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
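
In Job code, component variables are typically read from the Job's globalMap under a key made of the component's unique name followed by the variable name. The sketch below imitates that lookup outside the Studio. The class ErrorMessageSketch, the component name tSynonymOutput_1, and the sample message are assumptions made for illustration, and the local map is only a stand-in for the real globalMap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone imitation of reading an After variable.
final class ErrorMessageSketch {
    // Looks up <componentName>_ERROR_MESSAGE; returns null when no error was recorded.
    static String errorMessageOf(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(componentName + "_ERROR_MESSAGE");
        return value == null ? null : (String) value;
    }

    public static void main(String[] args) {
        // Stand-in for the Job's globalMap; the stored message is made-up sample data.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tSynonymOutput_1_ERROR_MESSAGE", "sample error text");

        System.out.println(errorMessageOf(globalMap, "tSynonymOutput_1"));
    }
}
```

Inside a Job, the equivalent expression would be ((String)globalMap.get("tSynonymOutput_1_ERROR_MESSAGE")), assuming the component is labeled tSynonymOutput_1; being an After variable, it is meaningful only once the component has finished executing.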

Usage

This component requires an incoming data flow from a preceding component in order to create or update indexes.