tUniservRTConvertName - 6.3

Talend Open Studio for Big Data Components Reference Guide


Function

tUniservRTConvertName analyzes the name line against the context. For individual persons, it divides the name line into segments (name, first name, title, name prefixes, name suffixes, etc.) and creates the address key.

The component recognizes company and institution addresses and can output the legal form of the organization separately. It also splits lines that contain information on several persons into separate lines. In addition, it can recognize certain patterns in the name line that are not part of the name (customer numbers, handling notes, etc.) and either remove them or move them to special memo fields.

Purpose

tUniservRTConvertName provides the basis for uniformly structuring and populating person and company names in the database, and for generating personalized salutations.

tUniservRTConvertName properties

Component family

Data quality

 

Basic settings

Schema and Edit schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

 

Host name

Host name of the convert-name server, enclosed in double quotation marks.

 

Port

Port number on which the server listens, enclosed in double quotation marks.

 

Service

The service type/name is "cname_d" by default. If necessary (for example, because the service name has a suffix), enter a different name, enclosed in double quotation marks. The available services are:

  • Germany: "cname_d"

  • Italy: "cname_i"

  • Austria: "cname_a"

  • Netherlands: "cname_nl"

  • Switzerland: "cname_ch"

  • Belgium: "cname_b"

  • France: "cname_f"

  • Spain: "cname_e"

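Because the Service field simply takes a quoted string, the country-to-service mapping above can also be kept in one place in code. The following Java sketch (Java being the language of Talend's generated code) maps each listed country to its default service name. The class name ConvertNameServices, the method serviceFor, and the way a suffix is appended are assumptions for illustration; they are not part of Talend or Uniserv.

```java
import java.util.Map;

// Hypothetical helper, not part of Talend or Uniserv: maps each country in the
// Service list to its default convert-name service name.
class ConvertNameServices {

    private static final Map<String, String> DEFAULT_SERVICES = Map.of(
            "Germany", "cname_d",
            "Italy", "cname_i",
            "Austria", "cname_a",
            "Netherlands", "cname_nl",
            "Switzerland", "cname_ch",
            "Belgium", "cname_b",
            "France", "cname_f",
            "Spain", "cname_e");

    // Returns the service name for a country. The optional suffix is simply
    // appended; how suffixes are formed on a real server is an assumption.
    static String serviceFor(String country, String suffix) {
        String base = DEFAULT_SERVICES.get(country);
        if (base == null) {
            throw new IllegalArgumentException("No convert-name service listed for " + country);
        }
        return (suffix == null || suffix.isEmpty()) ? base : base + suffix;
    }

    public static void main(String[] args) {
        System.out.println(serviceFor("Germany", null));     // prints cname_d
        System.out.println(serviceFor("Switzerland", "_2")); // prints cname_ch_2
    }
}
```

If such a class were added as a Talend routine (an assumption about your project setup), the Service field could contain an expression like ConvertNameServices.serviceFor("France", null); entering "cname_f" directly is equivalent.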
 

Use rejects

Select this option to output data sets separately when the name analysis returns a result class at or above a given threshold. Enter that result class in the if result class is greater or equal to field.

If this option is not selected, the sets are still output via the Main connection even if the analysis failed.

If the option is selected but no Rejects connection is established, the data sets for which the analysis failed are discarded.
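The three cases above can be summarized as a small decision function. The following Java sketch is a hypothetical model of that routing rule, not code generated by Talend. It assumes that "the analysis failed" means the result class is greater than or equal to the configured threshold, and it represents result classes as integers; both are simplifications for illustration.

```java
// Hypothetical model of the Use rejects routing rule; not generated Talend code.
// Assumes a failed analysis means resultClass >= threshold, with integer classes.
class RejectRouting {

    enum Route { MAIN, REJECTS, DISCARDED }

    static Route route(boolean useRejects, boolean rejectsConnected,
                       int resultClass, int threshold) {
        // Option cleared: every data set goes to Main, even if the analysis failed.
        if (!useRejects) {
            return Route.MAIN;
        }
        // Result class below the threshold: the data set continues on Main.
        if (resultClass < threshold) {
            return Route.MAIN;
        }
        // At or above the threshold: Rejects if that connection exists, otherwise dropped.
        return rejectsConnected ? Route.REJECTS : Route.DISCARDED;
    }

    public static void main(String[] args) {
        System.out.println(route(true, true, 7, 5));  // prints REJECTS
        System.out.println(route(true, false, 7, 5)); // prints DISCARDED
        System.out.println(route(false, true, 7, 5)); // prints MAIN
    }
}
```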

Advanced settings

Analysis Configuration

For detailed information, see the Uniserv convert-name user manual.

 

Output Configuration

For detailed information, see the Uniserv convert-name user manual.

 

Configuration of not recognized input

For detailed information, see the Uniserv convert-name user manual.

 

Configuration of free fields

For detailed information, see the Uniserv convert-name user manual.

 

Cache Configuration

For detailed information, see the Uniserv convert-name user manual.

Global Variables

NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
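After variables are typically read from globalMap once the component has finished, for example in a tJava component triggered afterwards. The following Java sketch assumes Talend's usual key convention of the component's unique name plus the variable name (for example "tUniservRTConvertName_1_NB_LINE"); the component label tUniservRTConvertName_1, the class AfterVariables, and the summary format are assumptions. A HashMap stands in for the globalMap that generated Job code would provide, so the snippet runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: in a Talend Job, globalMap is supplied by the generated code.
// A HashMap stands in for it here so the snippet runs on its own.
class AfterVariables {

    // Builds a one-line summary from the NB_LINE and ERROR_MESSAGE After variables
    // of the given component (a unique name such as "tUniservRTConvertName_1").
    static String summary(Map<String, Object> globalMap, String component) {
        Integer nbLine = (Integer) globalMap.get(component + "_NB_LINE");
        String error = (String) globalMap.get(component + "_ERROR_MESSAGE");
        String rows = component + ": " + (nbLine == null ? 0 : nbLine) + " rows";
        return error == null ? rows : rows + ", error: " + error;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tUniservRTConvertName_1_NB_LINE", 3);
        // prints tUniservRTConvertName_1: 3 rows
        System.out.println(summary(globalMap, "tUniservRTConvertName_1"));
    }
}
```

Remember that ERROR_MESSAGE is only populated when the Die on error check box, if present, is cleared.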

Usage

tUniservRTConvertName is used as an intermediate component: it receives the name lines from an incoming flow and passes the structured names on to an outgoing flow, as shown in the scenario below.

Limitation

The Uniserv convert-name software must be installed to use tUniservRTConvertName.

 

Scenario: Analysis of a name line and assignment of the salutation

This scenario describes a batch Job that analyzes the person names in a file and assigns a salutation to each of them.

The input file for this scenario is already saved in the Repository, so that all schema metadata is available.

Note

All data from the input source must relate to the same country, because each tUniservRTConvertName component uses a single country-specific service.

  1. In the Repository view, expand the Metadata node and the directory in which the file is saved. Then drag this file into the design workspace.

    A dialog box appears, asking which component to use for the file.

  2. Select tFileInputDelimited and then click OK to close the dialog box.

    The component is displayed in the workspace. The file used in this scenario is called SampleAddresses.

  3. Drag the following components from the Palette into the design workspace: two tMap components, tUniservRTConvertName, and tFileOutputDelimited.

  4. First, connect tMap_1 to tUniservRTConvertName.

    When prompted, click Yes to accept the schema of tUniservRTConvertName.

  5. Connect the other components via Row > Main.

  6. Double-click tMap_1 to open the schema mapping window. On the left is the structure of the input file, on the right is the schema of tUniservRTConvertName. At the bottom lies the Schema Editor, where you can find the attributes of the individual columns and edit them.

  7. Assign the columns of the input source to the corresponding columns of tUniservRTConvertName: select a column of the input source and drag it onto the appropriate column on the right side. To pass fields from the input file through to the output file, such as address fields or IDs, you must define additional fields for them.

  8. Click OK to close the dialog box.

  9. Double-click tUniservRTConvertName to open its Basic settings view.

  10. Fill in the server information and specify the country-specific service.

  11. Double-click tMap_3 to open the mapping window. On the left is the schema of tUniservRTConvertName and on the right is the schema of the output file.

  12. Click OK to close the window.

  13. Double-click tFileOutputDelimited and enter the details for the output file.