tLoqateAddressRow properties - 6.1

Talend Components Reference Guide


Component family

Data Quality

 

Function

tLoqateAddressRow parses, verifies, cleanses, standardizes, transliterates, and formats international addresses.

This component uses the Loqate Global Knowledge Repository containing definitive address and geographic reference data for over 240 countries in multiple languages and character sets.

tLoqateAddressRow uses the Q4, 2012 release.

Purpose

tLoqateAddressRow enables you to parse structured or unstructured text into labeled address components. It automatically puts each component into the correct address field.

You can compare address data against reference data to ensure that it is accurate and complete. You can correct spellings, add missing address data such as city, city area, region or postcode, and enrich addresses with other elements such as latitude, longitude and other relevant data.

Basic settings

Schema

A schema is a row description. It defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Since version 5.6, both the Built-in mode and the Repository mode are available in any of the Talend solutions.

 

 

Built-in: You create the schema and store it locally for this component only. Related topic: see Talend Studio User Guide.

 

 

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: see Talend Studio User Guide.

 

Edit Schema

Click the [...] button and define the input and output schema of the address data.

Make sure to define in the output schema all columns necessary to output the formatted data you want to get from tLoqateAddressRow.

 

Input Address

Address field: add lines to the table and select from the component's predefined list the fields that will hold the input address.

tLoqateAddressRow provides a long list of individual fields because some countries have more complex addressing structures than others. For further information about the input fields, see Address fields in tLoqateAddressRow.

Input Column: add lines to the table and select from the list the columns that hold the input address. The input schema can have one or multiple columns and can have columns that do not represent address data.

 

Output Address

Address field: add lines to the table and select from the component's predefined list the fields that will hold the output address. The component maps the values of these fields to the output columns you set in the table.

tLoqateAddressRow provides a long list of individual fields because some countries have more complex addressing structures than others. For further information about the output fields, see Address fields in tLoqateAddressRow.

Output Column: add lines to the table and select from the list the columns that will hold the output address.

If an output column in the Output Address table has exactly the same name as an input column, the input column value is overwritten by the value returned by tLoqateAddressRow.

The output schema contains two standard output columns that are read-only:

-STATUS: returns the status of processing input addresses. For further information about process status, see Process status in tLoqateAddressRow.

-ACCURACYCODE: returns the verification code for the processed address. For further information about what values this code is made up of and the implications of each segment, see Address verification codes in tLoqateAddressRow.
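The overwrite rule for same-named columns described above can be pictured with a minimal Java sketch. It is not the code Talend Studio generates for the component: the row is modeled as a plain map, and the class name, column names and sample values are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: models a row as a column-name -> value map. */
final class OutputOverwriteSketch {

    /**
     * Copies the input row, then writes the output address values on top.
     * A same-named column is overwritten; other input columns pass through.
     */
    static Map<String, String> mergeRow(Map<String, String> inputRow,
                                        Map<String, String> outputValues) {
        Map<String, String> result = new LinkedHashMap<>(inputRow);
        result.putAll(outputValues);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input row with one non-address column.
        Map<String, String> input = new LinkedHashMap<>();
        input.put("id", "42");
        input.put("city", "paris");

        // Hypothetical values mapped in the Output Address table.
        Map<String, String> output = new LinkedHashMap<>();
        output.put("city", "Paris");      // same name as an input column
        output.put("postcode", "75001");  // output-only column

        // "city" now holds the output value; "id" is unchanged.
        System.out.println(mergeRow(input, output));
    }
}
```

If you need to keep the original value next to the processed one, giving the output column a different name avoids the overwrite.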

 

Loqate Data Path

Set the path to the Loqate Global Knowledge Repository provided by Loqate and installed locally.

You must order and download the Loqate Local API and the Global Knowledge Repository from http://www.loqate.com/. tLoqateAddressRow uses the Q4, 2012 release.

Advanced settings

Server options

Set the server options as follows:

-Address Line Separator: define the string which will separate the output address components within the output address fields. The default separator is the line break string (<BR>).

-Default Country: select the country whose ISO 3166-1 alpha-3 code is used when parsing data if no identifiable country is found in an input record.

-Forced Country: select the country whose ISO 3166-1 alpha-3 code is used for all input records when parsing data.

-Output Script: use this option to transliterate the output address.

Select Latin to encode the parsing results in Latin (Western) characters.

Select Native to encode the parsing results using the country script.

Below is a list of the character sets (scripts) and languages tLoqateAddressRow can transliterate:

Latn - Latin (Western characters),

Cyrl - Cyrillic (Russia),

Grek - Greek (Greece),

Hebr - Hebrew (Israel),

Hani - Kanji (Japan),

Hans - simplified Chinese (China),

Arab - Arabic (United Arab Emirates),

Thai - Thai (Thailand),

Hang - Hangul (South Korea),

Native - output in the native script wherever possible.

-Minimum match score: specify the minimum match score a record must reach to keep its verified values. Records that score lower are reverted to the input data. The default value is 0, and valid values range from 0 to 100.

This option is useful when you want the output fields to carry the input data whenever the required level of verification (the minimum match score) is not reached.
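The way three of the server options above interact with a record can be sketched in plain Java. This is a simplified model under stated assumptions, not the Loqate API or the component's generated code: the class and method names are hypothetical, and the "detected" country and the match score stand in for values the Loqate engine would supply.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Illustrative model of three server options; all names are hypothetical. */
final class ServerOptionsSketch {

    /** Splits an output address field on the configured Address Line Separator. */
    static List<String> splitLines(String field, String separator) {
        // Pattern.quote treats the separator (for example "<BR>") literally.
        return Arrays.asList(field.split(Pattern.quote(separator)));
    }

    /**
     * Forced Country applies to every record. Otherwise the country found
     * in the record is used, and Default Country is the fallback.
     * Arguments are ISO 3166-1 alpha-3 codes, or null when not set or not found.
     */
    static String resolveCountry(String forced, String detected, String defaultCountry) {
        if (forced != null) {
            return forced;
        }
        return detected != null ? detected : defaultCountry;
    }

    /** Keeps the verified value only when the score reaches the minimum. */
    static String applyMinimumScore(String input, String verified, int score, int minScore) {
        if (minScore < 0 || minScore > 100) {
            throw new IllegalArgumentException("minScore must be between 0 and 100");
        }
        return score >= minScore ? verified : input;
    }

    public static void main(String[] args) {
        // The JDK can derive an alpha-3 code from an alpha-2 region code.
        String fra = new Locale.Builder().setRegion("FR").build().getISO3Country();

        System.out.println(resolveCountry(null, null, fra));
        System.out.println(splitLines("Line 1<BR>Line 2", "<BR>"));
        System.out.println(applyMinimumScore("raw input", "verified output", 40, 60));
    }
}
```

With the default minimum match score of 0, `applyMinimumScore` never reverts a record, which matches the default behavior described above.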

tStatCatcher Statistics

Select this check box to collect log data at the component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. If the component has a Die on error check box, this variable functions only when that check box is cleared.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
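As a rough illustration, an After variable such as ERROR_MESSAGE is typically read from the Job's globalMap under a key built from the component instance label and the variable name. The sketch below assumes an instance labeled tLoqateAddressRow_1 and uses an ordinary HashMap as a stand-in for globalMap; the helper class and the sample error text are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a HashMap stands in for the Job's globalMap. */
final class ErrorMessageSketch {

    /** Returns the ERROR_MESSAGE value of a component instance, or null if unset. */
    static String errorMessageOf(Map<String, Object> globalMap, String componentLabel) {
        return (String) globalMap.get(componentLabel + "_ERROR_MESSAGE");
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();

        // Simulate what would be stored after the component has finished.
        globalMap.put("tLoqateAddressRow_1_ERROR_MESSAGE", "sample error text");

        String message = errorMessageOf(globalMap, "tLoqateAddressRow_1");
        System.out.println(message != null
                ? "Component reported: " + message
                : "No error reported");
    }
}
```

Because ERROR_MESSAGE is an After variable, it is only meaningful in a component that runs after tLoqateAddressRow has finished, for example one connected through a trigger link.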

Usage

This component is an intermediary step. It requires both an input flow and an output flow.

Limitation

n/a