tAddressRowCloud Standard properties - 7.1

Address standardization

author
Talend Documentation Team
EnrichVersion
7.1
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio

These properties are used to configure tAddressRowCloud running in the Standard Job framework.

The Standard tAddressRowCloud component belongs to the Data Quality family.

This component is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real-Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and Talend Data Fabric.

Basic settings

Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.

Built-In: You create and store the schema locally for this component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Edit Schema

Click the [...] button and define the input and output schema of the address data.

The output schema of tAddressRowCloud provides several read-only address columns, including a VerificationLevel column that gives the verification status of the processed address. The verification levels in this column are defined by Talend. For further information, see Address verification levels in tAddressRowCloud.

Some of the output columns may be empty, depending on the address provider you select in the component basic settings.

Address Provider

Select from the list the provider of the reference data against which you want to validate and format input addresses.

The list of address providers includes Google, Loqate, QAS and MelissaData.

License/API key

Enter the license or the API key provided by the address provider you select from the list. You must visit the provider website, register and get the license/API key.

When you select Google as a provider, the component uses the Google Places API. You must generate the key from the Google Developer Console at https://developers.google.com/console/help/new/ and set the key in this field.

Processing Mode

This option is applied only to the Loqate provider.

Select from the list the mode of address validation you want to have:

-Verify and Geocode (selected by default): with this mode, the component standardizes and corrects addresses and enriches them with latitude and longitude information.

Note: Combining address verification and geocoding will cost extra credits. For further information, see Cloud Price Card.

-Verify only: with this mode, the component standardizes and corrects addresses without enriching them with latitude and longitude information.

Country

This option is applied only to the QAS provider.

Select from the list the country corresponding to your input addresses.

When you select QAS as a provider, the component uses the QAS Pro OnDemand service. For further information about Experian address verification, see the product sheet at https://www.edq.com/globalassets/product-sheets/address-verification.pdf.

QAS OnDemand username

This option is applied only to the QAS provider.

Enter the username you can find in the license provided by QAS.

You can check your username from the QAS OnDemand portal.

Password

This option is applied only to the QAS provider.

Enter the password you can find in the license provided by QAS.

You can check your password from the QAS OnDemand portal.

Use security mode to connect

Select this check box to connect to the Cloud in a secure mode. This may have a slight impact on performance.

This check box is not available with all address providers.

Mapping

Address field: add lines to the table and select from a predefined address list the fields that will hold input addresses.

The address list includes the following columns for all address providers: Address, PostalCode, Locality, AdministrativeArea and Country.

Input Column: add lines to the table and select from the list the columns of the input schema that hold addresses. The input schema can have one or multiple columns and can include columns that do not represent address data.

Use Additional Output

This option is not available for the QAS provider.

Select this check box and use the Output Mapping table to add more address columns to the output schema:

Address field: add lines to the table and select from a predefined address list the fields of the extra information you want to output.

These predefined address fields vary according to the provider you select from the Address Provider list. For further information about the additional address fields, check the provider website.

Output Column: select from the list the columns that will hold the additional address information. You must first add these additional columns to the tAddressRowCloud output schema through the Edit Schema button.

tAddressRowCloud maps the values of the address fields to the columns selected in Output Column.

If an output column in the Output Mapping table has exactly the same name as an input column, the input column value is overwritten by the value returned by the component.
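The overwrite rule described above can be pictured with a short sketch. It is an illustration under assumptions, not the component's code: a row is modeled as a map of column names to values, and the class name, method name and sample column names (id, city) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the tAddressRowCloud implementation.
final class OutputMappingSketch {

    // Copies the input row, then writes each provider value into its mapped output column.
    // An output column that shares its name with an input column replaces the input value.
    static Map<String, String> applyOutputMapping(Map<String, String> inputRow,
                                                  Map<String, String> fieldToOutputColumn,
                                                  Map<String, String> providerValues) {
        Map<String, String> outputRow = new LinkedHashMap<>(inputRow);
        for (Map.Entry<String, String> mapping : fieldToOutputColumn.entrySet()) {
            String addressField = mapping.getKey();
            String outputColumn = mapping.getValue();
            outputRow.put(outputColumn, providerValues.get(addressField));
        }
        return outputRow;
    }

    public static void main(String[] args) {
        Map<String, String> inputRow = new LinkedHashMap<>();
        inputRow.put("id", "1");
        inputRow.put("city", "paris");

        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("Locality", "city"); // output column named like an input column

        Map<String, String> providerValues = new LinkedHashMap<>();
        providerValues.put("Locality", "Paris");

        System.out.println(applyOutputMapping(inputRow, mapping, providerValues));
    }
}
```

To keep the original value alongside the standardized one, give the output column a name that does not exist in the input schema.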

Die on error

Select the check box to stop the execution of the Job when an error occurs.

Clear the check box to skip any rows on error and complete the process for error-free rows. When errors are skipped, you can collect the rows on error using a Row > Reject link.
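The two behaviors of Die on error can be sketched as follows. This is a simplified model under assumptions, not the component's code: rows are plain strings, the verify function stands in for the call to the address provider, and all class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; this is not the tAddressRowCloud implementation.
final class DieOnErrorSketch {

    // Holds the rows sent to the main flow and the rows sent to the reject flow.
    static final class Result {
        final List<String> mainFlow = new ArrayList<>();
        final List<String> rejectFlow = new ArrayList<>();
    }

    // With dieOnError set, the first failure stops processing.
    // Otherwise the failing row goes to the reject flow and processing continues.
    static Result process(List<String> rows, Function<String, String> verify, boolean dieOnError) {
        Result result = new Result();
        for (String row : rows) {
            try {
                result.mainFlow.add(verify.apply(row));
            } catch (RuntimeException e) {
                if (dieOnError) {
                    throw e;
                }
                result.rejectFlow.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in verifier: rejects empty addresses, upper-cases the others.
        Function<String, String> verify = s -> {
            if (s.isEmpty()) {
                throw new IllegalArgumentException("empty address");
            }
            return s.toUpperCase();
        };
        Result result = process(Arrays.asList("10 main st", "", "5 high st"), verify, false);
        System.out.println("main: " + result.mainFlow + ", reject: " + result.rejectFlow);
    }
}
```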

Advanced settings

Fields in this view will vary according to the address provider you select in the basic settings view.

-Address Line Separator: define the string which will separate the output address components within the output address fields.

If you keep the default option, Default, in this field, the component uses the line separator of the address provider you select: for example, the line break string (<BR>) with Loqate and a semicolon (;) with MelissaData.
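The fallback to a provider-specific separator can be sketched as below. Only the two defaults cited above are encoded; the class and method names are hypothetical, and the defaults used for other providers are not stated here, so the sketch refuses to guess them.

```java
// Hypothetical illustration only; this is not the tAddressRowCloud implementation.
final class SeparatorSketch {

    // Returns the configured separator, or the provider default when "Default" is kept.
    static String resolveSeparator(String configured, String provider) {
        if (!"Default".equals(configured)) {
            return configured;
        }
        switch (provider) {
            case "Loqate":
                return "<BR>";
            case "MelissaData":
                return ";";
            default:
                // Defaults for other providers are not documented in this section.
                throw new IllegalArgumentException("No default separator known for " + provider);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveSeparator("Default", "Loqate"));
        System.out.println(resolveSeparator(", ", "Loqate"));
    }
}
```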

-Default Country: select the country name whose ISO 3166-1 alpha-3 code should be used when parsing data if no identifiable country is found in an input record.

-Forced Country: select the country name for which the ISO 3166-1 alpha-3 code should be used for all input records when parsing data.
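The difference between Default Country and Forced Country can be sketched as a simple resolution rule. This is an assumption-based illustration: the class and method names are hypothetical, null stands for an option that is not set, and giving Forced Country precedence when both options are set is an assumption rather than documented behavior.

```java
// Hypothetical illustration only; this is not the tAddressRowCloud implementation.
final class CountrySketch {

    // All arguments are ISO 3166-1 alpha-3 codes, or null when unknown or not set.
    static String resolveCountry(String detectedCountry, String defaultCountry, String forcedCountry) {
        if (forcedCountry != null) {
            return forcedCountry;   // applies to every record (precedence is an assumption)
        }
        if (detectedCountry != null && !detectedCountry.isEmpty()) {
            return detectedCountry; // country identified in the input record
        }
        return defaultCountry;      // used only when no country is identified
    }

    public static void main(String[] args) {
        System.out.println(resolveCountry(null, "USA", null));
        System.out.println(resolveCountry("FRA", "USA", null));
        System.out.println(resolveCountry("FRA", "USA", "DEU"));
    }
}
```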

-Output Script: select the script into which the output address is transliterated.

The script list differs according to the address provider you select.

When the address provider is Loqate or MelissaData:

If you keep the default option, Not set, in this field, the component checks the input data and decides to use Native or Latin according to whether the larger portion of the input is Native or Latin.

Select Latin to encode the parsing results in Latin, or western characters.

Select Native/Match input to encode the parsing results using the country script wherever possible.

The Native/Match input script includes the following supported character sets (scripts) and languages tAddressRowCloud can transliterate:

Cyrl - Cyrillic (Russia),

Grek - Greek (Greece),

Hebr - Hebrew (Israel),

Hani - Kanji (Japan),

Hans - simplified Chinese (China),

Arab - Arabic (United Arab Emirates),

Thai - Thai (Thailand),

Hang - Hangul (South Korea).
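The Not set behavior described above, where the script is chosen from the larger portion of the input, can be approximated with the standard Java Character.UnicodeScript API. This is only a sketch under assumptions: how the provider actually measures the portions is not documented here, so this version counts letters, and it resolves ties in favor of Latin. The class and method names are hypothetical.

```java
// Hypothetical illustration only; this is not the tAddressRowCloud implementation.
final class OutputScriptSketch {

    // Returns "Latin" or "Native" depending on which kind of letter dominates the input.
    static String chooseScript(String input) {
        int latinLetters = 0;
        int otherLetters = 0;
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            i += Character.charCount(codePoint);
            if (!Character.isLetter(codePoint)) {
                continue; // digits, spaces and punctuation are ignored
            }
            if (Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.LATIN) {
                latinLetters++;
            } else {
                otherLetters++;
            }
        }
        // Tie-breaking toward Latin is an assumption.
        return otherLetters > latinLetters ? "Native" : "Latin";
    }

    public static void main(String[] args) {
        System.out.println(chooseScript("10 Downing Street, London"));
        System.out.println(chooseScript("\u041c\u043e\u0441\u043a\u0432\u0430, 1")); // Cyrillic input
    }
}
```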

-Minimum match score: set the minimum match score a record must reach in order not to be reverted. The default value is zero, and valid values are between zero and 100.

This option is very helpful when you want to get, in the output fields, the input data if a specific level of verification (minimum match score) was not reached.
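The revert rule can be sketched as a single comparison. The names below are hypothetical, and treating a score equal to the minimum as sufficient is an assumption based on the wording "must reach".

```java
// Hypothetical illustration only; this is not the tAddressRowCloud implementation.
final class MatchScoreSketch {

    // Returns the verified value when the score reaches the minimum, otherwise the input value.
    static String selectOutput(String inputValue, String verifiedValue,
                               int matchScore, int minimumMatchScore) {
        if (minimumMatchScore < 0 || minimumMatchScore > 100) {
            throw new IllegalArgumentException("Minimum match score must be between 0 and 100");
        }
        return matchScore >= minimumMatchScore ? verifiedValue : inputValue;
    }

    public static void main(String[] args) {
        // With the default minimum of 0, no record is reverted.
        System.out.println(selectOutput("12 mian st", "12 Main St", 40, 0));
        // With a minimum of 90, a record scoring 40 is reverted to the input data.
        System.out.println(selectOutput("12 mian st", "12 Main St", 40, 90));
    }
}
```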

-Minimum interval between two queries (milliseconds): set, in milliseconds, the minimum wait period between two queries.

-Limit of retrying the same query in case it fails (times): set the number of times a query should be retried in case of failure.

-Interval between two retries of the same query (milliseconds): set, in milliseconds, the minimum wait period between two tries of the same query.

-Delay before forcing the termination of the query executor (seconds): set in seconds the wait period before forcing the query executor to shut down.
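How the retry limit and the retry interval interact can be sketched with a small helper. This is an assumption-based illustration, not the component's code: the names are hypothetical, the limit is read as the number of extra attempts after the first failure, and the minimum interval between two different queries and the executor termination delay are not modeled.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration only; this is not the tAddressRowCloud implementation.
final class RetrySketch {

    // Runs the query once, then retries up to maxRetries times, waiting between tries.
    static <T> T callWithRetries(Callable<T> query, int maxRetries, long retryIntervalMillis)
            throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must not be negative");
        }
        Exception lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (attempt > 0) {
                Thread.sleep(retryIntervalMillis); // interval between two tries of the same query
            }
            try {
                return query.call();
            } catch (Exception e) {
                lastFailure = e;
            }
        }
        throw lastFailure; // every attempt failed
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stand-in query that fails twice before succeeding.
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("temporary failure");
            }
            return "verified";
        }, 3, 10L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```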

tStatCatcher Statistics

Select this check box to collect log data at the component level.

Global Variables

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
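As an illustration of reading the After variable, Talend Jobs store component variables in a map named globalMap under a key made of the component name and the variable name. Assuming the component keeps a name such as tAddressRowCloud_1, the expression used in a later component would typically be ((String)globalMap.get("tAddressRowCloud_1_ERROR_MESSAGE")). The standalone sketch below reproduces that lookup with a plain HashMap standing in for the Job's globalMap; the class name, helper names and sample message are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; a HashMap stands in for the Job's globalMap.
final class ErrorMessageSketch {

    // Builds the key of a component variable, for example tAddressRowCloud_1_ERROR_MESSAGE.
    static String variableKey(String componentName, String variableName) {
        return componentName + "_" + variableName;
    }

    // Returns the error message of the component, or null when none was recorded.
    static String readErrorMessage(Map<String, Object> globalMap, String componentName) {
        return (String) globalMap.get(variableKey(componentName, "ERROR_MESSAGE"));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Sample value; the real message is set by the component when an error occurs.
        globalMap.put("tAddressRowCloud_1_ERROR_MESSAGE", "sample error message");

        String message = readErrorMessage(globalMap, "tAddressRowCloud_1");
        if (message != null) {
            System.out.println("tAddressRowCloud reported: " + message);
        }
    }
}
```

Because ERROR_MESSAGE is an After variable, read it in a component that runs after tAddressRowCloud has finished, and check for null since no message is set when no error occurred.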

Usage

Usage rule

This component is usually used as an intermediate component, and it requires an input component and an output component.

This component enables you to create a data flow, using a Row > Main link, and to create a reject flow with a Row > Reject link filtering the data in error.