tQASAddressRow - 6.1

Talend Components Reference Guide


Warning

This component is available in the Palette of Talend Studio only if you have subscribed to one of the Talend Platform products.

The address management components discussed here are the result of Talend's collaboration with Experian QAS, one of the world leaders in global address data quality.

For more information about the enterprise and its software tools, visit http://www.qas.com.

tQASAddressRow properties

Component family

Data Quality

Function

tQASAddressRow verifies an address column. It iterates over each row and checks the input addresses against the QuickAddress data.

tQASAddressRow uses QAS Pro Web 5.16 on Linux and 5.86 on Windows.

Purpose

tQASAddressRow corrects formatting and spelling errors and gives the verification status for each row, because an address may not always contain enough information to be matched to a single deliverable result in the QuickAddress data.

For more information about the verification status, see QuickAddress verification levels (verification status).

Basic settings

QAS WSDL url

Enter the URL for the QuickAddress XML document (provided by Experian QAS).

Country

Select from the list the country corresponding to your input addresses.

Schema and Edit Schema

A schema is a row description. It defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Built-in: You create the schema and store it locally for this component only. Related topic: see Talend Studio User Guide.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: see Talend Studio User Guide.

Column to analyze

Select from the list the address column you want to analyze.

Advanced settings

tStatCatcher Statistics

Select this check box to collect log data at the component level.

Global Variables

NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill a field or expression with a variable, press Ctrl + Space to open the variable list and choose the variable to use.

For further information about variables, see Talend Studio User Guide.

Usage

This component is an intermediary step. It requires an input flow as well as an output.

Limitation

n/a
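
The global variables listed in the properties above can be read from Java code in a Job once the component has finished. The following is a minimal sketch, assuming the component is named tQASAddressRow_1 (a hypothetical name). In Talend-generated code, global variables are stored in a map called globalMap under keys made of the component name followed by the variable name. The map created in main below is only a stand-in so that the sketch runs on its own; inside a real Job the Studio provides and populates globalMap.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalVariablesSketch {

    // Reads the NB_LINE After variable of a component; returns 0 if it is not set yet.
    static int rowCount(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(componentName + "_NB_LINE");
        return value == null ? 0 : (Integer) value;
    }

    // Reads the ERROR_MESSAGE After variable; returns null if no error was recorded.
    static String errorMessage(Map<String, Object> globalMap, String componentName) {
        return (String) globalMap.get(componentName + "_ERROR_MESSAGE");
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that a Talend Job provides (assumption for this sketch).
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tQASAddressRow_1_NB_LINE", 42); // hypothetical row count

        System.out.println("Rows processed: " + rowCount(globalMap, "tQASAddressRow_1"));
        System.out.println("Error message: " + errorMessage(globalMap, "tQASAddressRow_1"));
    }
}
```

Because both variables are After variables, they only hold meaningful values in a part of the Job that runs after tQASAddressRow has completed.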

QuickAddress verification levels (verification status)

An address can be matched to one of six verification levels. The tQASAddressRow component returns these levels directly to indicate the match type of each checked address. In addition, the output flows of the other QAS components are adapted to match one or more of the verification levels below.

The six QuickAddress verification levels are:

  • Verified: The address searched upon is matched to a single deliverable address in the QuickAddress data. The verified result may be slightly different from the address entered and searched upon, as any formatting and spelling errors will have been corrected, and any missing elements will have been added. When this match type is returned, no further user interaction is required.

  • Interaction required: The address searched upon is matched to a single deliverable address in the QuickAddress data, although there is less confidence in the match than for the Verified level above. User interaction is therefore recommended to confirm that it is the correct address.

  • PremisesPartial: The address searched upon is not matched to a complete deliverable result in the QuickAddress data, and instead has been matched to a partially-complete address.

    For example, the address is matched to a premises in the QuickAddress data, but a complete deliverable match could not be found: "63 Southerton Road, London" rather than "Flat A, 63 Southerton Road, London".

  • StreetPartial: The address searched upon is not matched to a complete deliverable result in the QuickAddress data, and instead has been matched to a partially-complete address.

    For example, the address is matched to a street in the QuickAddress data, but a complete deliverable match could not be found: "Kew Road, Richmond" rather than "88 Kew Road, Richmond".

  • Multiple: The address searched upon is not matched to a single deliverable result in the QuickAddress data, and instead has matched equally to more than one result.

    For example, the address is matched to two equally valid addresses that can only be distinguished by address information that has not been provided in the search.

    User interaction is therefore necessary to select the required address.

  • None: The address searched upon could not be matched to any deliverable results in the QuickAddress data. When this match type is returned, no address verification is possible and the submitted address should be used instead.
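
The six levels above can be modeled as a small Java type when the status has to be handled in custom code. This is only a sketch: the enum, its method names, and the parsing rule are hypothetical, and the exact spelling of the status strings returned by the component is an assumption (the parser ignores spaces, underscores, and case for that reason). The two flags encode only what the descriptions above state.

```java
import java.util.Locale;

// Hypothetical model of the six QuickAddress verification levels.
enum VerificationLevel {
    // (matched to a single deliverable address, user interaction mentioned in the description)
    VERIFIED(true, false),
    INTERACTION_REQUIRED(true, true),
    PREMISES_PARTIAL(false, false),
    STREET_PARTIAL(false, false),
    MULTIPLE(false, true),
    NONE(false, false);

    private final boolean singleDeliverableMatch;
    // false only means the description above does not mention user interaction for that level.
    private final boolean userInteractionAdvised;

    VerificationLevel(boolean singleDeliverableMatch, boolean userInteractionAdvised) {
        this.singleDeliverableMatch = singleDeliverableMatch;
        this.userInteractionAdvised = userInteractionAdvised;
    }

    boolean isSingleDeliverableMatch() {
        return singleDeliverableMatch;
    }

    boolean isUserInteractionAdvised() {
        return userInteractionAdvised;
    }

    // Maps a status string to a level, ignoring spaces, underscores, and case.
    static VerificationLevel fromStatus(String status) {
        String key = status.replace(" ", "").replace("_", "").toLowerCase(Locale.ROOT);
        for (VerificationLevel level : values()) {
            if (level.name().replace("_", "").toLowerCase(Locale.ROOT).equals(key)) {
                return level;
            }
        }
        throw new IllegalArgumentException("Unknown verification status: " + status);
    }

    public static void main(String[] args) {
        for (String status : new String[] {"Verified", "Interaction required", "StreetPartial", "None"}) {
            VerificationLevel level = fromStatus(status);
            System.out.println(status + " -> single match: " + level.isSingleDeliverableMatch()
                    + ", interaction advised: " + level.isUserInteractionAdvised());
        }
    }
}
```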

Scenario: Editing addresses and giving the verification status

Below is a five-component Job created in Talend Studio.

This Job:

  • reads an input CSV file that holds some client-related information,

  • uses the tMap component to concatenate the three fields Address, Postal, and City from the incoming data flow into one output column, Edit_Address,

  • uses the tQASAddressRow component to analyze the output column Edit_Address and give the verification status of all edited addresses,

  • uses a tFilterRow component to output only the addresses whose status is not equal to None,

  • and finally displays the correctly formatted addresses along with their verification status on the console.

In this scenario, we have already stored the input schema of the input file in the Repository. For more information about storing schema metadata in the Repository tree view, see Talend Studio User Guide.

Setting up the Job

  1. In the Repository tree view, expand Metadata and the file node where you have stored the input schemas and drop the relevant file onto the design workspace.

    The [Components] dialog box displays.

  2. Select tFileInputDelimited from the list and click OK to close the dialog box.

    The tFileInputDelimited component displays on the workspace. The input file used in this scenario is called address_template. It is a CSV file that holds personal information about some French clients.

  3. Drop the following components from the Palette onto the design workspace: tMap, tQASAddressRow, tFilterRow, and tLogRow.

  4. Connect tFileInputDelimited to tMap and tQASAddressRow to tFilterRow using Main links, tMap to tQASAddressRow using the New Output link, and tFilterRow to tLogRow using the Filter link.

Configuring the components

  1. Double-click the tMap component to open the Map Editor and concatenate the Address, Postal, and City fields from the incoming data flow into one output column: Edit_Address.

    When done, click OK to close the Map Editor and propagate the changes to the next component.

  2. Double-click the tQASAddressRow component to display its Basic settings and define its properties.

  3. In the QAS WSDL url field, enter the URL for the QuickAddress XML document (provided by Experian QAS).

  4. From the Country list, select the country corresponding to your input addresses, France in this example.

  5. If needed, click Edit schema to view the input and output data flow. The output schema should include the Edit_Address column that holds the Address, Postal, and City initial input columns.

    The output schema of any of the QuickAddress components depends on the selected country in the Country list since every country has different address norms.

    Click OK to close the dialog box.

  6. From the Column to analyze list, select Edit_Address.

  7. Double-click the tFilterRow component to display its Basic settings view and define its properties.

  8. In the Conditions area, click the plus button to add one condition to the output flow and, in the corresponding table cells:

    - select the input column you want to operate on,

    - select the needed function from the list,

    - select the operator to bind the input column with the value,

    - type in between the quotes the value to be filtered.

    In this example, we want to exclude the addresses whose status is equal to None.

  9. Double-click the tLogRow component to display its Basic settings and define its properties.

    In this example, and for clarity purposes, we want the result to display on the console in a separate key/value tabular list for each row.
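
The logic configured in the tMap and tFilterRow steps above can be sketched in plain Java. This is an illustration only, not the code Talend Studio generates: the class and method names are hypothetical, the single-space separator between the three fields is an assumption (the scenario does not state which separator is used), and the sample address is made up.

```java
class AddressJobSketch {

    // Equivalent of the tMap expression: join Address, Postal, and City into Edit_Address.
    // The single-space separator is an assumption.
    static String editAddress(String address, String postal, String city) {
        return address + " " + postal + " " + city;
    }

    // Equivalent of the tFilterRow condition: keep rows whose status is not "None".
    static boolean keepRow(String status) {
        return !"None".equals(status);
    }

    public static void main(String[] args) {
        // Hypothetical input row.
        String edited = editAddress("12 rue de la Paix", "75002", "Paris");
        System.out.println(edited);

        // Rows with the status None are filtered out, all other levels pass through.
        System.out.println(keepRow("Verified"));
        System.out.println(keepRow("None"));
    }
}
```

In the Map Editor, the same concatenation is written as a Java expression on the input row columns in the expression field of the Edit_Address output column.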

Executing the Job

  • Save your Job and press F6 to execute it and display the result on the console.

    In the result, tQASAddressRow reads the input rows, corrects and formats the addresses, gives the result in the Edit_Address row, and gives the verification status in the Status row.