tQASBatchAddressRow - 6.1

Talend Components Reference Guide

Warning

This component is available in the Palette of Talend Studio only if you have subscribed to one of the Talend Platform products.

The address management components discussed here are the result of Talend's collaboration with Experian QAS, one of the world leaders in global address data quality.

For more information about the enterprise and its software tools, visit http://www.qas.com.

tQASBatchAddressRow properties

Component family

Data Quality

Function

tQASBatchAddressRow verifies addresses in a column. It iterates over the rows and checks each input address against the locally installed QAS Batch Application with the help of a Dynamic Library. The Dynamic Library file extension is .dll on Windows and .so on Linux.

The advantage of this component over tQASAddressRow is that it does not need to call a web service to verify postal address data. It uses the local QAS files instead, which improves performance, especially when dealing with large amounts of data.

For further information on installation and on configuration parameters, see QuickAddress Batch and Setting configuration parameters in the QAS files respectively.

tQASBatchAddressRow uses Batch 4.80 on both Linux and Windows.
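As a quick illustration of the platform-dependent file extension mentioned above, here is a minimal Java sketch. The class and method names are hypothetical and are not part of the component or of the QAS software; the sketch only encodes the .dll/.so rule stated in this section.

```java
import java.util.Locale;

class QasLibraryExtension {

    // Maps an operating system name (in the style of the "os.name" system
    // property) to the dynamic library extension named in this guide.
    // Hypothetical helper, for illustration only.
    static String extensionFor(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            return ".dll";
        }
        if (os.contains("linux")) {
            return ".so";
        }
        // Only Windows and Linux are covered by this guide.
        throw new IllegalArgumentException("Platform not covered by this guide: " + osName);
    }

    public static void main(String[] args) {
        // Sample names only; a real check would use System.getProperty("os.name").
        for (String name : new String[] {"Linux", "Windows 10"}) {
            System.out.println(name + " -> " + extensionFor(name));
        }
    }
}
```

Any other platform is rejected explicitly, since this section only documents Windows and Linux.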

Purpose

tQASBatchAddressRow corrects formatting and spelling errors, adds missing data, and gives a verification status for each row, because an address may not always contain enough information to be matched to a single deliverable result in the QAS files.

For more information about the verification status, see QuickAccess verification levels (verification status).

Basic settings

Schema

A schema is a row description. It defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Click Sync columns to retrieve the schema from the previous component in the Job.

Built-in: You create the schema and store it locally for this component only. Related topic: see Talend Studio User Guide.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: see Talend Studio User Guide.

Edit schema

Click the [...] button and define the input and output schema of the address data.

The output schema of tQASBatchAddressRow provides several read-only address columns. The output schema varies with the country selected in the Country list, since every country has different address norms.

Country

Select from the list the country corresponding to your input addresses.

If you want a global output schema, select Universal from this list.

Choose the address column

Select from the list the address column you want to analyze.

Specify the configuration file

Click the [...] button and browse to set the path to the configuration file, qaworld.ini.

Advanced settings

tStatCatcher Statistics

Select this check box to collect log data at the component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
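As an illustration, the following Java sketch shows how the ERROR_MESSAGE variable could be read once the component has finished. It assumes the component instance is named tQASBatchAddressRow_1 and that, as is usual in Talend Jobs, the variable is stored in globalMap under the key formed by the component name followed by _ERROR_MESSAGE. The globalMap here is a plain HashMap standing in for the one Talend provides at run time, and the helper names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ErrorMessageLookup {

    // Builds the globalMap key for a component's ERROR_MESSAGE variable,
    // assuming the "<component unique name>_ERROR_MESSAGE" naming pattern.
    static String errorMessageKey(String componentName) {
        return componentName + "_ERROR_MESSAGE";
    }

    // Returns the stored error message, or an empty string when the
    // component did not record one.
    static String readErrorMessage(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(errorMessageKey(componentName));
        return value == null ? "" : (String) value;
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that a Talend Job provides at run time.
        Map<String, Object> globalMap = new HashMap<>();
        // Hypothetical message, for illustration only.
        globalMap.put("tQASBatchAddressRow_1_ERROR_MESSAGE", "hypothetical error text");

        System.out.println(readErrorMessage(globalMap, "tQASBatchAddressRow_1"));
    }
}
```

Because ERROR_MESSAGE is an After variable, such a lookup is only meaningful in a part of the Job that runs after the component has completed, for example in a tJava component triggered by an OnSubjobOk link.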

Usage

This component is an intermediary step. It requires an input flow as well as an output.

Limitation/prerequisite

Before being able to use this component, you must install the QAS Batch Application provided by Experian QAS.

Setting configuration parameters in the QAS files

After installing QAS Batch as outlined in QuickAddress Batch, you must configure some parameters in the QAS files so that they match the component output schema.

For Linux:

  1. Open the ~/.profile file in your home folder and add the following lines, modifying the paths according to your extract location:

    # for QAS Batch JNI
    export PATH=$PATH:/path/to/qasbatch/apps  # the folder which contains qaworld.ini
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/jni_wrapper_folder

  2. Configure the QAS Application as follows:

    • Add a new line at the end of ./apps/qalicn.ini and enter a valid license on it.

    • Put valid files which contain country address data into the right folder, and configure qawserve.ini to add country support.

      Each country line must have three elements: a short name, a full country name, and a data path, which can be relative or absolute.
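To double-check the Linux environment setup described above, a small Java sketch like the following can look for qaworld.ini in the directories listed in PATH. The class and method names are hypothetical and this check is not part of Talend or QAS Batch; it only verifies that one of the listed folders contains the file.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

class QasEnvCheck {

    // Returns the first directory of a PATH-style list that contains a
    // regular file with the given name, or an empty Optional if none does.
    static Optional<Path> findInPathList(String pathList, String fileName) {
        if (pathList == null || pathList.isEmpty()) {
            return Optional.empty();
        }
        for (String entry : pathList.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty()) {
                continue;
            }
            try {
                Path dir = Paths.get(entry);
                if (Files.isRegularFile(dir.resolve(fileName))) {
                    return Optional.of(dir);
                }
            } catch (InvalidPathException e) {
                // Skip entries that are not valid paths on this platform.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> dir = findInPathList(System.getenv("PATH"), "qaworld.ini");
        System.out.println(dir.map(p -> "qaworld.ini found in " + p)
                              .orElse("qaworld.ini not found in any PATH folder"));
    }
}
```

The same method could be pointed at LD_LIBRARY_PATH with the name of the JNI wrapper library, whose file name is not given in this guide.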

For both Linux and Windows:

  • In the qaworld.ini file, configure the section of the relevant country to define the output schema.

    The example below shows the configuration for UK addresses. AddressLine1 to AddressLine5 hold the address validation results, which correspond to the first five output columns of the tQASBatchAddressRow component.

Scenario: Editing addresses against QAS files and giving the verification status

Below is a three-component Job created in Talend Studio.

This Job:

  • generates random address information, and

  • uses the tQASBatchAddressRow component to analyze the output columns and display the correctly formatted addresses along with their verification status on the console.

Complete the following to design and execute the above scenario:

Setting up the Job

  1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tQASBatchAddressRow and tLogRow.

  2. Connect the components together using Main links.

Configuring the components

  1. Double-click tFixedFlowInput to display its Basic settings view and define the component properties.

  2. Click the [...] button next to Edit Schema to open a dialog box, and add one column: addr. Then click OK to close the dialog box.

  3. In the Mode area, select the Use Inline Table option, add three lines in the table by clicking the [+] button, and define the data for the input column, three address rows in this example.

  4. Double-click the tQASBatchAddressRow component to display its Basic settings and define the component properties.

  5. Click the [...] button next to Edit schema, if required, to view the input and output data flow. The output schema should include the addr column.

    The output schema of any of the QuickAddress components depends on the selected country in the Country list since every country has different address norms.

    Click OK to close the dialog box.

  6. Select from the Country list the country corresponding to your input addresses.

  7. Select from the Choose the address column list the address column you want to analyze, addr in this example.

  8. Click the [...] button next to the Specify the configuration file field and browse to the QAS configuration file installed locally.

  9. Double-click the tLogRow component to display its Basic settings view and select Table in the Mode area to display the Job execution result in table cells.

Executing the Job

  • Save your Job and press F6 to execute it and display the result on the console.

    In the result shown above, tQASBatchAddressRow reads input rows, corrects and formats addresses, gives the result in the ADDRESS and ZIP_CODE_CITY columns, and gives the verification status in the STATUS column. For further information on the status column, check the corresponding documentation at http://www.qas.com.