Scenario: Editing addresses against a MelissaData data file - 6.1

Talend Components Reference Guide


This scenario describes a three-component Job that:

  • uses the tFixedFlowInput component to generate the address data to be analyzed,

  • uses the tMelissaDataAddress component to analyze the input schema and validate, correct and standardize the US addresses generated by the tFixedFlowInput component,

  • uses a tLogRow component to output the correctly formatted addresses on the console.

Prerequisites: To execute a Job that uses the tMelissaDataAddress component, you must first add the folder containing the MelissaData library (libmdAddr.so on Linux, mdAddr.dll on Windows) to your system environment variables.

For example:

  • on Linux: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path to folder containing libmdAddr.so>.

  • on Windows: PATH=%PATH%;<path to folder containing mdAddr.dll>.
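
Before running the Job, it can help to confirm that the library is actually reachable through the environment variable. The following is a minimal sketch, not part of Talend or MelissaData: the class name NativeLibraryCheck and the helper findInPathList are hypothetical, and the sketch assumes the library file carries the platform's default name for "mdAddr" (libmdAddr.so on Linux, mdAddr.dll on Windows).

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, for illustration only.
class NativeLibraryCheck {

    // Returns the first file named fileName found in the folders of a
    // path-style list (entries separated by ':' on Linux, ';' on Windows).
    static Optional<Path> findInPathList(String pathList, String fileName) {
        if (pathList == null || pathList.isEmpty()) {
            return Optional.empty();
        }
        for (String entry : pathList.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue;
            }
            try {
                Path candidate = Paths.get(entry, fileName);
                if (Files.isRegularFile(candidate)) {
                    return Optional.of(candidate);
                }
            } catch (InvalidPathException e) {
                // Skip entries that are not valid paths (for example, quoted entries).
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        boolean windows = System.getProperty("os.name")
                .toLowerCase(Locale.ROOT).startsWith("windows");
        String variable = windows ? "PATH" : "LD_LIBRARY_PATH";
        // mapLibraryName turns "mdAddr" into the platform-specific file name.
        String fileName = System.mapLibraryName("mdAddr");
        Optional<Path> found = findInPathList(System.getenv(variable), fileName);
        System.out.println(found
                .map(p -> "Found " + p)
                .orElse(fileName + " not found in " + variable));
    }
}
```

The same helper could also be given the value of System.getProperty("java.library.path") to check the folder of the Java wrapper library that is set through the JVM argument later in this scenario.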

Setting up the Job

  1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tMelissaDataAddress and tLogRow.

  2. Connect the three components together using the Main links.

Configuring the input component

  1. Double-click tFixedFlowInput to open its Basic settings view in the Component tab.

  2. Create the schema through the Edit Schema button.

    Click the plus button to add the following columns to your input schema: company, address1, address2, city, postal and state. These columns are mandatory for the tMelissaDataAddress component.

  3. Click OK.

  4. In the Number of rows field, set the number of rows as 1.

  5. In the Mode area, select the Use Inline Content (delimited file) option, and set the row and field separators in the corresponding fields. The sample content in the next step uses the pipe character (|) as the field separator.

  6. In the Content table, enter the address data you want to analyze, for example:

    Talend Inc.|5150 El Camino Real|Suite C-31 |Los Altos  |94022|
    Talend Inc.|6 Executive Circle|Suite 200|Irvine  California  |92614|
    Talend Inc.|220 White Plains Road|Suite 390|Tarrytown  New York  |10591|
    Talend Inc.|8 New England Executive Park|Suite 170|Burlington  Massachusetts  |01803|
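
To see how one row of this content lines up with the six schema columns, the mapping can be sketched in plain Java. This is only an illustration under stated assumptions: it takes the pipe character as the field separator, it is not how tFixedFlowInput is implemented, and the class name InlineContentSketch and the method parseRow are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class InlineContentSketch {

    // Column order taken from the input schema defined in this scenario.
    static final String[] COLUMNS =
            {"company", "address1", "address2", "city", "postal", "state"};

    // Splits one pipe-delimited row and maps each field to its column name.
    static Map<String, String> parseRow(String row) {
        // The limit -1 keeps trailing empty fields, such as an empty state column.
        String[] fields = row.split("\\|", -1);
        if (fields.length != COLUMNS.length) {
            throw new IllegalArgumentException(
                    "Expected " + COLUMNS.length + " fields but found " + fields.length);
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.length; i++) {
            // Values are kept as entered, including any surrounding spaces.
            record.put(COLUMNS[i], fields[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        String row = "Talend Inc.|6 Executive Circle|Suite 200|Irvine  California  |92614|";
        parseRow(row).forEach((column, value) ->
                System.out.println(column + " = [" + value + "]"));
    }
}
```

In the sample rows the state column is left empty and, in some rows, the state name sits in the city field, so the input is deliberately not in a standardized form.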

Configuring the tMelissaDataAddress component

  1. Double-click tMelissaDataAddress to display the Basic settings view and define the component properties.

  2. Click Sync columns to retrieve the schema from the preceding component.

  3. Click the Edit schema button to view the input and output schema and do any modifications in the output schema, if necessary.

    The output schema of this component contains many standard output columns that are read-only. These columns return, for example, the standardized company and city names, up to two street address lines, the two-letter abbreviations of the state and country names, the postal (ZIP) code and the result codes.

  4. Click OK to close the dialog box.

  5. In each of the address detail fields, select from the list the input column that holds the corresponding detail: the company name, the first and second address lines, the city and state names, and the postal code.

  6. In the Specify your MelissaData license field, enter the license key that MelissaData provides when you order the Data Quality Suite or the Address Object API.

  7. In the Specify your MelissaData DataFile folder field, set the path to the MelissaData data folder provided by MelissaData and installed locally.

Setting a JVM argument and finalizing the Job

  1. Double-click the tLogRow component to display the Basic settings view and define the component properties.

  2. Click the Run tab and then in the open view click Advanced Settings.

  3. Select the Use specific JVM arguments check box and then click New.

  4. In the pop-up window, set the following JVM argument: -Djava.library.path=<path/to/mdAddrJavaWrapper.dll/folder/>.

    In this argument, you must indicate the folder where the MelissaData AddressObject library, called libmdAddrJavaWrapper.so on Linux or mdAddrJavaWrapper.dll on Windows, is installed.

    Without the correct JVM argument setting, the following error is to be expected: java.lang.Error: java.lang.UnsatisfiedLinkError.

    Also make sure that you have added the folder containing the MelissaData library (libmdAddr.so on Linux, mdAddr.dll on Windows) to your system environment variables.

    For example:

    • on Linux: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path to folder containing libmdAddr.so>.

    • on Windows: PATH=%PATH%;<path to folder containing mdAddr.dll>.

  5. Save your Job and press F6 to execute it.

    The tMelissaDataAddress component reads the input address rows, corrects and formats the addresses, and returns them as standardized address output rows.

    In addition to verifying and standardizing an address, tMelissaDataAddress will also match street names against a zip code, match geographic data to zip code and city information and finally parse street addresses and return all these results via different output columns. The above capture shows only some of the output columns written by the tMelissaDataAddress component.

    These output columns return, for example, the standardized company and city names, up to two street address lines, the two-letter abbreviations of the state and country names, and the postal (ZIP) code.

    They also return result codes, written as comma-delimited lists. Each code consists of two letters followed by two digits and indicates a status or an error. For example, the AC02 code means that the state name was corrected based on the combination of city name and zip code, and the AS01 code means that the street address is valid and deliverable.

    For the complete list of result codes and their meanings, and for further information about all the output columns, see the Address Object Reference Guide, which you can download from the MelissaData Support Center at http://www.melissadata.com/.
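
The result codes described above can be split and inspected with a few lines of Java, for example in custom code placed after the component. This is a minimal sketch under stated assumptions: the class name ResultCodesSketch and its methods are hypothetical, the input is assumed to be the comma-delimited string from the result codes output column, and only the two codes explained in this scenario (AS01 and AC02) are interpreted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class ResultCodesSketch {

    // Two uppercase letters followed by two digits, as described in this scenario.
    private static final Pattern CODE = Pattern.compile("[A-Z]{2}\\d{2}");

    // Splits a comma-delimited list and keeps only well-formed codes.
    static List<String> parse(String resultCodes) {
        List<String> codes = new ArrayList<>();
        if (resultCodes == null) {
            return codes;
        }
        for (String part : resultCodes.split(",")) {
            String code = part.trim();
            if (CODE.matcher(code).matches()) {
                codes.add(code);
            }
        }
        return codes;
    }

    // AS01: the street address is valid and deliverable.
    static boolean isDeliverable(String resultCodes) {
        return parse(resultCodes).contains("AS01");
    }

    // AC02: the state name was corrected from the city name and zip code.
    static boolean stateWasCorrected(String resultCodes) {
        return parse(resultCodes).contains("AC02");
    }

    public static void main(String[] args) {
        String sample = "AS01,AC02"; // sample value made up for this sketch
        System.out.println("Codes: " + parse(sample));
        System.out.println("Deliverable: " + isDeliverable(sample));
        System.out.println("State corrected: " + stateWasCorrected(sample));
    }
}
```

Any other code should be looked up in the Address Object Reference Guide rather than inferred from its letters.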