Scenario: Parsing addresses against reference data in the Cloud - 6.1

Talend Components Reference Guide

Version: 6.1
Products: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Open Studio for Big Data, Talend Open Studio for Data Integration, Talend Open Studio for Data Quality, Talend Open Studio for ESB, Talend Open Studio for MDM, Talend Real-Time Big Data Platform
Task: Data Governance, Data Quality and Preparation, Design and Development
Platform: Talend Studio

This scenario describes a three-component Job that:

  • uses the tFixedFlowInput component to generate the address data to be analyzed,

  • uses the tAddressRowCloud component to parse, standardize and format the addresses in the Cloud through the Address Validation API,

  • uses a tFileOutputExcel component to output the correctly formatted addresses in an .xls file.

You must have an Internet connection to be able to use tAddressRowCloud.

Setting up the Job

  1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tAddressRowCloud and tFileOutputExcel.

  2. Connect the three components together using the Main links.

Configuring the input component

  1. Double-click tFixedFlowInput to open its Basic settings view in the Component tab.

  2. Create the schema through the Edit Schema button.

    In the open dialog box, click the [+] button and add the columns that will hold the information in the input address, in this example: Address and Country.

  3. Click OK.

    The Address and Country columns are created in the Inline Table.

  4. In the Number of rows field, set the number of rows to 1.

  5. In the Mode area, select the Use Inline Table option.

  6. In the Content table, enter the address data you want to analyze, for example:

    "1 Chemin de l'Abbaye, Paris"
    "1 Rue de l'Abbaye, Paris"
    "1 Place de l'Abbaye basset, Paris"
    

    Set the country for the three address lines to FRA.
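
For reference, the three rows entered in the inline table correspond to the following plain (Address, Country) pairs. This is only a sketch of the data tFixedFlowInput generates in this example, not code you need to write in the Studio; the class name is illustrative.

    // Illustration only: the rows produced by tFixedFlowInput in this scenario.
    public class InputRowsSketch {
        public static void main(String[] args) {
            String[][] rows = {
                {"1 Chemin de l'Abbaye, Paris", "FRA"},
                {"1 Rue de l'Abbaye, Paris", "FRA"},
                {"1 Place de l'Abbaye basset, Paris", "FRA"}
            };
            for (String[] row : rows) {
                System.out.println(row[0] + " | " + row[1]);
            }
        }
    }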

Parsing addresses against Loqate

Setting the schema and defining address mapping

  1. Double-click tAddressRowCloud to display the Basic settings view and define the component properties.

  2. If required, click Sync columns to retrieve the schema defined in the input component.

  3. Click the Edit schema button to open the schema dialog box.

    tAddressRowCloud proposes several predefined read-only address columns.

    The STATUS column returns the processing status of the input addresses. For further information about process status, see Process status in tLoqateAddressRow.

    The AddressVerificationCode column returns the verification code for the processed address. For further information about the values that make up this code and the meaning of each segment, see Address verification codes in tLoqateAddressRow.

  4. Move the input columns you need to the output schema, click OK and accept the propagation of the changes.

    You can also add columns directly in the output schema to retrieve additional address information from the Loqate repository.

  5. Select from the Address Provider list the provider of the reference data against which you want to validate and format input addresses, Loqate in this example.

  6. Select the Use security mode to connect check box to connect to the provider repository in a secure mode.

    This may have a slight impact on performance.

  7. In the License/API key field, enter the license key provided by Loqate.

  8. From the Processing Mode list, select:

    • Verify and Geocode (selected by default): standardize and correct addresses and enrich them with latitude and longitude.

      Note: Combining address verification and geocoding will cost extra credits. For further information, see Cloud Price Card.

    • Verify only: standardize and correct addresses without enriching them with latitude and longitude.

  9. In the Mapping table:

    • Use the [+] button to add lines in the table.

    • Click in the Address Field column and, from the list predefined in the component, select the fields that hold the input address, Address and Country in this example.

      The component will map the values of these fields to the input columns you set in this table.

      tAddressRowCloud provides a list of individual fields because some countries have more complex addressing structures than others.

    • Click in the Input Column column and, from the input schema, select the columns that hold the input address, Address and Country in this example.

Defining additional address fields

  1. If required, select the Use Additional Output check box to retrieve additional address information from the provider repository.

  2. Click the Edit schema button to open the schema dialog box and add in the output schema the columns which will hold the extra address information. Add all_info and Geo_info columns for this example.

  3. In the Output Mapping table:

    • Use the [+] button to add lines in the table.

    • Click in the Address Field column and select from the predefined list the additional address fields you want to add to the output schema.

    • Click in the Output Column column and select from tAddressRowCloud output schema the columns that will hold the additional address information.

    The component maps the values of the address fields in the Loqate repository to the output columns you set in the table.

  4. Set the parameters in the Advanced settings view according to your needs.

    The default parameters are not changed for this example.
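
Conceptually, the two tables you have just configured associate provider address fields with columns of your input and output schemas. The following sketch shows these associations as plain Java maps, using the field and column names from this example; it is only an illustration of the mapping, not the component's internal implementation, and the class name is illustrative.

    import java.util.Map;

    // Illustration only: the mappings configured in the component's Basic settings.
    public class MappingSketch {
        public static void main(String[] args) {
            // Mapping table: provider address field -> input schema column
            Map<String, String> inputMapping = Map.of(
                    "Address", "Address",
                    "Country", "Country");

            // Output Mapping table: provider address field -> output schema column
            Map<String, String> outputMapping = Map.of(
                    "Raw_Response", "all_info",
                    "GeoAccuracy", "Geo_info");

            System.out.println("Input mapping: " + inputMapping);
            System.out.println("Output mapping: " + outputMapping);
        }
    }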

Configuring the output component and executing the Job

  1. Double-click the tFileOutputExcel component to display the Basic settings view and define the component properties.

  2. Set the destination file name as well as the sheet name and then select the Define all columns auto size check box.

  3. Save your Job and press F6 to execute it.

    The tAddressRowCloud component uploads data to the cloud, retrieves the corrected data and writes the result in the output file.

  4. Right-click the output component and select Data Viewer to display the formatted address data.

    tAddressRowCloud matches input address data against the Loqate repository.

    The all_info and Geo_info columns retrieve additional address information from the Raw_Response and GeoAccuracy columns respectively in the Loqate repository. The Raw_Response column provides you with all address information from the provider repository without any formatting. If you want this information to be more readable, you must parse it as JSON or XML (see the sketch after this procedure).

    The STATUS output column returns the OK status for all address rows. This means that the component completed the verification process successfully for all address rows. For further information about process status, see Process status in tLoqateAddressRow.

    The VerificationLevel output column provides you with a verification status of the processed addresses. For further information, see Verification status.

    The AddressVerificationCode output column returns a verification code for each of the processed address rows. For example, the first verification code V44-I45-P3-100 means:

    • Verification status = V (verified): a complete match was made between the input address and a single record from the available reference data.

    • Post-processed verification match level = 4 (premises): the level to which the input data matches the available reference data once all changes and additions performed during the verification process have been taken into account.

    • Pre-processed verification match level = 4 (premises): the level to which the input data matches the available reference data prior to any changes or additions performed during the verification process.

    • Parsing status = I (identified and parsed): all components of the input data could be identified and placed into output fields.

    • Lexicon identification match level = 4 (premises): using pattern matching, a numeric value or word has been identified as a premises number or name.

    • Context identification match level = 5 (delivery point, PostBox or SubBuilding): a numeric value or word has been identified as a post box number or sub building name.

    • Postcode Status = P3 (added): the primary postal code for the country has been added.

    • Match score = 100 (complete similarity): the input data and closest reference data match completely.

    For further information about the values that make up this code and the meaning of each segment, see Address verification codes in tLoqateAddressRow.
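
If you want to make the Raw_Response content retrieved in the all_info column easier to read, a short parsing step is enough. The following sketch pretty-prints it with the org.json library, assuming the raw response is a JSON document and using a made-up sample value; adapt it if your provider returns XML.

    import org.json.JSONObject;

    // Minimal sketch: pretty-print a Raw_Response value (requires the org.json library).
    public class RawResponseSketch {
        public static void main(String[] args) {
            String rawResponse = "{\"Country\":\"FR\",\"Locality\":\"Paris\"}"; // sample value, for illustration only
            System.out.println(new JSONObject(rawResponse).toString(2));        // indent by 2 spaces
        }
    }

The breakdown of the verification code can also be reproduced programmatically. The following sketch splits a Loqate verification code such as V44-I45-P3-100 into the segments described above; it is a minimal illustration based on that description, not a complete decoder, and the class name is illustrative.

    // Minimal sketch: split a Loqate verification code (for example V44-I45-P3-100)
    // into the segments described above.
    public class VerificationCodeSketch {
        public static void main(String[] args) {
            String code = "V44-I45-P3-100";
            String[] parts = code.split("-");   // [V44, I45, P3, 100]

            System.out.println("Verification status:                " + parts[0].charAt(0)); // V
            System.out.println("Post-processed verification level:  " + parts[0].charAt(1)); // 4
            System.out.println("Pre-processed verification level:   " + parts[0].charAt(2)); // 4
            System.out.println("Parsing status:                     " + parts[1].charAt(0)); // I
            System.out.println("Lexicon identification match level: " + parts[1].charAt(1)); // 4
            System.out.println("Context identification match level: " + parts[1].charAt(2)); // 5
            System.out.println("Postcode status:                    " + parts[2]);           // P3
            System.out.println("Match score:                        " + parts[3]);           // 100
        }
    }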

Parsing addresses against MelissaData

You can run the Scenario: Parsing addresses against reference data in the Cloud Job against the MelissaData repository by doing the following:

  1. In the tFixedFlowInput Basic settings, create the schema through the Edit Schema button.

    In the open dialog box, click the [+] button and add one column that will hold the information in the input address, in this example: address.

  2. Click OK.

    An address column is created in the Inline Table.

  3. In the Inline Table, enter the address data you want to analyze, for example:

    "1211 AVENUE OF AMERICAS FL 8 10036 NEW YORK USA"
    "B69 2lt 9kings   United Kingdom ave"
    "1729号 黄兴路 China, 200433"
    "15 Rue Nelaton   Paris PARIS 92800 France"
    "1211 AVENUE OF AMERICAS FL 8 10036 NEW YORK"
    
  4. In the Basic settings of tAddressRowCloud, select MelissaData from the Address Provider list.

  5. In the License/API key field, enter the license key provided by MelissaData.

  6. In the Mapping table, click the [+] button to add a line and then select Address.

    The component will map the values of this field to the input column you set in this table.

  7. If required, select the Use Additional Output check box and use the Output Mapping table to retrieve additional address information from the provider repository.

    For further information, see Defining additional address fields.

  8. Leave the parameters in the Advanced settings view unchanged.

  9. Save your Job and press F6 to execute it.

    The tAddressRowCloud component uploads data to the cloud, retrieves the corrected data and writes the result in the output file.

  10. Right-click the output component and select Data Viewer to display the formatted address data.

    tAddressRowCloud matches input address data against the MelissaData data repository and writes formatted addresses in the output file.

    The AddressVerificationCode output column returns a verification code for each of the processed address rows. These codes are written in comma-delimited lists. Each code consists of two letters followed by two digits. These codes indicate different statuses and errors. For example, the AC02 code means that the state name is corrected based on the combination of city name and zip code. A sketch that post-processes such a code list follows this procedure.

    For a complete list of the meaning of the result codes and for further information about all the output columns, see the Address Object Reference Guide that you can download from the Support Center of MelissaData at http://www.melissadata.com/.

    The VerificationLevel output column provides you with a verification status of the processed addresses. For further information, see Address verification levels in tAddressRowCloud.
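
Because the result codes are returned as a comma-delimited list, you can post-process the AddressVerificationCode column with a simple split, as in the sketch below. The sample code list and the class name are made up for illustration; the meaning of each code is documented in the MelissaData reference guide mentioned above.

    import java.util.Arrays;
    import java.util.List;

    // Minimal sketch: inspect a comma-delimited MelissaData result code list.
    public class ResultCodeSketch {
        public static void main(String[] args) {
            String verificationCodes = "AV24,AC02";   // sample list, for illustration only
            List<String> codes = Arrays.asList(verificationCodes.split(","));

            // Each code is two letters followed by two digits, for example AC02
            // (state name corrected from the city name and zip code).
            if (codes.contains("AC02")) {
                System.out.println("The state name was corrected for this address.");
            }
        }
    }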

Parsing addresses against Google

You can run the Scenario: Parsing addresses against reference data in the Cloud Job against the Google Places API by doing the following:

  1. In the tFixedFlowInput Basic settings, create the schema through the Edit Schema button.

    In the open dialog box, click the [+] button and add one column that will hold the information in the input address, in this example: address.

  2. Click OK.

    An address column is created in the Inline Table.

  3. In the Inline Table, enter the address data you want to analyze, for example:

    "1211 AVENUE OF AMERICAS FL 8 10036 NEW YORK USA"
    "B69 2lt 9kings   United Kingdom ave"
    "1729号 黄兴路 China, 200433"
    "15 Rue Nelaton   Paris PARIS 92800 France"
    "1211 AVENUE OF AMERICAS FL 8 10036 NEW YORK"
    "1 Rue de l'Abbaye, Paris"
    "1 Chemin de l'Abbaye, Paris"
    "1 Place de l'Abbaye basset, Paris"
    "8000 Cummings Hall,Hanover,New Hampshire,03755,"
    
  4. In the Basic settings of tAddressRowCloud, select Google from the Address Provider list.

  5. In the License/API key field, enter the API key you generate from the Google Developer Console at https://developers.google.com/console/help/new/.

  6. In the Mapping table, click the [+] button to add a line and then select Address.

    The component will map the values of this field to the input column you set in this table.

  7. If required, select the Use Additional Output check box and use the Output Mapping table to retrieve additional address information from the provider repository.

    For further information, see Defining additional address fields.

  8. In the Advanced settings view, set Output Script to FRENCH and leave the other parameters unchanged.

  9. Save your Job and press F6 to execute it.

    The tAddressRowCloud component uploads data to the cloud, retrieves the corrected data and writes the result in the output file.

  10. Right-click the output component and select Data Viewer to display the formatted address data.

    tAddressRowCloud matches input address data against the Google Places API and writes formatted addresses in the output file. A sketch of an equivalent direct request to the Places API follows this procedure.

    The VerificationLevel output column provides you with a verification status of the processed addresses. For further information, see Address verification levels in tAddressRowCloud.
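
For reference, you can compare what the component returns with a direct text search request against the Google Places API web service, as in the sketch below. The endpoint, parameters and the YOUR_API_KEY placeholder are assumptions made for this illustration; they do not necessarily match what tAddressRowCloud does internally.

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    // Minimal sketch (Java 11+): query the Places API text search endpoint directly.
    public class PlacesTextSearchSketch {
        public static void main(String[] args) throws Exception {
            String apiKey = "YOUR_API_KEY"; // placeholder, replace with your own key
            String address = "15 Rue Nelaton   Paris PARIS 92800 France";

            // Assumed endpoint for this illustration; the component may use different
            // endpoints or parameters internally.
            String url = "https://maps.googleapis.com/maps/api/place/textsearch/json?query="
                    + URLEncoder.encode(address, StandardCharsets.UTF_8)
                    + "&key=" + apiKey;

            HttpResponse<String> response = HttpClient.newHttpClient().send(
                    HttpRequest.newBuilder(URI.create(url)).GET().build(),
                    HttpResponse.BodyHandlers.ofString());

            // The JSON body contains a "results" array whose entries include a
            // "formatted_address" field.
            System.out.println(response.body());
        }
    }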

Parsing addresses against QAS

You can run the Scenario: Parsing addresses against reference data in the Cloud Job using the QAS Pro OnDemand service to verify the accuracy and completeness of addresses.

  1. In the tFixedFlowInput Basic settings, create the schema through the Edit Schema button.

    In the open dialog box, click the [+] button and add one column that will hold the information in the input address, in this example: address.

  2. Click OK.

    An address column is created in the Inline Table.

  3. In the Inline Table, enter the address data you want to analyze, for example:

    "1 nonsense st, nowhereville, SC,11111"
    "14 elmwood,rome,ga,30161"
    "300 n quincy pl, charlestown,MA,02129"
    "reba st,pelion,SC,29123"
    "1445 montebello st,montebello,90640"
    "43400 gadsden ave,lancaster,ca,93534"
    "po box 123,san francisco,ca,94104"
    "43400 gadsden ave apt 3,lancaster,ca,93534"
    
  4. In the Basic settings of tAddressRowCloud, select QAS from the Address Provider list.

  5. From the Country list, select the country corresponding to your input addresses, United States in this example.

  6. In the QAS OnDemand username and password fields, enter the username and password you can find in the license provided by QAS.

  7. In the Mapping table, click the [+] button to add a line and then select Address.

    The component will map the values of this field to the input column you set in this table.

  8. Leave the parameters in the Advanced settings view unchanged.

  9. Save your Job and press F6 to execute it.

    The tAddressRowCloud component uploads data to the cloud, validates and retrieves the corrected data and writes the result in the output file.

  10. Right-click the output component and select Data Viewer to display the formatted address data.

    tAddressRowCloud validates input address data against QAS Pro OnDemand and writes formatted addresses in the output file.

    The VerificationLevel output column provides you with a verification status of the processed addresses. For further information, see Address verification levels in tAddressRowCloud.