Combining user country codes with actual country names - Cloud

Talend Cloud Pipeline Designer Processors Guide

author: Talend Documentation Team
EnrichVersion: Cloud
EnrichProdName: Talend Cloud
task: Design and Development > Designing Pipelines
EnrichPlatform: Talend Pipeline Designer

Before you begin

  • You have previously created a connection to the system storing your source data.

    Here, an HDFS connection.

  • You have previously added the dataset holding your source data.

    Here, the Left dataset holds user data with country codes and indexes, and the Right dataset holds the data to be combined with it: country names and indexes. To follow this example, download the join-countries.json and join-users.json files from the Downloads tab in the left panel of this page.

  • You have also created the connection and the related dataset that will hold the processed data.

    Here, a database table.

Procedure

  1. Click ADD PIPELINE on the PIPELINES page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.

    Example

    Join Country Data
  3. Click ADD SOURCE to open the panel where you select your source data (here, a list of customers with country codes stored in HDFS).

  4. Select your dataset and click SELECT DATASET in order to add it to the pipeline.
    Rename it if needed.
  5. Add a Join processor to the pipeline. The Configuration panel opens.
  6. Give a meaningful name to the processor.

    Example

    combine country data
  7. In the CONFIGURATION area:
    1. Select the dataset to be combined with the source dataset (here, a dataset called Countries) in the Join dataset list.
    2. Select Left outer join in the Join type list, so that the result set lists every record from the left dataset, including records that have no match in the right dataset.
  8. In the CONDITIONS area:
    1. In the Left key list, select or enter the path to the field to be compared in the left dataset (here, .countryCode).
    2. In the Right key list, select or enter the path to the field to be compared in the right dataset (here, .index).

      You can use the avpath syntax in this area.

  9. Click SAVE to save your configuration.
  10. Click the ADD DESTINATION item next to the Join processor and select the dataset that will hold your joined data.
    Rename it if needed.
  11. (Optional) Click the top preview icon after the last Join processor to preview your data after the join operation.
    Note:
    • After you save the Join processor configuration, the JOIN DATASET tab appears in the Data preview area and lets you preview the result of the join operation.
    • Only the first 50 records of your datasets are loaded in the preview. As a result, you may not see matching records in the preview, but they are taken into account at run time.
  12. On the top toolbar of Talend Cloud Pipeline Designer, select your run profile in the list (for more information, see Run profiles).
  13. Click the run icon to run your pipeline.

Results

Your pipeline is being executed. The user country data is joined, and both the country codes and the full country names are combined in the generated output, which is sent to the target system you specified.
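
To make the join semantics concrete, the following is a minimal Python sketch of a left outer join on .countryCode (left) and .index (right). It only illustrates the logic and is not how Talend Cloud Pipeline Designer implements the Join processor. The sample records, the left_outer_join helper, and the right_ prefix on the joined fields are assumptions made for this illustration; the actual contents of join-users.json and join-countries.json and the actual output schema may differ.

```python
def left_outer_join(left, right, left_key, right_key):
    """Return every left record, merged with each matching right record."""
    # Index the right records by their join key for fast lookup.
    index = {}
    for rec in right:
        key = rec.get(right_key)
        if key is not None:
            index.setdefault(key, []).append(rec)

    joined = []
    for rec in left:
        key = rec.get(left_key)
        matches = index.get(key) if key is not None else None
        if matches:
            for match in matches:
                # Assumption: right-side fields get a prefix to avoid name clashes.
                extra = {f"right_{k}": v for k, v in match.items()}
                joined.append({**rec, **extra})
        else:
            # No match: the left record is kept unchanged (left outer join).
            joined.append(dict(rec))
    return joined


# Hypothetical sample records, not the contents of the downloadable files.
users = [
    {"name": "Ana", "countryCode": 1},
    {"name": "Bo", "countryCode": 2},
    {"name": "Cy", "countryCode": 9},  # no matching country index
]
countries = [
    {"index": 1, "country": "France"},
    {"index": 2, "country": "Japan"},
]

result = left_outer_join(users, countries, "countryCode", "index")
for row in result:
    print(row)
```

In this sketch, the user whose country code has no counterpart in the right dataset still appears in the result, only without country fields. The same reasoning explains the preview note: if a matching right-side record falls outside the records loaded for the preview, the left record shows up unmatched there even though it is matched at run time.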