Replicating a list of leads and processing the two output flows differently - Cloud

Talend Cloud Pipeline Designer Processors Guide

author
Talend Documentation Team

Before you begin

  • You have previously created a connection to the system storing your source data.

    Here, a database connection.

  • You have previously added the dataset holding your source data.

    Here, hierarchical actors data including ID, name, country, etc. (download the filter-python-customers.json file from the Downloads tab in the left panel of this page).

  • You have also created the connections and the related datasets that will hold the processed data.

    Here, a file stored on Amazon S3 and a file stored on HDFS.

Procedure

  1. Click ADD PIPELINE on the PIPELINES page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.

    Example

    Replicate and Process Leads
  3. Click ADD SOURCE to open the panel allowing you to select your source data, here a list of leads.
  4. Select your dataset and click SELECT DATASET in order to add it to the pipeline.
    Rename it if needed.
  5. Add a Replicate processor to the pipeline. The flow is duplicated and the configuration panel opens.
  6. Give a meaningful name to the processor.

    Example

    replicate leads
  7. Click the top ADD DESTINATION item on the pipeline to open the panel allowing you to select the dataset that will hold your data in the cloud (Amazon S3).
  8. Give a meaningful name to the Destination.

    Example

    store in cloud
  9. Next to the bottom ADD DESTINATION item on the pipeline, add a Filter processor.
  10. Give a meaningful name to the processor.

    Example

    filter on revenues and short last names
  11. In the Filter area:
    1. Select .Revenue in the Field path list, as you want to filter leads based on this value.
    2. Select NONE in the Apply a function first list, as you do not want to apply a function while filtering records.
    3. Select >= in the Operator list and enter 70000 in the Value field, as you want to keep only leads with a revenue of at least 70000 dollars.
    4. Click the NEW ELEMENT button to add a filter and select .States in the Field path list.
    5. Select NONE in the Apply a function first list, as you do not want to apply a function while filtering records.
    6. Select != in the Operator list and enter XX in the Value field, as you want to exclude leads whose States value is XX.
  12. Click SAVE to save your configuration.
  13. (Optional) Click the preview icon after the Filter processor to preview your data after the filtering operation.
  14. Click the bottom ADD DESTINATION item on the pipeline to open the panel allowing you to select the dataset that will hold your data on premises (HDFS), and give it a meaningful name.

    Example

    store on premises
  15. On the top toolbar of Talend Cloud Pipeline Designer, select your run profile in the list (for more information, see Run profiles).
  16. Click the run icon to run your pipeline.

Results

Your pipeline is being executed: the records are duplicated, one of the two flows is filtered, and both output flows are sent to the target systems you have indicated.
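
To make the logic of this pipeline easier to follow, here is a minimal Python sketch of what the Replicate and Filter processors do to the records. It is not Talend code and does not call any Talend API. The sample leads are invented, the function names replicate and keep_lead are hypothetical, and the sketch assumes that the two filter elements are combined so that a record must satisfy both of them.

```python
import copy

# Hypothetical sample leads. The field names Revenue and States come from
# the procedure above; the values are invented for illustration.
leads = [
    {"Name": "Lead A", "Revenue": 85000, "States": "CA"},
    {"Name": "Lead B", "Revenue": 70000, "States": "XX"},
    {"Name": "Lead C", "Revenue": 42000, "States": "NY"},
    {"Name": "Lead D", "Revenue": 70000, "States": "TX"},
]


def replicate(records):
    """Return two independent copies of the input flow."""
    return copy.deepcopy(records), copy.deepcopy(records)


def keep_lead(record):
    """Mimic the two filter elements (assumed to be combined with AND)."""
    return record["Revenue"] >= 70000 and record["States"] != "XX"


# Top flow: stored unchanged (the Amazon S3 destination in the procedure).
# Bottom flow: filtered before storage (the HDFS destination).
cloud_flow, filter_input = replicate(leads)
on_premises_flow = [record for record in filter_input if keep_lead(record)]

print(len(cloud_flow))
print([record["Name"] for record in on_premises_flow])
```

In this sketch, the top flow keeps all four sample leads, while the bottom flow keeps only the leads with a revenue of at least 70000 and a States value other than XX.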