Replicating a list of leads and processing the two output flows differently

A pipeline with a source, a Replicate processor, a Filter processor, and two destinations.

Before you begin

You have previously created a connection to the system storing your source data.

Here, a database connection.
You have previously added the dataset holding your source data.

Download and extract the file filter-python-customers.zip. It contains lead data such as ID, name, and revenue.
You have also created the connections and the related datasets that will hold the processed data.

Here, a file stored on Amazon S3 and a file stored on HDFS.

Procedure

Click Add pipeline on the Pipelines page. Your new pipeline opens.
Give the pipeline a meaningful name.
Example
Replicate and Process Leads
Click ADD SOURCE to open the panel allowing you to select your source data, here a list of leads.
Select your dataset and click Select in order to add it to the pipeline.
Rename it if needed.
Add a Replicate processor to the pipeline. The flow is duplicated and the configuration panel opens.
Give a meaningful name to the processor.
Example
replicate leads
Click the top ADD DESTINATION item on the pipeline to open the panel allowing you to select the dataset that will hold your data in the cloud (Amazon S3).
Give a meaningful name to the destination.
Example
store in cloud
Next to the bottom ADD DESTINATION item on the pipeline, add a Filter processor.
Give a meaningful name to the processor.
Example
filter on lead revenues
In the Filters area:
1. Select .Revenue in the Input list, as you want to filter leads based on this value.
2. Select None in the Optionally select a function to apply list, as you do not want to apply a function while filtering records.
3. Select >= in the Operator list and type 70000 in the Value field, as you want to keep only the leads with a revenue greater than or equal to 70000 dollars.
Click Save to save your configuration.
(Optional) Look at the Filter processor preview to see your data after the filtering operation.
Click the bottom ADD DESTINATION item on the pipeline to open the panel allowing you to select the dataset that will hold your data on premises (HDFS), and give it a meaningful name.
Example
store on premises
On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.

Results

Your pipeline is executed: the records are duplicated, the full list of leads is sent to Amazon S3, and the leads filtered on revenue are sent to HDFS.
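To make the data flow of this pipeline concrete, the following is a minimal Python sketch that models only its record-level behavior; it is not how Talend Cloud Pipeline Designer executes pipelines. It assumes the leads are simple records with a Revenue field, and the sample values and the helper names `replicate` and `filter_on_revenue` are hypothetical, chosen for illustration only.

```python
from copy import deepcopy

# Hypothetical sample leads; field names are assumptions based on the
# lead data described above (ID, name, revenue).
leads = [
    {"ID": 1, "Name": "Acme", "Revenue": 120000},
    {"ID": 2, "Name": "Globex", "Revenue": 70000},
    {"ID": 3, "Name": "Initech", "Revenue": 45000},
]

# Value typed in the Filter processor configuration.
REVENUE_THRESHOLD = 70000


def replicate(records):
    """Hypothetical model of the Replicate processor: every record goes to
    both outputs. Independent copies are returned so that processing one
    branch cannot alter the other."""
    return deepcopy(records), deepcopy(records)


def filter_on_revenue(records, threshold=REVENUE_THRESHOLD):
    """Hypothetical model of the Filter processor configured with
    Input = .Revenue, no function, Operator = >=, Value = threshold."""
    return [r for r in records if r["Revenue"] >= threshold]


top_flow, bottom_flow = replicate(leads)
store_in_cloud = top_flow                            # top branch: unfiltered
store_on_premises = filter_on_revenue(bottom_flow)   # bottom branch: filtered

print([r["ID"] for r in store_in_cloud])      # [1, 2, 3]
print([r["ID"] for r in store_on_premises])   # [1, 2]
```

Because the operator is >=, a lead with a revenue of exactly 70000 is kept in the on-premises flow, while the cloud flow receives every lead.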

