Before you begin
- You have previously created a connection to the system storing your source data. Here, a database connection.
- You have previously added the dataset holding your source data. Download and extract the file filter-python-customers.zip. It contains lead data including ID, name, revenue, etc.
- You have also created the connections and the related datasets that will hold the processed data. Here, a file stored on Amazon S3 and a file stored on HDFS.
Procedure
- Click Add pipeline on the Pipelines page. Your new pipeline opens.
- Give the pipeline a meaningful name.
  Example: Replicate and Process Leads
- Click ADD SOURCE to open the panel allowing you to select your source data, here a list of leads.
- Select your dataset and click Select to add it to the pipeline. Rename it if needed.
- Click the pipeline and add a Replicate processor. The flow is duplicated and the configuration panel opens.
- Give a meaningful name to the processor.
  Example: replicate leads
- Click the top ADD DESTINATION item on the pipeline to open the panel allowing you to select the dataset that will hold your data in the cloud (Amazon S3).
- Give a meaningful name to the destination.
  Example: store in cloud
- Click next to the bottom ADD DESTINATION item on the pipeline and add a Filter processor.
- Give a meaningful name to the processor.
  Example: filter on lead revenues
- In the Filters area:
  - Select .Revenue in the Input list, as you want to filter leads based on this value.
  - Select None in the Optionally select a function to apply list, as you do not want to apply a function while filtering records.
  - Select >= in the Operator list and type 70000 in the Value field, as you want to keep leads with a revenue of 70,000 dollars or more.
- Click Save to save your configuration.
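The filter configured above keeps every record whose Revenue value is at least 70000. As a rough illustration of that logic only (this is not Talend's implementation; the record layout and the `Revenue` field name are assumptions based on the sample dataset), the same condition in Python looks like this:

```python
# Illustrative sketch of the Filter processor's condition:
# input field "Revenue", no function applied, operator ">=", value 70000.
# The records below are invented sample data, not the tutorial's file.
leads = [
    {"ID": 1, "Name": "Acme", "Revenue": 120000},
    {"ID": 2, "Name": "Globex", "Revenue": 45000},
    {"ID": 3, "Name": "Initech", "Revenue": 70000},
]

# Keep only leads with a revenue of 70,000 dollars or more.
filtered = [lead for lead in leads if lead["Revenue"] >= 70000]
```

With the sample records above, the leads with IDs 1 and 3 pass the filter, while the 45,000-dollar lead is dropped.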
- (Optional) Look at the Filter processor preview to see your data after the filtering operation.
- Click the bottom ADD DESTINATION item on the pipeline to open the panel allowing you to select the dataset that will hold your data on premises (HDFS), and give it a meaningful name.
  Example: store on premises
- On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
- Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.
Results
Your pipeline is executed: the records are duplicated and filtered, and the output flows are sent to the target systems you have indicated.
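As a mental model of the finished pipeline, the sketch below traces one source flow through the Replicate and Filter steps to the two destinations. It is a hedged illustration under stated assumptions: the function and record names are invented, the destinations are plain Python lists rather than S3 or HDFS writes, and none of this is Talend's runtime API.

```python
# Sketch of the pipeline's data flow: one source, a Replicate processor
# that duplicates the flow, and two destinations. All names here are
# illustrative assumptions, not Talend APIs.

def run_pipeline(leads):
    # Replicate processor: the source flow is duplicated into two branches.
    branch_cloud = list(leads)
    branch_on_prem = list(leads)

    # Top branch: stored unchanged in the cloud destination (Amazon S3).
    store_in_cloud = branch_cloud

    # Bottom branch: the Filter processor keeps leads with Revenue >= 70000
    # before they reach the on-premises destination (HDFS).
    store_on_premises = [l for l in branch_on_prem if l["Revenue"] >= 70000]

    return store_in_cloud, store_on_premises

# Invented sample records to trace the flow.
leads = [{"ID": 1, "Revenue": 90000}, {"ID": 2, "Revenue": 30000}]
cloud, on_prem = run_pipeline(leads)
```

With these sample records, the cloud destination receives both leads while the on-premises destination receives only the lead whose revenue meets the 70,000 threshold.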