Before you begin
-
You have previously created a connection to the system
storing your source data.
Here, it is a connection to a database.
-
You have previously added the dataset holding your source
data.
Download and extract the file filter-python-customers.zip, attached to this
document. It contains a list of customers with a registration date field.
-
You have also created the connection and the related dataset
that will hold the processed data.
Here, the files are stored on HDFS.
Procedure
-
Click Add pipeline on the Pipelines page. Your new pipeline opens.
-
Give the pipeline a meaningful name.
Example
Filter on Registration and Revenue
-
Click ADD SOURCE to open the panel allowing you to select your source data, here a list of customers stored in a database.
-
Select your dataset and click Select to add it to the pipeline.
Rename it if needed.
-
Click and add a Filter processor to the pipeline. The
Configuration panel opens.
-
Give a meaningful name to the processor.
Example
customers registered in 2000
-
In the Filters area:
-
Select .RegistrationDate in the
Input list, as you want to filter customers based on
this value.
-
Select None in the Optionally select a
function to apply list, as you do not want to apply a function
while filtering records.
-
Select Contains in the Operator
list and type 2000 in the
Value field, as you want to keep the customers whose
registration date contains the year 2000.
You can use the avpath syntax in this area.
-
Click Save to
save your configuration.
-
Click and add another Filter processor to the pipeline. The
Configuration panel opens.
-
Give a meaningful name to the processor.
Example
customers with revenue > 90000
-
In the Filters area:
-
Select .Revenue in the Input
list, as you want to filter customers based on this value.
-
Select None in the Optionally select a
function to apply list, as you do not want to apply a function
while filtering records.
-
Select > in the Operator
list and type 90000 in the
Value field, as you want to keep the customers with a
revenue greater than 90000.
-
Click Save to
save your configuration.
-
Click the button next to the first Filter processor and select the dataset that will hold the data that does not match its filter criteria.
-
Give a meaningful name to the Destination.
Example
other registration date
-
Click the ADD DESTINATION item next to the second Filter processor and select the dataset that will hold the filtered data, that is, the records that match the criteria of both filters.
Rename it if needed.
-
Click the button next to the second Filter processor and select
the dataset that will hold your rejected data.
-
Give a meaningful name to the Destination.
Example
other customers
-
(Optional) Look at the preview of the last Filter processor to see the data after the filtering operation.
-
On the top toolbar of Talend Cloud Pipeline Designer,
click the Run button to open the panel allowing you to select
your run profile.
-
Select your run profile in the list (for more information, see Run profiles), then click Run to
run your pipeline.
Results
Your pipeline is being executed: the data is filtered according to the conditions you have stated, and the output is sent to the target system you have indicated.
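To make the routing of records easier to follow, here is a minimal Python sketch of what the two chained Filter processors do. It is not how Talend Cloud Pipeline Designer is implemented. It assumes each record is a dictionary with the RegistrationDate and Revenue fields used above, and that the Contains operator behaves like a substring test on the value as text. The function names, the "output" destination label and the sample records (including the Name field) are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def split(records: Iterable[Record],
          predicate: Callable[[Record], bool]) -> Tuple[List[Record], List[Record]]:
    """Send each record to the main output or to the reject output (hypothetical helper)."""
    kept: List[Record] = []
    rejected: List[Record] = []
    for record in records:
        (kept if predicate(record) else rejected).append(record)
    return kept, rejected


def registered_in_2000(record: Record) -> bool:
    # Operator "Contains", value 2000: assumed to be a substring test on the date as text.
    return "2000" in str(record["RegistrationDate"])


def revenue_over_90000(record: Record) -> bool:
    # Operator ">", value 90000: strict comparison, so a revenue of exactly 90000 is rejected.
    return record["Revenue"] > 90000


def run_pipeline(customers: Iterable[Record]) -> Dict[str, List[Record]]:
    # First Filter processor: its rejects go to the "other registration date" destination.
    registered, other_registration_date = split(customers, registered_in_2000)
    # Second Filter processor: it only sees the records kept by the first one.
    filtered, other_customers = split(registered, revenue_over_90000)
    return {
        "output": filtered,  # hypothetical label for the main destination
        "other registration date": other_registration_date,
        "other customers": other_customers,
    }


if __name__ == "__main__":
    # Hypothetical sample records; the real data comes from your source dataset.
    customers = [
        {"Name": "A", "RegistrationDate": "2000-03-14", "Revenue": 120000},
        {"Name": "B", "RegistrationDate": "2000-11-02", "Revenue": 45000},
        {"Name": "C", "RegistrationDate": "1998-07-21", "Revenue": 150000},
    ]
    for destination, rows in run_pipeline(customers).items():
        print(destination, [row["Name"] for row in rows])
```

With these sample records, customer A reaches the main destination, C is rejected by the first filter and B by the second. Because Contains is treated here as a text match, any value with "2000" anywhere in it would pass the first filter.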