Before you begin
You have previously created a connection to the system storing your source data.
Here, a database connection.
You have previously added the dataset holding your source data.
Here, a table of customers with first name, last name, registration date and revenue fields (download the filter-python-customers.json file from the Downloads tab in the left panel of this page).
You have also created the connection and the related dataset that will hold the processed data.
Here, a file stored on HDFS.
- Click ADD PIPELINE on the Pipelines page. Your new pipeline opens.
Give the pipeline a meaningful name.
Example: Process Customers with Python
Click ADD SOURCE to open the panel allowing you to select your source data, here a table of customers.
Select your dataset and click SELECT to add it to the pipeline.
Rename it if needed.
- Click the plus icon and add a Python processor to the pipeline. The Configuration panel opens.
Give a meaningful name to the processor.
Example: aggregate name - convert to euros - calculate registration date
- In the Map list, select Map.
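In the Python code area, type in:
# Assumes the year is the last '/'-separated part of RegistrationDate, e.g. 03/15/2012
date = input['RegistrationDate'].split("/")
year = date[-1]
output['id'] = input['id']
output['fullname'] = input['Firstname'] + " " + input["Lastname"]
# Fixed dollar-to-euro rate used for this example
output['euro_revenue'] = int(input['Revenue']) * 0.83
# Reference year used to count the years of registration
output['number_year_registrated'] = 2019 - int(year)
This code allows you to: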
concatenate the first name and last name fields
convert the revenue to euros
calculate the number of years the customer has been registered, as of 2019
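To check this logic before running the pipeline, you can reproduce it locally in plain Python. The sketch below is a hypothetical test harness, not Talend code: `process_customer`, `run_map` and the sample records are illustrative. It assumes `RegistrationDate` is a `/`-separated date ending with the year (for example `03/15/2012`), and it emulates Map mode as one output record per input record, each built in a fresh `output` dictionary.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def process_customer(input: Record, reference_year: int = 2019) -> Record:
    """Hypothetical local copy of the processor code, for testing only."""
    output: Record = {}
    # Assumption: the year is the last '/'-separated part of the date.
    date = input['RegistrationDate'].split("/")
    year = date[-1]
    output['id'] = input['id']
    output['fullname'] = input['Firstname'] + " " + input["Lastname"]
    # Fixed conversion rate taken from the example code.
    output['euro_revenue'] = int(input['Revenue']) * 0.83
    output['number_year_registrated'] = reference_year - int(year)
    return output


def run_map(records: List[Record]) -> List[Record]:
    """Hypothetical emulation of Map mode: exactly one output per input record."""
    return [process_customer(record) for record in records]


# Illustrative records shaped like the customers table (values are made up).
customers = [
    {'id': 1, 'Firstname': 'Ada', 'Lastname': 'Lovelace',
     'RegistrationDate': '03/15/2012', 'Revenue': '1000'},
    {'id': 2, 'Firstname': 'Alan', 'Lastname': 'Turing',
     'RegistrationDate': '11/02/2016', 'Revenue': '250'},
]

for row in run_map(customers):
    print(row)
```

The `reference_year` parameter stands in for the hard-coded `2019`; passing `datetime.date.today().year` instead would count years up to the current date.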
- Click SAVE to save your configuration.
Click the ADD DESTINATION item on the pipeline to open the panel allowing you to select the dataset that will hold your processed data.
Rename it if needed.
(Optional) Look at the preview of the Python processor to compare your data before and after the operations.
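When reading the preview, it can help to list which fields were added, removed or kept. The helper below is a hypothetical sketch: `compare_fields` is not part of Talend, and the `before` and `after` records are illustrative values shaped like the example data, assuming the processor output contains only the fields assigned in the Python code.

```python
from typing import Any, Dict, List


def compare_fields(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, List[str]]:
    """Summarize how a record's fields changed between two preview snapshots."""
    return {
        'removed': sorted(set(before) - set(after)),
        'added': sorted(set(after) - set(before)),
        'kept': sorted(set(before) & set(after)),
    }


# Hypothetical input record and the record the example code would produce from it.
before = {'id': 1, 'Firstname': 'Ada', 'Lastname': 'Lovelace',
          'RegistrationDate': '03/15/2012', 'Revenue': '1000'}
after = {'id': 1, 'fullname': 'Ada Lovelace', 'euro_revenue': 830.0,
         'number_year_registrated': 7}

diff = compare_fields(before, after)
print(diff)
```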
- On the top toolbar of Talend Cloud Pipeline Designer, select your run profile in the list (for more information, see Run profiles).
- Click the run icon to run your pipeline.
Your pipeline is being executed: the data is processed according to the Python code you entered, and the output is sent to the target system you have indicated.