Before you begin
You have previously created a Connection to the system storing your source data.
You have previously added the Dataset holding your source data.
Here, a hierarchical list of customer data including customer IDs and product information such as book title and price. The sample file, aggregate-customers.json, is attached to this document: download it from the Downloads tab in the left panel of this page.
You have also created the Connection and the related Dataset that will hold the processed data.
Here, a file stored on HDFS.
- Click ADD PIPELINE on the PIPELINES page. Your new Pipeline opens.
Give the Pipeline a meaningful name.
Aggregate Customer Data to Calculate Purchases
Click ADD SOURCE to open the panel allowing you to select your source data, here a list of hierarchical customer data about book purchases.
Select your Dataset and click SELECT DATASET to add it to the Pipeline.
Rename it if needed.
- Add an Aggregate processor to the Pipeline. The configuration panel opens.
- Give a meaningful name to the processor.
calculate customer purchases
- In the GROUP BY area, select the field you want to use for your aggregation set, here .customerId.
In the OPERATIONS area:
- Select .customerId in the Field list and Count in the Operation list.
- Name the generated field (Output field), nbOfPurchases for example.
- Add a NEW ELEMENT, select .product.price in the Field list and Sum in the Operation list.
- Name the generated field, totalPrice for example.
- Add a NEW ELEMENT, select .product.name in the Field list and List in the Operation list.
- Name the generated field, books for example.
- Click SAVE to save your configuration.
Click the ADD DESTINATION item on the Pipeline to open the panel allowing you to select the Dataset that will hold your output data.
Rename it if needed.
(Optional) Click the preview icon after the Aggregate processor to preview the data calculated by the aggregation: the books bought and the amount of money spent per customer.
- On the top toolbar of Talend Cloud Pipeline Designer, select your Run Profile from the list (for more information, see Execution profiles).
- Click the run icon to run your Pipeline.
Your Pipeline is being executed: the book purchases are aggregated per customer, and the output flow is sent to the target system you have indicated.
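To make the Aggregate processor configuration easier to picture, the following Python snippet is a minimal sketch of the same logic: group by .customerId, then apply Count, Sum on .product.price and List on .product.name. It is not how Talend Cloud Pipeline Designer runs the Pipeline. The sample records and the aggregate_purchases function are hypothetical; only the field paths and the output field names (nbOfPurchases, totalPrice, books) come from the steps above.

```python
from collections import defaultdict

# Hypothetical sample records. Only the field paths (customerId,
# product.name, product.price) are taken from the procedure; the
# values are invented for illustration.
records = [
    {"customerId": 1, "product": {"name": "Book A", "price": 10.0}},
    {"customerId": 2, "product": {"name": "Book B", "price": 7.5}},
    {"customerId": 1, "product": {"name": "Book C", "price": 5.0}},
]


def aggregate_purchases(rows):
    """Group rows by customerId and compute Count, Sum and List."""
    groups = defaultdict(
        lambda: {"nbOfPurchases": 0, "totalPrice": 0.0, "books": []}
    )
    for row in rows:
        group = groups[row["customerId"]]
        group["nbOfPurchases"] += 1                      # Count on .customerId
        group["totalPrice"] += row["product"]["price"]   # Sum on .product.price
        group["books"].append(row["product"]["name"])    # List on .product.name
    # One output record per customer, in order of first appearance.
    return [{"customerId": cid, **values} for cid, values in groups.items()]


result = aggregate_purchases(records)
for record in result:
    print(record)
```

Each output record holds one customer with the number of purchases, the total amount spent, and the list of book titles, which is the shape of data you can expect in the preview after the Aggregate processor.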