Before you begin
-
You have previously created a connection to the system
storing your source data.
-
You have previously added the dataset holding your source data.
Download and extract the file: aggregate-customers.zip.
It contains a hierarchical list of customer data with fields such as the customer ID and product information (book title, price, and so on).
-
You have also created the connection and the related dataset
that will hold the processed data, in this example a file stored on HDFS.
Procedure
-
Click Add pipeline on the Pipelines page. Your new pipeline opens.
-
Give the pipeline a meaningful name.
Example
Aggregate Average Purchase Price
-
Click ADD SOURCE to open the panel allowing you to select your source data, here a list of hierarchical customer data about book purchases.
-
Select your dataset and click Select to add it to the pipeline.
Rename it if needed.
-
Add an Aggregate processor to the pipeline. The
configuration panel opens.
-
Give a meaningful name to the processor.
Example
calculate average price
-
In the Group by area, click the recycle bin icon next to the
empty field to remove it, because you want the whole dataset to be aggregated into a
single record.
-
In the Operations area:
-
Select .product.price in the Field path list and Average in the
Operation list, because you want to calculate the average price
of all the books purchased by customers.
-
Enter a name for the generated field in Output field name,
for example avgPrice.
-
Click Save to
save your configuration.
You can preview the calculated data after the aggregating operation: the average book price is 13.96 dollars.
-
Click ADD DESTINATION on the pipeline to open the panel
allowing you to select the dataset that will hold your output data (HDFS).
Rename it if needed.
-
On the top toolbar of Talend Cloud Pipeline Designer,
click the Run button to open the panel allowing you to select
your run profile.
-
Select your run profile in the list (for more information, see Run profiles), then click Run to
run your pipeline.
Results
Your pipeline is being executed. The average book price is aggregated into a single record, and the output flow is sent to the target system you have indicated.
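To make the effect of the Aggregate processor concrete, here is a minimal Python sketch of the same logic: with no Group by key, the values found at .product.price are averaged over every record and returned as a single record holding avgPrice. This is only an illustration, not how Talend Cloud Pipeline Designer implements the processor. The sample records, the id and title field names, and the aggregate_average helper are all hypothetical, and the sample prices do not reproduce the 13.96 value from the real dataset.

```python
from statistics import mean

# Hypothetical sample records. Only the nested product.price path comes from
# the procedure; the other field names and all values are assumptions.
customers = [
    {"id": 1, "product": {"title": "Book A", "price": 10.0}},
    {"id": 2, "product": {"title": "Book B", "price": 14.5}},
    {"id": 3, "product": {"title": "Book C", "price": 17.5}},
]


def aggregate_average(records, field_path, output_field):
    """Average the value at field_path over all records (no group-by key)."""
    # Turn a path such as ".product.price" into ["product", "price"].
    keys = field_path.strip(".").split(".")
    values = []
    for record in records:
        value = record
        for key in keys:
            value = value[key]  # walk down the hierarchical record
        values.append(value)
    # Without a group-by key, the whole dataset collapses into one record.
    return {output_field: mean(values)}


result = aggregate_average(customers, ".product.price", "avgPrice")
print(result)  # {'avgPrice': 14.0}
```

Adding a Group by field would correspond to computing one such average per distinct key value instead of a single record for the whole dataset.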