Aggregating customer information to calculate purchases

A pipeline with a test source, an Aggregate processor, and an HDFS destination.

Before you begin

  • You have previously created a connection to the system storing your source data.

  • You have previously added the dataset holding your source data.

    Download and extract the aggregate-customers.zip file. It contains hierarchical customer data, including the customer ID and product information such as book title and price.

  • You have also created the connection and the related dataset that will hold the processed data.

    In this example, the dataset is a file stored on HDFS.

Procedure

  1. Click Add pipeline on the Pipelines page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.

    Example

    Aggregate Customer Data to Calculate Purchases
  3. Click ADD SOURCE to open the panel where you select your source data, in this example hierarchical customer data about book purchases.

    Example

    Preview of a data sample about book purchases.
  4. Select your dataset and click Select in order to add it to the pipeline.
    Rename it if needed.
  5. Click Plus and add an Aggregate processor to the pipeline. The configuration panel opens.
  6. Give a meaningful name to the processor.

    Example

    calculate customer purchases
  7. In the Group by area, select the field you want to use for your aggregation set, here .customerId.
  8. In the Operations area:
    1. Select .customerId in the Field path list and Count in the Operation list.
    2. Name the generated field (Output field name), nbOfPurchases for example.
    3. Click the + sign to add a new element, select .product.price in the Field path list and Sum in the Operation list.
    4. Name the generated field, totalPrice for example.
    5. Click the + sign to add a new element, select .product.name in the Field path list and List in the Operation list.
    6. Name the generated field, books for example.
  9. Click Save to save your configuration.

    You can preview the data calculated by the aggregation: the books purchased and the amount of money spent per customer.

    Preview of the processor after applying an aggregate operation.
  10. Click ADD DESTINATION on the pipeline to open the panel allowing you to select the dataset that will hold your output data (HDFS).

    Rename it if needed.

  11. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  12. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.

Results

Your pipeline is being executed: the book purchases are aggregated per customer, and the output flow is sent to the target systems you have indicated.
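
To make the Group by and Operations settings from steps 7 and 8 more concrete, here is a minimal Python sketch of the same logic outside Pipeline Designer. It is an illustration only, not how the Aggregate processor is implemented. The sample records and the aggregate_purchases function are hypothetical. Only the field paths (.customerId, .product.name, .product.price) and the output field names (nbOfPurchases, totalPrice, books) come from the procedure. The sketch assumes that Count counts every record in a group and that the group key is kept in the output.

```python
from collections import defaultdict

# Hypothetical sample records: the values are made up, and only the field
# names mirror the paths used in the procedure.
records = [
    {"customerId": "C1", "product": {"name": "Book A", "price": 10.0}},
    {"customerId": "C2", "product": {"name": "Book B", "price": 7.5}},
    {"customerId": "C1", "product": {"name": "Book C", "price": 12.5}},
]


def aggregate_purchases(rows):
    """Group rows by customerId, then apply Count, Sum, and List per group."""
    groups = defaultdict(
        lambda: {"nbOfPurchases": 0, "totalPrice": 0.0, "books": []}
    )
    for row in rows:
        group = groups[row["customerId"]]                 # Group by .customerId
        group["nbOfPurchases"] += 1                       # Count on .customerId
        group["totalPrice"] += row["product"]["price"]    # Sum on .product.price
        group["books"].append(row["product"]["name"])     # List on .product.name
    # Assumption: the group key is kept alongside the generated fields.
    return [{"customerId": cid, **agg} for cid, agg in groups.items()]


result = aggregate_purchases(records)
for row in result:
    print(row)
```

Each output row corresponds to one customer, which is the shape you would expect to see in the processor preview after step 9: a purchase count, a total price, and the list of book names.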