Reorganizing records of a study on customer behaviour - Cloud

Talend Cloud Pipeline Designer Processors Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Pipeline Designer
Content: Design and Development > Designing Pipelines
Last publication date: 2024-02-26

A pipeline with a test source, a Field selector processor, and an HDFS destination.

Before you begin

  • You have previously added the dataset holding your source data.

    Download and extract the field_selector-customers.zip file. It contains a dataset from a study on customer behavior (for example, which types of customers use which devices).

  • You have also created the connection and the related dataset that will hold the processed data.

    In this example, the dataset is a file stored on HDFS.

Procedure

  1. Click Add pipeline on the Pipelines page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.

    Example

    Restructure Customer Schema
  3. Click ADD SOURCE to open the panel where you select your source data. In this example, the source is a customer study entered manually as a test dataset.

    Example

    Preview of a data sample with user device records.
  4. Select your dataset and click Select in order to add it to the pipeline.
    Rename it if needed.
  5. Click Plus and add a Field selector processor to the pipeline. The configuration panel opens.
  6. Give a meaningful name to the processor.

    Example

    restructure fields
  7. In the Selectors area of the Advanced mode:
    1. Select .id in the Input list and enter identifier in the Output list, as you want to select and rename the id field while keeping it at the same location.
    2. Click the + sign to add a new element, select .location[0].country in the Input list, and enter country in the Output list, as you want to select the country field of the first location and move it to the top level of the schema.
    3. Click the + sign to add a new element, select .devices in the Input list, and enter devices_used in the Output list, as you want to select and rename the devices field while keeping it at the same location.
    4. Click the + sign to add a new element, type .devices[*]{.name == "other"}.ip in the Input list, and enter other_devices in the Output list, as you want to select the ip subfield of every device whose name subfield equals other.

      You can use the avpath syntax in this area.

  8. Click Save to save your configuration.

    Look at the preview of the processor to compare your data before and after the restructuring operation.

    Preview of the Field selector processor after restructuring records.
  9. Click ADD DESTINATION and select the dataset that will hold your reorganized data.
    Rename it if needed.
  10. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  11. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.

Results

Your pipeline is being executed. The data is reorganized according to the selectors you have defined, and the output is sent to the target system you have indicated.
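
Example

As a rough illustration of what the four selectors from step 7 do to a single record, the sketch below mimics them in plain Python. It is not how the Field selector processor is implemented. The sample record is hypothetical: its field names are taken from the selector paths, but its values are invented. The exact shape of the output in the pipeline (for example, how a single match for other_devices is represented) may differ, so treat the preview in Talend Cloud Pipeline Designer as the reference.

```python
# Hypothetical sample record: field names follow the selector paths used in
# step 7; the values are invented for illustration only.
record = {
    "id": 1,
    "location": [
        {"country": "France"},
        {"country": "Germany"},
    ],
    "devices": [
        {"name": "laptop", "ip": "10.0.0.1"},
        {"name": "other", "ip": "10.0.0.2"},
        {"name": "other", "ip": "10.0.0.3"},
    ],
}


def restructure(rec):
    """Approximate the four selectors in plain Python (assumed behavior)."""
    return {
        # .id -> identifier (renamed, stays at the top level)
        "identifier": rec["id"],
        # .location[0].country -> country (moved up to the top level)
        "country": rec["location"][0]["country"],
        # .devices -> devices_used (renamed, stays at the top level)
        "devices_used": rec["devices"],
        # .devices[*]{.name == "other"}.ip -> other_devices
        # keeps the ip of every device whose name equals "other"
        "other_devices": [
            device["ip"] for device in rec["devices"] if device["name"] == "other"
        ],
    }


result = restructure(record)
print(result)
```

Only the fields named in the Output list appear in the result, which is why the original id and location fields are absent from it.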