Reorganizing records of a study on customer behaviour - Cloud

Talend Cloud Pipeline Designer Processors Guide

author: Talend Documentation Team
EnrichVersion: Cloud
EnrichProdName: Talend Cloud
task: Design and Development > Designing Pipelines
EnrichPlatform: Talend Pipeline Designer

Before you begin

  • You have previously added the Dataset holding your source data.

    In this example, the Dataset is stored on S3 and holds a study on customer behaviour (which types of customers use which devices, and so on). Download the corresponding field_selector-customers.json file from the Downloads tab in the left panel of this page.

  • You have also created the Connection and the related Dataset that will hold the processed data.

    In this example, the destination is a file stored on HDFS.

Procedure

  1. Click ADD PIPELINE on the PIPELINES page. Your new Pipeline opens.
  2. Give the Pipeline a meaningful name, for example:
    Restructure Customer Schema
  3. Click ADD SOURCE to open the panel allowing you to select your source data, here a study of customers entered manually as a Custom Dataset.
  4. Select your Dataset and click SELECT DATASET in order to add it to the Pipeline.
    Rename it if needed.
  5. Add a Field Selector processor to the Pipeline. The configuration panel opens.
  6. Give the processor a meaningful name, for example:
    restructure fields
  7. In the SELECTORS area:
    1. Enter identifier in the Field name list and .id in the Path list, as you want to select and rename the id field while keeping it at the same location.
    2. Add a NEW ELEMENT and enter country in the Field name list and .location[0].country in the Path list, as you want to select the country field of the first location and move it to the top level of the schema.
    3. Add a NEW ELEMENT and enter devices_used in the Field name list and .devices in the Path list, as you want to select the devices field while keeping it at the same location.
    4. Add a NEW ELEMENT and enter other_devices in the Field name list and .devices[*]{.name == "other"}.ip in the Path list, as you want to select the ip subfield of every devices element whose name subfield equals other.

      You can use the avpath syntax in this area.

  8. Click SAVE to save your configuration.
  9. Click ADD DESTINATION and select the Dataset that will hold your reorganized data.
    Rename it if needed.
  10. (Optional) Look at the preview of the Field Selector processor to compare your data before and after the restructuring operation.
  11. On the top toolbar of Talend Cloud Pipeline Designer, select your Run Profile in the list (for more information, see Execution profiles).
  12. Click the run icon to run your Pipeline.

Results

Your Pipeline is being executed. The data is reorganized according to the selectors you have defined, and the output is sent to the target system you have indicated.
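
To make the effect of the four selectors from step 7 easier to picture, the following is a minimal Python sketch that applies the same selections to one record. It does not use Talend or avpath; it only imitates the paths with plain dictionary and list operations. The sample record and the restructure function are hypothetical: the values are invented for illustration and are not taken from field_selector-customers.json, and the exact shape of the output produced by the Field Selector processor (for example, whether a single match is returned as a list) is an assumption here.

```python
# Hypothetical input record; only the field names (id, location, country,
# devices, name, ip) come from the procedure, the values are invented.
record = {
    "id": 1,
    "location": [
        {"country": "France", "city": "Paris"},
        {"country": "Spain", "city": "Madrid"},
    ],
    "devices": [
        {"name": "laptop", "ip": "10.0.0.1"},
        {"name": "other", "ip": "10.0.0.2"},
        {"name": "other", "ip": "10.0.0.3"},
    ],
}


def restructure(rec):
    """Imitate the four selectors of step 7 on a single record."""
    return {
        # Path .id, output field renamed to identifier
        "identifier": rec["id"],
        # Path .location[0].country, moved to the top level
        "country": rec["location"][0]["country"],
        # Path .devices, output field renamed to devices_used
        "devices_used": rec["devices"],
        # Path .devices[*]{.name == "other"}.ip:
        # keep the ip of each device whose name equals "other"
        "other_devices": [
            device["ip"] for device in rec["devices"] if device["name"] == "other"
        ],
    }


result = restructure(record)
print(result)
```

With this sample record, country holds the country of the first location only, and other_devices holds the two ip values whose device name is other. Comparing record and result gives a rough idea of what the optional preview in step 10 shows before and after the processor.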