
Processing a list of user devices using queries

Availability note: Beta

A pipeline with a source, a Data Shaping Language processor, a Filter processor, and a destination.

Before you begin

  • You have previously created a connection to the system storing your source data.

    Here, a Test connection.

  • You have previously added the dataset holding your source data.

    Download and extract the file: query_language-devices.zip. It contains a hierarchical .json file with a survey about users' devices, including the device types, purchase dates, IP addresses, etc.

  • You also have created the connection and the related dataset that will hold the processed data.

    Here, a file stored on an S3 bucket.

Procedure

  1. Click Add pipeline on the Pipelines page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.

    Example

    Query and process a list of user devices
  3. Click ADD SOURCE to open the panel allowing you to select your source data, here a survey about user devices with hierarchical data.

    Example

    Preview of a data sample about user devices.
  4. Select your dataset and click Select in order to add it to the pipeline.
    Rename it if needed.
  5. Click Plus and add a Data Shaping Language processor to the pipeline. The Configuration panel opens.
  6. Give a meaningful name to the processor.

    Example

    query recent devices
  7. In the Data Shaping Language area, type in:
    FROM devices AS dv
    WHERE toDate(dv.purchase_date) > toDate("2015-01-01")
    SELECT {
    device_type = name,
    purchase_date = dv.purchase_date,
    ip_address = ip }
    This code allows you to:
    • define dv as the alias for the devices records

    • filter on devices purchased at a date later than January 1st, 2015

    • rename and flatten some fields: name becomes device_type, ip becomes ip_address

    For more information on the query language syntax, see the Data Shaping Language Reference Guide.

  8. Click Save to save your configuration.

    The preview allows you to visualize the new structure: now that the structure is flattened, more records are output, and only the devices bought after January 1st, 2015 are displayed.

    Preview of the Data Shaping Language processor after processing device records with a query.
  9. Click Plus and add a Filter processor to the pipeline. The Configuration panel opens.
  10. Give a meaningful name to the processor.

    Example

    keep records about phones
  11. In the Filter area:
    1. Select .device_type in the Input list, as you want to filter customers based on this value.
    2. Select None in the Optionally select a function to apply list, as you do not want to apply a function while filtering records.
    3. Select == in the Operator list and type phone in the Value list, as you want to keep only users with phones.
    4. Click Save to save your configuration. The preview allows you to visualize the records that match the filtering criteria (users with phones).
      Preview of the Filter processor after applying a filter to keep records including phone information.
  12. Click ADD DESTINATION on the pipeline to open the panel allowing you to select the dataset that will hold your processed data.
    Rename it if needed.
  13. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  14. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.

Results

Your pipeline is being executed: the data is processed according to the conditions you stated in the query language, and the output is sent to the target system you indicated.
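
To make the combined effect of the two processors easier to picture, here is a minimal Python sketch of the same logic applied outside Pipeline Designer. It is not how the product runs the query. The sample records, the nesting of devices under each user, and the function names query_recent_devices and keep_phones are all assumptions made for illustration; the real query_language-devices file may be laid out differently.

```python
from datetime import date

# Hypothetical sample records: the layout of the real survey file may differ.
users = [
    {"user": "alice", "devices": [
        {"name": "phone", "purchase_date": "2016-03-14", "ip": "10.0.0.1"},
        {"name": "laptop", "purchase_date": "2013-07-02", "ip": "10.0.0.2"},
    ]},
    {"user": "bob", "devices": [
        {"name": "tablet", "purchase_date": "2018-11-30", "ip": "10.0.0.3"},
        {"name": "phone", "purchase_date": "2014-05-09", "ip": "10.0.0.4"},
    ]},
]

CUTOFF = date(2015, 1, 1)


def query_recent_devices(records, cutoff=CUTOFF):
    """Mimic the query: flatten devices, keep recent ones, rename fields."""
    rows = []
    for record in records:
        for dv in record.get("devices", []):
            # WHERE toDate(dv.purchase_date) > toDate("2015-01-01")
            if date.fromisoformat(dv["purchase_date"]) > cutoff:
                # SELECT { device_type = name, purchase_date, ip_address = ip }
                rows.append({
                    "device_type": dv["name"],
                    "purchase_date": dv["purchase_date"],
                    "ip_address": dv["ip"],
                })
    return rows


def keep_phones(rows):
    """Mimic the Filter processor: .device_type == phone, no function applied."""
    return [row for row in rows if row["device_type"] == "phone"]


recent = query_recent_devices(users)
phones = keep_phones(recent)
print(phones)
```

With this sample, two devices pass the date condition (one phone and one tablet), and only the phone remains after the filter. This mirrors the two previews described in the procedure.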
