Selecting specific records using avpath - Cloud

Talend Cloud Pipeline Designer User Guide

Version: Cloud
Language: English
Product: Talend Cloud
Module: Talend Pipeline Designer
Last publication date: 2024-02-09

In this scenario, the avpath syntax is used to filter restaurant reviews based on the reviewers' age and helpful votes, and on the restaurants' noise level.

A pipeline named 'Filter restaurant reviews' shows an Amazon S3 source, two Filter processors and an Amazon S3 destination.

Before you begin

  • You have previously created a connection to the system storing your source data, here a connection to an S3 bucket. For more information, see Creating a connection.

  • You have previously added the dataset holding your source data.

    Download and extract the file: restaurant_reviews.zip. It contains restaurant reviews with nested records about the restaurant and users. For more information, see Creating a dataset.

  • You have also created the connection and the related dataset that will hold the processed data.

Procedure

  1. Click Add pipeline on the Pipelines page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.

    Example

    Filter restaurant reviews
  3. Click ADD SOURCE to open the panel allowing you to select your source data, here restaurant reviews.
  4. Select your dataset and click Select in order to add it to the pipeline.
    Rename it if needed.
  5. Click Plus and add a Filter processor to the pipeline. The Configuration panel opens.
  6. Give the processor a meaningful name, for example 'with reviews by at least 20 helpful old people'.
  7. In the Filter area:
    1. Type .reviews{.user.age >= 60 && .user.user_votes.helpful > 20} in the Input area, as you want only the reviews entered by users aged 60 or older who have more than 20 helpful votes.
    2. Select Count in the Optionally select a function to apply list and >= in the Operator list, then type 20 in the Value list, as you want at least 20 of these user reviews.
  8. Click Save to save your configuration.
  9. Click Plus again and add another Filter processor to the pipeline. The Configuration panel opens.
  10. Give the processor a meaningful name, for example 'with quiet noise level'.
  11. In the Filter area:
    1. Select .business.attributes.noise_level in the Input list, as you want to filter the restaurants based on their noise level.
    2. Select None in the Optionally select a function to apply list and == in the Operator list, then type quiet in the Value list, as you want to keep only the restaurants with a quiet noise level.
  12. Click Save to save your configuration.
  13. Click the ADD DESTINATION item on the pipeline to open the panel allowing you to select the dataset that will hold your filtered data.
  14. Give the destination a meaningful name, for example 'perfect restaurants for old hipsters'.
  15. (Optional) Look at the last Filter processor to preview and compare your data after the filtering operation.
  16. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  17. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.
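
The logic of the two Filter processors configured above can be approximated outside the product. The following Python sketch is only an illustration under stated assumptions: it is not how Talend Cloud Pipeline Designer evaluates avpath, the record layout is inferred from the field paths used in the procedure, and the `name` field and sample values are hypothetical.

```python
# Minimal sketch of the two filters, assuming each record has a "reviews"
# array and a "business" object, as suggested by the avpath expressions
# .reviews{...} and .business.attributes.noise_level.
# The actual restaurant_reviews dataset may contain other fields.


def matching_reviews(record, min_age=60, min_helpful=20):
    """Emulate .reviews{.user.age >= 60 && .user.user_votes.helpful > 20}."""
    return [
        review
        for review in record.get("reviews", [])
        if review["user"]["age"] >= min_age
        and review["user"]["user_votes"]["helpful"] > min_helpful
    ]


def first_filter(record, min_count=20):
    """Count function with the >= operator, applied to the selected reviews."""
    return len(matching_reviews(record)) >= min_count


def second_filter(record):
    """No function, == operator: keep restaurants whose noise level is quiet."""
    return record["business"]["attributes"]["noise_level"] == "quiet"


def run_pipeline(records, min_count=20):
    """Chain both filters, in the same order as in the pipeline."""
    return [r for r in records if first_filter(r, min_count) and second_filter(r)]


def make_review(age, helpful):
    """Build a hypothetical review holding only the fields the filters read."""
    return {"user": {"age": age, "user_votes": {"helpful": helpful}}}


# Hypothetical sample data; "name" is added only to tell the records apart.
records = [
    {"business": {"name": "A", "attributes": {"noise_level": "quiet"}},
     "reviews": [make_review(65, 30), make_review(70, 21)]},
    {"business": {"name": "B", "attributes": {"noise_level": "loud"}},
     "reviews": [make_review(65, 30), make_review(61, 25)]},
    {"business": {"name": "C", "attributes": {"noise_level": "quiet"}},
     "reviews": [make_review(65, 20), make_review(30, 50)]},
]

# The threshold is lowered to 2 so that the small sample can pass the count.
kept = run_pipeline(records, min_count=2)
print([r["business"]["name"] for r in kept])  # ['A']
```

In this sample, restaurant B is dropped by the noise level filter, and restaurant C is dropped by the count filter because none of its reviews satisfies both conditions (exactly 20 helpful votes does not satisfy > 20).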

Results

Your pipeline is being executed: the data is filtered according to the conditions you defined using avpath, and the output is sent to the target system you indicated.