Selecting specific records using avpath - Cloud

Talend Cloud Pipeline Designer User Guide

In this scenario, the avpath syntax is used to filter restaurant reviews based on user age, user votes, and noise level preferences.

Before you begin

  • You have previously created a connection to the system storing your source data, here a connection to an S3 bucket. For more information, see Creating a connection.

  • You have previously added the dataset holding your source data.

    Here, the dataset contains restaurant reviews with nested records about the restaurant and user information. You can download the restaurant_reviews.avro file from the Downloads tab in the left panel of this page. For more information, see Creating a dataset.

  • You have also created the connection and the related dataset that will hold the processed data.
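To picture the nested structure that the avpath expressions in this scenario navigate, here is a minimal Python sketch. The field names (business.attributes.noise_level, reviews, user.age, user.user_votes.helpful) are taken from the expressions used in the procedure. The overall record layout, the sample values, and the get_path helper are assumptions made for illustration only and may not match the actual schema of restaurant_reviews.avro.

```python
# Hypothetical sample record: field names come from the avpath expressions
# in this scenario, the layout and values are made up for illustration.
sample_record = {
    "business": {"attributes": {"noise_level": "quiet"}},
    "reviews": [
        {"user": {"age": 67, "user_votes": {"helpful": 42}}},
        {"user": {"age": 31, "user_votes": {"helpful": 5}}},
    ],
}


def get_path(record, path):
    """Resolve a simple dotted path such as '.business.attributes.noise_level'.

    Only plain field access is handled; avpath predicates ({...}) are not.
    """
    value = record
    for key in path.strip(".").split("."):
        value = value[key]
    return value


print(get_path(sample_record, ".business.attributes.noise_level"))
print(len(get_path(sample_record, ".reviews")))
```

The helper only mimics the leading-dot path notation. It is not an avpath implementation.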

Procedure

  1. Click ADD PIPELINE on the Pipelines page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.

    Example

    Filter restaurant reviews
  3. Click ADD SOURCE to open the panel allowing you to select your source data, here restaurant reviews.
  4. Select your dataset and click SELECT in order to add it to the pipeline.
    Rename it if needed.
  5. Click and add a Filter processor to the pipeline. The Configuration panel opens.
  6. Give the processor a meaningful name, for example with reviews by at least 20 helpful seniors.
  7. In the Filter area:
    1. Type in .reviews{.user.age >= 60 && .user.user_votes.helpful > 20} in the Input area, as you want only the reviews entered by users aged 60 or older with more than 20 helpful votes.
    2. Select COUNT in the Optionally select a function to apply list and >= in the Operator list, then type in 20 in the Value list, as you want at least 20 of these user reviews.
  8. Click SAVE to save your configuration.
  9. Click again and add another Filter processor to the pipeline. The Configuration panel opens.
  10. Give the processor a meaningful name, for example with quiet noise level.
  11. In the Filter area:
    1. Select .business.attributes.noise_level in the Input list, as you want to filter the restaurants based on their noise level.
    2. Select NONE in the Optionally select a function to apply list and == in the Operator list, then type in quiet in the Value list, as you want to keep only the restaurants with a quiet noise level.
  12. Click SAVE to save your configuration.
  13. Click the ADD DESTINATION item on the pipeline to open the panel allowing you to select the dataset that will hold your filtered data.
  14. Give the destination a meaningful name, for example perfect restaurants for senior hipsters.
  15. (Optional) Look at the last Filter processor to preview and compare your data after the filtering operation.
  16. On the top toolbar of Talend Cloud Pipeline Designer, select your run profile in the list (for more information, see Run profiles).
  17. Click the run icon to run your pipeline.
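The combined effect of the two Filter processors configured above can be approximated in plain Python. This is a rough sketch for reasoning about the conditions, not how Talend Cloud Pipeline Designer evaluates avpath. It assumes that each record holds a reviews array and a business record, and that COUNT returns the number of elements selected by the avpath expression. All function names and the make_record helper are hypothetical.

```python
def senior_helpful_reviews(record):
    # Mirrors .reviews{.user.age >= 60 && .user.user_votes.helpful > 20}
    return [
        review
        for review in record["reviews"]
        if review["user"]["age"] >= 60
        and review["user"]["user_votes"]["helpful"] > 20
    ]


def passes_first_filter(record, min_count=20):
    # COUNT(...) >= 20, assuming COUNT is the number of selected reviews
    return len(senior_helpful_reviews(record)) >= min_count


def passes_second_filter(record):
    # .business.attributes.noise_level == quiet, with no function applied
    return record["business"]["attributes"]["noise_level"] == "quiet"


def keep(record):
    # Chained Filter processors: a record must satisfy both conditions
    return passes_first_filter(record) and passes_second_filter(record)


def make_record(noise_level, n_matching, n_other):
    # Hypothetical test data: n_matching reviews that satisfy the predicate,
    # n_other reviews by younger users that do not
    matching = {"user": {"age": 65, "user_votes": {"helpful": 30}}}
    other = {"user": {"age": 25, "user_votes": {"helpful": 30}}}
    return {
        "business": {"attributes": {"noise_level": noise_level}},
        "reviews": [matching] * n_matching + [other] * n_other,
    }


records = [
    make_record("quiet", 20, 5),   # enough senior reviews, quiet
    make_record("quiet", 19, 5),   # one senior review short
    make_record("loud", 25, 0),    # enough senior reviews, not quiet
]
kept = [r for r in records if keep(r)]
print(len(kept))
```

Note that the predicate uses > 20 for helpful votes, so a user with exactly 20 helpful votes is not selected, while the COUNT condition uses >= 20, so exactly 20 matching reviews is enough.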

Results

Your pipeline is being executed. The data is filtered according to the conditions you stated using avpath, and the output is sent to the target system you indicated.