In this scenario the avpath syntax is used to filter the reviews of restaurants
based on user age, votes, and noise level preferences.
Before you begin
-
You have previously created a connection to the system storing your source
data, here a connection to an S3 bucket. For more information, see Creating a connection.
-
You have previously added the dataset holding your source
data.
Download and extract the file: restaurant_reviews.zip.
It contains restaurant reviews with nested records about the restaurant and users.
For more information, see Creating a dataset.
-
You have also created the connection and the related dataset
that will hold the processed data.
Procedure
-
Click Add pipeline on the Pipelines page. Your new pipeline opens.
-
Give the pipeline a meaningful name.
Example
Filter restaurant reviews
-
Click ADD SOURCE to open the panel allowing you to
select your source data, here restaurant reviews.
-
Select your dataset and click
Select in order to add it to the pipeline.
Rename it if needed.
-
Add a Filter processor to the pipeline. The
Configuration panel opens.
-
Give the processor a meaningful name, for example: with reviews by at least 20
helpful old people.
-
In the Filter area:
-
Type in .reviews{.user.age >= 60 && .user.user_votes.helpful > 20} in
the Input area, as you want only the reviews entered by users aged 60
or older who have more than 20 helpful votes.
-
Select Count in the Optionally select a function to apply list,
>= in the Operator list, and type in 20 in the Value list,
as you want at least 20 of these user reviews.
-
Click Save to save your configuration.
-
Add another Filter processor to the
pipeline. The Configuration panel opens.
-
Give the processor a meaningful name, for example: with quiet noise
level.
-
In the Filter area:
-
Select .business.attributes.noise_level in the Input list, as you want to filter the
restaurants based on their noise level.
-
Select None in the Optionally select a function to apply list,
== in the Operator list, and type in quiet in the Value list,
as you want to keep only the restaurants with a quiet noise level.
-
Click Save to save your configuration.
-
Click the ADD DESTINATION item on the pipeline to open the
panel allowing you to select the dataset that will hold your filtered data.
-
Give the destination a meaningful name, for example: perfect restaurants for old
hipsters.
-
(Optional) Look at the last Filter processor to preview and compare your data after the
filtering operation.
-
On the top toolbar of Talend Cloud Pipeline Designer,
click the Run button to open the panel allowing you to select
your run profile.
-
Select your run profile in the list (for more information, see Run profiles), then click Run to
run your pipeline.
Results
Your pipeline is being executed. The data is filtered according to the conditions you
have stated using avpath, and the output is sent to the target system you have
indicated.
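To check the intended logic of the two Filter processors outside the product, the conditions can be approximated in a few lines of Python. This is only a sketch under stated assumptions, not how the product evaluates avpath: the sample records and the helper names (make_record, matching_reviews, passes_review_filter, passes_noise_filter, run_pipeline) are hypothetical, and only the field paths come from the expressions used in this scenario. It assumes each record describes one restaurant and holds a reviews array; the product may handle missing fields differently.

```python
# Illustrative sketch only: sample data and helper names are hypothetical.
# Only the field paths (reviews, user.age, user.user_votes.helpful,
# business.attributes.noise_level) come from the scenario.

def make_record(noise_level, reviewers):
    """Build one restaurant record; reviewers is a list of (age, helpful_votes) pairs."""
    return {
        "business": {"attributes": {"noise_level": noise_level}},
        "reviews": [
            {"user": {"age": age, "user_votes": {"helpful": helpful}}}
            for age, helpful in reviewers
        ],
    }


def matching_reviews(record, min_age=60, min_helpful=20):
    """Approximate .reviews{.user.age >= 60 && .user.user_votes.helpful > 20}."""
    return [
        review
        for review in record.get("reviews", [])
        if review["user"]["age"] >= min_age
        and review["user"]["user_votes"]["helpful"] > min_helpful
    ]


def passes_review_filter(record, min_count=20):
    """First Filter processor: Count of matching reviews >= 20."""
    return len(matching_reviews(record)) >= min_count


def passes_noise_filter(record, level="quiet"):
    """Second Filter processor: .business.attributes.noise_level == quiet."""
    return record["business"]["attributes"]["noise_level"] == level


def run_pipeline(records):
    """Apply both filters in sequence, like the two chained processors."""
    return [
        r for r in records
        if passes_review_filter(r) and passes_noise_filter(r)
    ]


records = [
    make_record("quiet", [(65, 30)] * 20),  # kept: 20 matching reviews, quiet
    make_record("loud", [(65, 30)] * 20),   # dropped: noise level is not quiet
    make_record("quiet", [(65, 30)] * 19),  # dropped: only 19 matching reviews
    make_record("quiet", [(65, 20)] * 25),  # dropped: 20 helpful votes is not > 20
]

kept = run_pipeline(records)
print(len(kept))  # prints 1
```

The last sample record shows why the strict operator matters: a user with exactly 20 helpful votes does not satisfy > 20, so none of that restaurant's reviews count toward the threshold.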