In this scenario, the avpath syntax is used to filter restaurant reviews
based on user age, helpful votes, and restaurant noise level.
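In avpath, a path such as `.business.attributes.noise_level` walks down nested fields, one field name per level. As a rough illustration only, the hypothetical Python helper below resolves such a plain dotted path against a record held as a dict. It does not implement avpath predicates such as `{...}`, and the sample record is invented for the example, with field names taken from the filter expressions used in this scenario.

```python
from typing import Any


def resolve_path(record: dict[str, Any], path: str) -> Any:
    """Follow a plain dotted path such as '.business.attributes.noise_level'.

    Hypothetical helper: it only handles simple field access, not avpath
    predicates, wildcards or array indexes.
    """
    value: Any = record
    for field_name in path.lstrip(".").split("."):
        if not isinstance(value, dict) or field_name not in value:
            return None  # missing field: treat it as absent instead of raising
        value = value[field_name]
    return value


# Hypothetical record shaped like the nested restaurant review data.
record = {
    "business": {"attributes": {"noise_level": "quiet"}},
    "reviews": [{"user": {"age": 67, "user_votes": {"helpful": 31}}}],
}

print(resolve_path(record, ".business.attributes.noise_level"))  # quiet
```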
Before you begin
-
You have previously created a connection to the system storing your source
data, here an S3 bucket. For more information, see Creating a connection.
-
You have previously added the dataset holding your source
data, here restaurant reviews with nested records about the
restaurant and the user. You can download the restaurant_reviews.avro file from the
Downloads tab in the left panel of this page. For more information, see Creating a dataset.
-
You have also created the connection and the related dataset
that will hold the processed data.
Procedure
-
Click ADD PIPELINE on the Pipelines page. Your new pipeline opens.
-
Give the pipeline a meaningful name.
Example
Filter restaurant reviews
-
Click ADD SOURCE to open the panel allowing you to
select your source data, here restaurant reviews.
-
Select your dataset and click SELECT to add it to the pipeline.
Rename it if needed.
-
Click
and add a Filter processor to the pipeline. The
Configuration panel opens.
-
Give the processor a meaningful name, for example reviews by at least 20
helpful seniors.
-
In the Filter area:
-
Type in .reviews{.user.age
>= 60 && .user.user_votes.helpful > 20} in
the Input area, as you want only
the reviews entered by 60+ year old users with at least 20 helpful
votes.
-
Select COUNT in the Optionally select a function to apply list, >=
in the Operator list, and type in 20 in the Value list, as you want
at least 20 of these user reviews.
-
Click SAVE to save your configuration.
-
Click
again and add another
Filter processor to the pipeline. The
Configuration panel opens.
-
Give the processor a meaningful name, for example quiet noise level.
-
In the Filter area:
-
Select .business.attributes.noise_level in the Input list, as you want to filter the
restaurants based on their noise level.
-
Select NONE in the Optionally select a function to apply list, == in
the Operator list, and type in quiet in the Value list, as you want to
keep only restaurants with a quiet noise level.
-
Click SAVE to save your configuration.
-
Click the ADD DESTINATION item on the pipeline to open the panel that
allows you to select the dataset that will hold your filtered data.
-
Give the Destination a meaningful name, for example perfect restaurants for senior
hipsters.
-
(Optional) Check the data preview of the last Filter processor to compare your data
before and after the filtering operation.
-
On the top toolbar
of Talend Cloud Pipeline Designer,
select your run profile in the list (for more information, see Run profiles).
-
Click the
run icon to run your
pipeline.
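The two Filter processors configured in the procedure can be emulated in plain Python to check the filtering logic on a few sample records. This is a minimal sketch under stated assumptions: each record is a dict with a `reviews` list and a nested `business` object, as the avpath expressions suggest; the function names, the `id` field, and the sample records are hypothetical; and the code mirrors the stated conditions rather than how Talend Cloud Pipeline Designer evaluates avpath internally.

```python
from typing import Any, Callable

Record = dict[str, Any]


def field(obj: Any, *keys: str) -> Any:
    # Walk nested dicts; return None as soon as a level is missing or null.
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def is_helpful_senior_review(review: Record) -> bool:
    # Mirrors the predicate .user.age >= 60 && .user.user_votes.helpful > 20
    age = field(review, "user", "age")
    helpful = field(review, "user", "user_votes", "helpful")
    return age is not None and helpful is not None and age >= 60 and helpful > 20


def helpful_seniors_filter(record: Record) -> bool:
    # .reviews{...} selects the matching reviews; COUNT >= 20 keeps the record.
    reviews = field(record, "reviews") or []
    return sum(1 for r in reviews if is_helpful_senior_review(r)) >= 20


def quiet_filter(record: Record) -> bool:
    # .business.attributes.noise_level == quiet, with no function applied (NONE).
    return field(record, "business", "attributes", "noise_level") == "quiet"


def run_pipeline(records: list[Record],
                 filters: list[Callable[[Record], bool]]) -> list[Record]:
    # Each processor only sees the records the previous one let through.
    for keep in filters:
        records = [r for r in records if keep(r)]
    return records


# Hypothetical sample records, invented for illustration.
senior = {"user": {"age": 70, "user_votes": {"helpful": 25}}}
young = {"user": {"age": 30, "user_votes": {"helpful": 50}}}
restaurants = [
    {"id": "A", "business": {"attributes": {"noise_level": "quiet"}}, "reviews": [senior] * 20},
    {"id": "B", "business": {"attributes": {"noise_level": "loud"}}, "reviews": [senior] * 25},
    {"id": "C", "business": {"attributes": {"noise_level": "quiet"}}, "reviews": [senior] * 19 + [young]},
]

kept = run_pipeline(restaurants, [helpful_seniors_filter, quiet_filter])
print([r["id"] for r in kept])  # ['A']
```

Restaurant B passes the first filter but is dropped for its noise level, and restaurant C fails the first filter with only 19 matching reviews. Because both filters are independent predicates in this sketch, swapping their order would not change the result.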
Results
Your pipeline is being executed: the data is filtered according to the conditions you
have stated using avpath, and the output is sent to the target system you have
indicated.