In this scenario, context variables are added to override, at execution time, both
datasets that are used as source and destination.
Before you begin
-
You have previously created a connection to the system storing your source data,
here an HTTP Client connection.
The Base URL of the connection is:
https://datausa.io/
-
You have previously added the dataset holding your source data, here United States
public data including population statistics.
The HTTP Client dataset properties are:
- HTTP method: GET
- Path: /api/data
- Query parameters: Name: drilldowns, Value: Nation; Name: measures, Value: Population
- Response body format: JSON
- Extract a sub-part of the response: .data
- Returned content: Body
- You have also created the destination connection, here a Google BigQuery connection,
and a BigQuery dataset named Nation_statistics. This BigQuery
table will be created at execution time and will contain US statistics per year.
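To make the HTTP Client dataset properties above more concrete, the following minimal Python sketch builds the request URL that these settings describe and shows what extracting the .data sub-part of a JSON response amounts to. It is only an illustration, not how Talend Cloud Pipeline Designer implements the connector: the function names build_request_url and extract_sub_part are hypothetical, and the request itself is not sent.

```python
import json
from urllib.parse import urlencode, urljoin

# Values taken from the connection and dataset properties listed above.
BASE_URL = "https://datausa.io/"
PATH = "/api/data"
QUERY_PARAMS = {"drilldowns": "Nation", "measures": "Population"}


def build_request_url(base_url, path, params):
    """Join the base URL and the path, then append the encoded query string.

    Hypothetical helper, for illustration only.
    """
    return urljoin(base_url, path) + "?" + urlencode(params)


def extract_sub_part(response_body, key="data"):
    """Parse a JSON body and return the sub-part selected by '.data'.

    Hypothetical helper, for illustration only.
    """
    return json.loads(response_body)[key]


url = build_request_url(BASE_URL, PATH, QUERY_PARAMS)
print(url)
```

Replacing the Nation value with State in QUERY_PARAMS gives the request that the context variable defined later in this scenario is meant to produce.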
Procedure
-
Click Add
pipeline on the Pipelines page. Your new pipeline opens.
-
Give the pipeline a meaningful name.
Example
Filter US population stats on year >=2015
-
Click ADD SOURCE to open the panel allowing you to select
your source data, here get US stats.
-
Select your dataset and click
Select in order to add it to the pipeline.
Rename it if needed.
-
Add a Filter processor to the pipeline.
The Configuration panel opens.
-
Give a meaningful name to the processor, for example filter on year >=
2015.
-
In the Filter area:
-
Select .ID_Year in the Input
area, as you want to filter the records corresponding to the year in which data
has been collected.
-
Select None in the Optionally select a
function to apply list and >= in the
Operator list, then type 2015
in the Value list, as you want to keep the statistics
collected in 2015 or later.
-
Click Save to
save your configuration.
You can see that the records are filtered and 6 records meet the criteria you have
defined.
-
Click the ADD DESTINATION item on the pipeline to open the
panel allowing you to select the BigQuery table that will hold your filtered data.
-
Give a meaningful name to the destination, for example Nation stats
table, and select Create table if not
exists in the Table operation list, as you want
to create the Nation_statistics table and insert data into it at
execution time.
-
(Optional) If you execute your pipeline at this stage, you will see in the logs
that the filtered records were passed according to the filter you defined and you
will see the new Nation_statistics table created in your
Google BigQuery account. This new table contains the 6 filtered records with the
statistics collected in the US.
-
Go back to the Dataset tab of the US data - stats source
to add and assign a variable:
-
In the Query parameters, click the icon next to the Value of the
drilldowns parameter to open the [Add
variable] window.
-
Give a name to your variable, for example State statistics.
-
Enter the variable value that will overwrite the default resource to be
retrieved, State here.
-
Enter a description if needed and click Add.
-
Now that your variable is created, you are redirected to the
[Assign a variable] window that lists all context
variables. Select yours and click Assign.
Your variable and its value are assigned to the
drilldowns query parameter of the HTTP Client
dataset, which means the State parameter value will
overwrite the Nation parameter value you have defined
previously. Instead of retrieving nation statistics per year, state statistics
per year will be retrieved.
-
Click Save to
save your configuration.
-
Now go to the Dataset tab of the Nation stats
table destination to add and assign a variable:
-
Click the icon next to the Table name parameter to open
the [Add a variable] window.
-
Give a name to your variable, for example State_table.
-
Enter the variable value that will overwrite the default table,
State_statistics here.
-
Enter a description if needed and click Add.
-
Now that your variable is created, you are redirected to the
[Assign a variable] window that lists all context
variables. Select yours and click Assign.
Your variable and its value are assigned to the Table name parameter
of the BigQuery dataset, which means the State_statistics table name
will overwrite the Nation_statistics table name you have defined previously.
Instead of being inserted into the Nation_statistics table, data
will be inserted into the State_statistics table.
-
Click Save to
save your configuration.
-
On the top toolbar of Talend Cloud Pipeline Designer,
click the Run button to open the panel allowing you to select
your run profile.
-
Select your run profile in the list (for more information, see Run profiles), then click Run to
run your pipeline.
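The override behavior configured in this procedure can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the actual Talend Cloud Pipeline Designer mechanism: the dictionary layout, the parameter keys drilldowns and table_name, and the resolve_parameters function are hypothetical and only illustrate that a value assigned through a context variable replaces the default value of a dataset parameter at execution time.

```python
# Default dataset parameters, as defined when the datasets were created.
DEFAULTS = {
    "drilldowns": "Nation",             # HTTP Client query parameter
    "table_name": "Nation_statistics",  # BigQuery table name
}

# Context variables from this scenario:
# variable name -> (parameter it is assigned to, overriding value).
# The structure is an assumption made for this sketch.
CONTEXT_VARIABLES = {
    "State statistics": ("drilldowns", "State"),
    "State_table": ("table_name", "State_statistics"),
}


def resolve_parameters(defaults, context_variables):
    """Return the parameters used at execution time.

    Each assigned context variable overwrites the default value of the
    parameter it is assigned to; the defaults themselves are left unchanged.
    """
    resolved = dict(defaults)
    for _name, (parameter, value) in context_variables.items():
        resolved[parameter] = value
    return resolved


resolved = resolve_parameters(DEFAULTS, CONTEXT_VARIABLES)
print(resolved)
```

With no context variable assigned, the resolved parameters are identical to the defaults, which corresponds to the optional execution described earlier in the procedure.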
Results
Your pipeline is executed. The data is filtered and corresponds to the
context variables you have assigned to the source and destination datasets:
- In the pipeline execution logs, you can see the context variables used to
retrieve US State data and create the State_statistics table in
BigQuery at execution time. 312 records are inserted into the new table.
- In your Google BigQuery account, you can see the newly created
State_statistics table that is filled with the filtered
data (only State data collected in 2015 or later).
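The filter condition used in this scenario (.ID_Year >= 2015) is inclusive, so records collected in 2015 itself are kept. The following Python sketch illustrates that comparison only. The filter_records function is hypothetical, and the sample records are placeholders that contain nothing but a year; they are not actual Data USA results.

```python
def filter_records(records, field="ID_Year", minimum=2015):
    """Keep the records whose field is greater than or equal to the minimum.

    Hypothetical helper mirroring the '>=' operator of the Filter processor.
    """
    return [record for record in records if record[field] >= minimum]


# Placeholder records: only the year matters for this illustration.
sample = [{"ID_Year": year} for year in (2013, 2014, 2015, 2016)]

kept = filter_records(sample)
print([record["ID_Year"] for record in kept])
```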