Using context variables to use different datasets at execution time - Cloud

Talend Cloud Pipeline Designer User Guide

In this scenario, context variables are added to override, at execution time, the two datasets used as the source and the destination of the pipeline.

Before you begin

  • You have previously created a connection to the system storing your source data, here a REST connection.

    The Base URL of the connection is: https://api.covid19api.com

  • You have previously added the dataset holding your source data.

    Here, the dataset contains the daily total number of Covid cases per country, including country information, confirmed deaths, recovered cases, and so on.

    The REST dataset properties are:
    • Resource: /total/country/germany
    • HTTP method: GET
    • Answer body format: JSON
  • You have also created the destination connection, here a Google BigQuery connection, and a BigQuery dataset named Germany. The corresponding BigQuery table will be created at execution time and will contain daily reports per country.
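As a rough illustration of how the REST connection and dataset properties above fit together, the following Python sketch joins the Base URL with the Resource value. The function name build_request_url is hypothetical and is not part of Talend Cloud Pipeline Designer; the sketch only builds the URL and sends no request.

```python
BASE_URL = "https://api.covid19api.com"   # Base URL of the REST connection
RESOURCE = "/total/country/germany"       # Resource of the REST dataset
HTTP_METHOD = "GET"                       # HTTP method of the REST dataset


def build_request_url(base_url: str, resource: str) -> str:
    """Join a base URL and a resource path with exactly one slash (hypothetical helper)."""
    return base_url.rstrip("/") + "/" + resource.lstrip("/")


url = build_request_url(BASE_URL, RESOURCE)
print(HTTP_METHOD, url)
```

Changing only the Resource value, for example to /total/country/france, changes which country is requested while the connection stays the same.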

Procedure

  1. Click ADD PIPELINE on the Pipelines page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.

    Example

    Filter Covid data on recovered cases per country
  3. Click ADD SOURCE to open the panel allowing you to select your source data, here covid country data.
  4. Select your dataset and click SELECT in order to add it to the pipeline.
    Rename it if needed.
  5. Add a Filter processor to the pipeline. The Configuration panel opens.
  6. Give a meaningful name to the processor, for example filter on recovered cases.
  7. In the Filter area:
    1. Select .Recovered in the Input area, as you want to filter the records corresponding to the number of people who recovered from the disease.
    2. Select NONE in the Optionally select a function to apply list, select > in the Operator list, and enter 0 as the value, because you want to keep only the Germany Covid records with at least one recovered case.
  8. Click SAVE to save your configuration.

    You can see that the records are filtered and that only 28 records meet the criteria you have defined. Note that only the 50 records shown in the sample are filtered at this stage; more records will be processed at execution time.

  9. Click the ADD DESTINATION item on the pipeline to open the panel allowing you to select the Germany BigQuery table that will hold your filtered data.
  10. Give a meaningful name to the destination, for example daily report, and select Create table if not exists in the Table operation list, because you want to create the Germany table and insert data into it at execution time.
  11. (Optional) If you execute your pipeline at this stage, the logs show that the records were passed according to the filter you defined, and the new Germany table is created in your Google BigQuery account. This new table contains the filtered data with the recovered cases collected in Germany.
  12. Go back to the Dataset tab of the covid country data source to add and assign a variable:
    1. Click the icon next to the Resource parameter to open the [Add a variable] window.
    2. Give a name to your variable, France data for example.
    3. Enter the variable value that will overwrite the default resource to be retrieved, /total/country/france here.
    4. Enter a description if needed and click ADD.
    5. Now that your variable is created, you are redirected to the [Assign a variable] window that lists all context variables. Select yours and click ASSIGN.
      Your variable and its value are assigned to the Resource parameter of the REST dataset, which means the /total/country/france resource will overwrite the /total/country/germany resource you have defined previously. Instead of retrieving Germany Covid data, France data will be retrieved.
    6. Click SAVE to save your configuration.
  13. Now go to the Dataset tab of the daily report destination to add and assign a variable:
    1. Click the icon next to the Table name parameter to open the [Add a variable] window.
    2. Give a name to your variable, France table for example.
    3. Enter the variable value that will overwrite the default table, France here.
    4. Enter a description if needed and click ADD.
    5. Now that your variable is created, you are redirected to the [Assign a variable] window that lists all context variables. Select yours and click ASSIGN.
      Your variable and its value are assigned to the Table name parameter of the BigQuery dataset, which means the France table will overwrite the Germany table you have defined previously. Instead of inserting data into the Germany table, data will be inserted into the France table.
    6. Click SAVE to save your configuration.
  14. On the top toolbar of Talend Cloud Pipeline Designer, select your run profile in the list (for more information, see Run profiles).
  15. Click the run icon to run your pipeline.
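The Filter configuration from the procedure above (input .Recovered, function NONE, operator >, value 0) can be sketched in Python as follows. This is only an illustration of the filtering logic, not how Talend Cloud Pipeline Designer implements it; the sample records, the Country and Date field names, and the keep_record function are hypothetical.

```python
def keep_record(record: dict) -> bool:
    """Keep a record only when its Recovered value is strictly greater than 0."""
    return record.get("Recovered", 0) > 0


# Hypothetical sample records; real values come from the REST dataset.
sample = [
    {"Country": "Germany", "Date": "2020-01-22", "Recovered": 0},
    {"Country": "Germany", "Date": "2020-03-15", "Recovered": 46},
    {"Country": "Germany", "Date": "2020-03-16", "Recovered": 67},
]

filtered = [record for record in sample if keep_record(record)]
print(len(filtered), "of", len(sample), "sample records pass the filter")
```

Because the operator is > and the value is 0, a record with Recovered equal to 0 is dropped, which matches the "at least one recovered case" condition for integer counts.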

Results

Your pipeline is executed. The data is filtered and corresponds to the context variables you have assigned to the source and destination datasets:
  • In the pipeline execution logs, you can see the context variables used to retrieve the France REST resource and create the France table on BigQuery at execution time.
  • In your Google BigQuery account, you can see the newly created France table, which is filled with the filtered data (only records with at least one recovered case in France).
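The override behavior described in this scenario can be modeled with a short Python sketch: each assigned context variable replaces the default value of its parameter at execution time, and unassigned parameters keep their defaults. This is a conceptual model under that assumption, not the product's internal implementation; the resolve_parameters function and the dictionary layout are hypothetical.

```python
# Default dataset parameters as configured in the pipeline.
defaults = {
    "Resource": "/total/country/germany",
    "HTTP method": "GET",
    "Table name": "Germany",
}

# Context variables created in the [Add a variable] window (name -> value).
variables = {
    "France data": "/total/country/france",
    "France table": "France",
}

# Assignments made in the [Assign a variable] window (parameter -> variable name).
assignments = {
    "Resource": "France data",
    "Table name": "France table",
}


def resolve_parameters(defaults: dict, assignments: dict, variables: dict) -> dict:
    """Return the values used at execution time (hypothetical helper)."""
    resolved = dict(defaults)  # start from the defaults, leaving them untouched
    for parameter, variable_name in assignments.items():
        resolved[parameter] = variables[variable_name]
    return resolved


resolved = resolve_parameters(defaults, assignments, variables)
print(resolved)
```

In this model, the Resource and Table name parameters take the France values, while HTTP method keeps its default because no variable is assigned to it.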