Using context variables to select different datasets at execution time

Talend Cloud Pipeline Designer User Guide

In this scenario, context variables are added to override both the source and the destination datasets at execution time.

A pipeline shows an HTTP client dataset with a context variable as the pipeline source, a Filter processor, and a BigQuery dataset with a context variable as the pipeline destination.

Before you begin

  • You have previously created a connection to the system storing your source data, here an HTTP Client connection.

    The Base URL of the connection is: https://datausa.io/

  • You have previously added the dataset holding your source data.

    Here, United States public data including population statistics.

    The HTTP Client dataset properties are as follows (an equivalent raw request is sketched after this list):
    • HTTP method: GET
    • Path: /api/data
    • Query parameters:
      • Name: drilldowns, Value: Nation
      • Name: measures, Value: Population
    • Response body format: JSON
    • Extract a sub-part of the response: .data
    • Returned content: Body
  • You have also created the destination connection, here a Google BigQuery connection, and a BigQuery dataset named Nation_statistics. This BigQuery table will be created at execution time and will contain US statistics per year.
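
For reference, the source dataset configuration above corresponds to a plain HTTP GET request against the Data USA API. The following minimal sketch, written in Python with the requests library (not part of Talend), only illustrates what the HTTP Client dataset retrieves:

    import requests

    # GET https://datausa.io/api/data with the query parameters defined above.
    response = requests.get(
        "https://datausa.io/api/data",
        params={"drilldowns": "Nation", "measures": "Population"},
    )
    response.raise_for_status()

    # "Extract a sub-part of the response: .data" keeps only the records array.
    records = response.json()["data"]
    print(len(records), "records retrieved")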

Procedure

  1. Click Add pipeline on the Pipelines page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.

    Example

    Filter US population stats on year >=2015
  3. Click ADD SOURCE to open the panel allowing you to select your source data, here get US stats.
  4. Select your dataset and click Select in order to add it to the pipeline.
    Rename it if needed.
  5. Click Plus and add a Filter processor to the pipeline. The Configuration panel opens.
  6. Give a meaningful name to the processor; filter on year >= 2015 for example.
  7. In the Filter area:
    1. Select .ID_Year in the Input area, as you want to filter the records corresponding to the year in which data has been collected.
    2. Select None in the Optionally select a function to apply list, select >= in the Operator list, and enter 2015 in the Value field, as you want to keep the statistics collected in or after 2015.
  8. Click Save to save your configuration.

    You can see that the records are filtered and 6 records meet the criteria you have defined.

    The preview panel shows the input data before the filtering operation, and the output data after the filtering operation.
  9. Click the ADD DESTINATION item on the pipeline to open the panel that allows you to select the BigQuery table that will hold your filtered data.
  10. Give a meaningful name to the destination, Nation stats table for example, and select Create table if not exists in the Table operation list, as you want to create the Nation_statistics table and insert data into it at execution time.
  11. (Optional) If you execute your pipeline at this stage, the logs show that the records were passed on according to the filter you defined, and the new Nation_statistics table is created in your Google BigQuery account. This new table contains the 6 filtered records with the statistics collected in the US.
    The BigQuery table named 'Nation_statistics' created at runtime displays 6 records related to the United States statistics.
  12. Go back to the Dataset tab of the US data - stats source to add and assign a variable:
    In the Configuration panel of the HTTP Client source, the icon used to add context variables is highlighted next to the 'Nation' value.
    1. In the Query parameters, click the Context variable icon next to the Value parameter of the drilldowns entry to open the [Add a variable] window.
    2. Give a name to your variable, State statistics for example.
    3. Enter the variable value that will override the default resource to be retrieved, here State.
    4. Enter a description if needed and click Add.
    5. Now that your variable is created, you are redirected to the [Assign a variable] window that lists all context variables. Select yours and click Assign.
      Your variable and its value are assigned to the drilldowns query parameter of the HTTP Client dataset, which means the State parameter value will override the Nation parameter value you defined previously. Instead of retrieving nation statistics per year, state statistics per year will be retrieved.
    6. Click Save to save your configuration.
  13. Now go to the Dataset tab of the Nation stats table destination to add and assign a variable:
    In the Configuration panel of the BigQuery destination, the icon used to add context variables is highlighted next to the 'Nation_statistics' value.
    1. Click the Context variable icon next to the Table name parameter to open the [Add a variable] window.
    2. Give a name to your variable, State_table for example.
    3. Enter the variable value that will override the default table name, here State_statistics.
    4. Enter a description if needed and click Add.
    5. Now that your variable is created, you are redirected to the [Assign a variable] window that lists all context variables. Select yours and click Assign.
      In the 'Assign a variable' window, the new variable is selected and the 'Assign' button is enabled.
      Your variable and its value are assigned to the Table name parameter of the BigQuery dataset, which means the State_statistics table name will override the Nation_statistics table name you defined previously. Instead of inserting data into the Nation_statistics table, data will be inserted into the State_statistics table (both overrides are illustrated in the sketch after this procedure).
    6. Click Save to save your configuration.
  14. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  15. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.
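
For reference, the following sketch, written in Python outside of Talend, illustrates what the parameterized pipeline does at run time: the drilldowns value and the BigQuery table name of the hypothetical run_pipeline function stand in for the two context variables, and the filter keeps the years 2015 and later. The defaults (Nation, Nation_statistics) and the overrides (State, State_statistics) come from the steps above; the raw field name "ID Year" is an assumption about the Data USA payload, which the pipeline schema exposes as .ID_Year.

    import requests

    def run_pipeline(drilldowns="Nation", table_name="Nation_statistics"):
        # Source: HTTP Client dataset; the drilldowns value is taken from the context variable.
        response = requests.get(
            "https://datausa.io/api/data",
            params={"drilldowns": drilldowns, "measures": "Population"},
        )
        response.raise_for_status()
        records = response.json()["data"]

        # Filter processor: keep the records collected in or after 2015.
        filtered = [r for r in records if int(r["ID Year"]) >= 2015]

        # Destination: the BigQuery table name is taken from the context variable.
        print(f"{len(filtered)} records would be inserted into the {table_name} table")
        return filtered

    # Default run versus a run with the two context variables applied.
    run_pipeline()
    run_pipeline(drilldowns="State", table_name="State_statistics")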

Results

Your pipeline is executed, and the filtered data corresponds to the context variables you have assigned to the source and destination datasets:
  • In the pipeline execution logs, you can see the context variables used to retrieve US State data and create the State table on BigQuery at execution time. 312 records are inserted into the new table.
    The Logs panel indicates that 312 records have been produced, and that the context variables used to retrieve US State data and create the State table on BigQuery have been applied at runtime.
  • In your Google BigQuery account, you can see the newly created State_statistics table, which is filled with the filtered data (only State data collected in or after 2015). A sketch of a query that checks this follows the list.
    The BigQuery table named 'State_statistics' created at runtime displays all records related to the State statistics.
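
If you prefer to check the result outside of the BigQuery console, a query such as the one sketched below could confirm that only years 2015 and later were inserted. This sketch assumes the google-cloud-bigquery Python client with default credentials, placeholder project and dataset identifiers (my-project, my_dataset), and a column named ID_Year matching the pipeline schema; none of these names come from the scenario above, so adjust them to your own environment.

    from google.cloud import bigquery

    # Placeholder project ID; replace with your own.
    client = bigquery.Client(project="my-project")
    query = """
        SELECT ID_Year, COUNT(*) AS record_count
        FROM `my-project.my_dataset.State_statistics`
        GROUP BY ID_Year
        ORDER BY ID_Year
    """
    for row in client.query(query).result():
        print(row.ID_Year, row.record_count)  # only years from 2015 onwards should appear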