Scenario: Writing data in BigQuery - 6.1

Talend Components Reference Guide

This scenario uses two components to write data in Google BigQuery.

Linking the components

  1. In the Integration perspective of Talend Studio, create an empty Job, named WriteBigQuery for example, from the Job Designs node in the Repository tree view.

    For further information about how to create a Job, see the Talend Studio User Guide.

  2. Drop tRowGenerator and tBigQueryOutput onto the workspace.

    The tRowGenerator component generates the data to be transferred to Google BigQuery in this scenario. In a real-world case, you can use other components, such as tMysqlInput or tMap, in place of tRowGenerator to design a more sophisticated process for preparing the data to be transferred.

  3. Connect them using the Row > Main link.

Preparing the data to be transferred

  1. Double-click tRowGenerator to open its Component view.

  2. Click RowGenerator Editor to open the editor.

  3. Click the [+] button three times to add three rows to the Schema table.

  4. In the Column column, enter the name of your choice for each of the new rows. For example, fname, lname and States.

  5. In the Functions column, select TalendDataGenerator.getFirstName for the fname row, TalendDataGenerator.getLastName for the lname row and TalendDataGenerator.getUsState for the States row.

  6. In the Number of Rows for RowGenerator field, enter 100, for example, to define the number of rows to be generated. A standalone sketch approximating the generated data follows these steps.

  7. Click OK to validate these changes.
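
The data prepared this way is simply rows of three string fields written to a local file. As a point of reference only, the following standalone Java sketch approximates what this step produces; the sample value arrays are hypothetical stand-ins for the TalendDataGenerator routines, and the file name and comma separator are assumptions to be aligned with your component settings.

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.Random;

    public class GenerateSampleRows {
        // Hypothetical sample values standing in for TalendDataGenerator.getFirstName(),
        // getLastName() and getUsState(); the real routines draw from larger fictive lists.
        private static final String[] FIRST_NAMES = {"Ada", "Grace", "Alan", "Edsger"};
        private static final String[] LAST_NAMES  = {"Lovelace", "Hopper", "Turing", "Dijkstra"};
        private static final String[] US_STATES   = {"California", "Texas", "Ohio", "Maine"};

        public static void main(String[] args) throws IOException {
            Random random = new Random();
            // Write 100 rows of fname,lname,States, mirroring the schema defined in the editor.
            try (PrintWriter out = new PrintWriter("biquery_UScustomer.csv")) {
                for (int i = 0; i < 100; i++) {
                    out.println(FIRST_NAMES[random.nextInt(FIRST_NAMES.length)] + ","
                            + LAST_NAMES[random.nextInt(LAST_NAMES.length)] + ","
                            + US_STATES[random.nextInt(US_STATES.length)]);
                }
            }
        }
    }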

Configuring the access to BigQuery and Cloud Storage

Building access to BigQuery

  1. Double-click tBigQueryOutput to open its Component view.

  2. Click Sync columns to retrieve the schema from its preceding component.

  3. In the Local filename field, enter the path to the local file to be created and then transferred to BigQuery.

  4. Navigate to the Google APIs Console in your web browser to access the Google project hosting the BigQuery and the Cloud Storage services you need to use.

  5. Click the API Access tab to open its view.

  6. In the Component view of the Studio, paste the Client ID, Client secret and Project ID from the API Access tab view into the corresponding fields.

  7. In the Dataset field, enter the name of the dataset you need to transfer data into. In this scenario, it is documentation.

    This dataset must already exist in BigQuery.

  8. In the Table field, enter the name of the table you need to write data in, for example, UScustomer.

    If this table does not exist in the BigQuery dataset you are using, select Create the table if it doesn't exist.

  9. In the Action on data field, select the action to be performed. In this example, select Truncate to empty the contents, if any, of the target table and repopulate it with the transferred data. A sketch of the equivalent load operation follows these steps.
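
For reference, the truncate-and-load behavior configured in these steps corresponds roughly to the following sketch based on the current google-cloud-bigquery Java client. This is not what tBigQueryOutput runs internally; it assumes application default credentials instead of the Client ID/secret flow described above, and the source URI is the Cloud Storage location configured in the next section.

    import com.google.cloud.bigquery.*;

    public class LoadWithTruncate {
        public static void main(String[] args) throws InterruptedException {
            // Assumes application default credentials, unlike the Client ID/secret flow above.
            BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

            TableId tableId = TableId.of("documentation", "UScustomer");
            LoadJobConfiguration configuration = LoadJobConfiguration
                    .newBuilder(tableId, "gs://talend/documentation/biquery_UScustomer.csv")
                    .setFormatOptions(FormatOptions.csv())
                    // Truncate: empty the target table, then repopulate it with the new data.
                    .setWriteDisposition(JobInfo.WriteDisposition.WRITE_TRUNCATE)
                    // Equivalent of selecting "Create the table if it doesn't exist".
                    .setCreateDisposition(JobInfo.CreateDisposition.CREATE_IF_NEEDED)
                    .setSchema(Schema.of(
                            Field.of("fname", StandardSQLTypeName.STRING),
                            Field.of("lname", StandardSQLTypeName.STRING),
                            Field.of("States", StandardSQLTypeName.STRING)))
                    .build();

            Job job = bigquery.create(JobInfo.of(configuration)).waitFor();
            System.out.println(job.getStatus().getError() == null
                    ? "Load done" : job.getStatus().getError());
        }
    }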

Building access to Cloud Storage

  1. Navigate to the Google APIs Console in your web browser to access the Google project hosting the BigQuery and the Cloud Storage services you need to use.

  2. Click Google Cloud Storage > Interoperable Access to open its view.

  3. In the Component view of the Studio, paste the Access key and Access secret from the Interoperable Access tab view into the corresponding fields.

  4. In the Bucket field, enter the path to the bucket you want to store the transferred data in. In this example, it is talend/documentation.

    This bucket must already exist in Cloud Storage.

  5. In the File field, enter the location in Google Cloud Storage where the file to be transferred to BigQuery is created. In this example, it is gs://talend/documentation/biquery_UScustomer.csv. The file name must be the same as the one you defined in the Local filename field.

    Note

    Troubleshooting: if you encounter an issue such as Unable to read source URI of the file stored in Google Cloud Storage, check whether you have entered the same file name in these two fields.

  6. Enter 0 in the Header field so that no rows of the transferred data are skipped as header rows. A sketch of the equivalent upload to Cloud Storage follows these steps.
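
Before the load, the component stages the local file in Cloud Storage. The following sketch shows that upload step with the google-cloud-storage Java client, again assuming application default credentials; the bucket name talend and object name documentation/biquery_UScustomer.csv are read off the gs:// URI above, and the local path is assumed to match the Local filename field.

    import com.google.cloud.storage.*;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class UploadToCloudStorage {
        public static void main(String[] args) throws IOException {
            // Assumes application default credentials rather than the Access key/secret pair above.
            Storage storage = StorageOptions.getDefaultInstance().getService();

            // gs://talend/documentation/biquery_UScustomer.csv resolves to bucket "talend"
            // and object "documentation/biquery_UScustomer.csv".
            BlobId blobId = BlobId.of("talend", "documentation/biquery_UScustomer.csv");
            BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("text/csv").build();

            // Assumed local path; it must point to the file defined in the Local filename field.
            byte[] content = Files.readAllBytes(Paths.get("biquery_UScustomer.csv"));
            storage.create(blobInfo, content);
            System.out.println("Uploaded " + content.length + " bytes.");
        }
    }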

Getting Authorization code

  1. In the Run view of Talend Studio, click Run to execute the Job. The execution pauses at a given moment and prints in the console the URL to visit in order to obtain the authorization code (a sketch of the underlying OAuth exchange follows these steps).

  2. Navigate to this address in your web browser and copy the authorization code displayed.

  3. In the Component view of tBigQueryOutput, paste the authorization code in the Authorization Code field.
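
This pause-and-paste sequence is the OAuth 2.0 installed-application flow of that era: the Job prints an authorization URL, you grant access in the browser, and the returned code is exchanged for an access token. The sketch below illustrates that exchange with the google-oauth-client library; it is not the component's internal code, the placeholder values are to be replaced with your own, and the out-of-band redirect URI it relies on has since been deprecated by Google.

    import com.google.api.client.googleapis.auth.oauth2.GoogleAuthorizationCodeFlow;
    import com.google.api.client.googleapis.auth.oauth2.GoogleTokenResponse;
    import com.google.api.client.http.javanet.NetHttpTransport;
    import com.google.api.client.json.jackson2.JacksonFactory;
    import java.io.IOException;
    import java.util.Arrays;

    public class AuthorizationCodeExchange {
        public static void main(String[] args) throws IOException {
            // Client ID and secret copied from the API Access tab of the Google APIs Console.
            String clientId = "YOUR_CLIENT_ID";
            String clientSecret = "YOUR_CLIENT_SECRET";

            GoogleAuthorizationCodeFlow flow = new GoogleAuthorizationCodeFlow.Builder(
                    new NetHttpTransport(), JacksonFactory.getDefaultInstance(),
                    clientId, clientSecret,
                    Arrays.asList("https://www.googleapis.com/auth/bigquery",
                            "https://www.googleapis.com/auth/devstorage.read_write")).build();

            // Step 1: the URL printed in the console; open it in a browser and copy the code.
            String authorizationUrl = flow.newAuthorizationUrl()
                    .setRedirectUri("urn:ietf:wg:oauth:2.0:oob").build();
            System.out.println("Visit: " + authorizationUrl);

            // Step 2: exchange the pasted authorization code for an access token.
            String authorizationCode = "PASTE_THE_CODE_HERE";
            GoogleTokenResponse token = flow.newTokenRequest(authorizationCode)
                    .setRedirectUri("urn:ietf:wg:oauth:2.0:oob").execute();
            System.out.println("Access token: " + token.getAccessToken());
        }
    }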

Executing the Job

  • Press F6.

Once done, the Run view opens automatically, where you can check the execution result.

The data is transferred to Google BigQuery.
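
If you want to double-check the result outside the Studio, a small query against the target table is enough. Here is a minimal sketch with the google-cloud-bigquery Java client, again assuming application default credentials; the dataset and table names are the ones used in this scenario.

    import com.google.cloud.bigquery.*;

    public class CheckTransferredData {
        public static void main(String[] args) throws InterruptedException {
            BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

            // Read back a few rows from the table populated by the Job.
            QueryJobConfiguration query = QueryJobConfiguration.newBuilder(
                    "SELECT fname, lname, States FROM documentation.UScustomer LIMIT 10").build();

            for (FieldValueList row : bigquery.query(query).iterateAll()) {
                System.out.println(row.get("fname").getStringValue() + " "
                        + row.get("lname").getStringValue() + " - "
                        + row.get("States").getStringValue());
            }
        }
    }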