tBigQueryInput - 6.1

Talend Components Reference Guide

EnrichVersion
6.1
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for Data Quality
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
task
Data Governance
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

Warning

This component will be available in the Palette of the studio on the condition that you have subscribed to one of the Talend solutions with Big Data.

tBigQueryInput Properties

Component family

Big Data / Google BigQuery

 

Function

This component connects to Google BigQuery and performs queries in it.

Purpose

This component performed the queries supported by Google BigQuery.

Basic settings

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

 

 

Built-In: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.

 

 

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: see Talend Studio User Guide.

Connection

Client ID and Client secret

Paste the client ID and the client secret, both created and viewable on the API Access tab view of the project hosting the BigQuery service and the Cloud Storage service you need to use.

To enter the client secret, click the [...] button next to the client secret field, and then in the pop-up dialog box enter the client secret between double quotes and click OK to save the settings.

 

Project ID

Paste the ID of the project hosting the BigQuery service you need to use.

The default ID of this project can be found in the URL of the Google API Console, or by hovering your mouse pointer over the name of the project in the BigQuery Browser Tool.

 

Authorization code

Paste the authorization code provided by Google for the access you are building.

To obtain the authorization code, you need to execute the Job using this component and when this Job pauses execution to print out an URL address, you navigate to this address to copy the authorization code displayed.

 

Query

Enter the query you need to use.

Advanced settings

token properties File Name

Enter the path to, or browse to the refresh token file you need to use.

At the first Job execution using the Authorization code you have obtained from Google BigQuery, the value in this field is the directory and the name of that refresh token file to be created and used; if that token file has been created and you need to reuse it, you have to specify its directory and file name in this field.

With only the token file name entered, Talend Studio considers the directory of that token file to be the root of the Studio folder.

For further information about the refresh token, see the manual of Google BigQuery.

 

Advanced Separator (for number)

Select this check box to change the separator used for the numbers.

 

Encoding

Select the encoding from the list or select Custom and define it manually. This field is compulsory for database data handling.

 

tStatCatcher Statistics

Select this check box to collect the log data at the component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.

Usage

This is an input component. It sends the extracted data to the component that follows it.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

N/A

Scenario: Performing a query in BigQuery

This scenario uses two components to perform the SELECT query in BigQuery and present the result in the Studio.

The following figure shows the schema of the table, UScustomer, we use as example to perform the SELECT query in.

We will select the State records and count the occurrence of each State among those records.

Linking the components

  1. In the Integration perspective of Studio, create an empty Job, named BigQueryInput for example, from the Job Designs node in the Repository tree view.

    For further information about how to create a Job, see the Talend Studio User Guide.

  2. Drop tBigQueryInput and tLogRow onto the workspace.

  3. Connect them using the Row > Main link.

Creating the query

Building access to BigQuery

  1. Double-click tBigQueryInput to open its Component view.

  2. Click Edit schema to open the editor

  3. Click the button twice to add two rows and enter the names of your choice for each of them in the Column column. In this scenario, they are: States and Count.

  4. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.

  5. Navigate to the Google APIs Console in your web browser to access the Google project hosting the BigQuery and the Cloud Storage services you need to use.

  6. Click the API Access tab to open its view.

  7. In the Component view of the Studio, paste Client ID, Client secret and Project ID from the API Access tab view to the corresponding fields, respectively.

Getting Authorization code

  1. In the Run view of the Studio, click Run to execute this Job. The execution will pause at a given moment to print out in the console the URL address used to get the authorization code.

  2. Navigate to this address in your web browser and copy the authorization code displayed.

  3. In the Component view of tBigQueryInput, paste the authorization code in the Authorization Code field.

Writing the query

  • In the Query field, enter

    select States, count (*) as Count from documentation.UScustomer group by States

Executing the Job

The tLogRow component presents the execution result of the Job. You can configure the presentation mode on its Component view.

To do this, double-click tLogRow to open the Component view and in the Mode area, select the Table (print values in cells of a table) option.

  • To execute this Job, press F6.

Once done, the Run view is opened automatically, where you can check the execution result.