tDBInput - 6.1

Talend Components Reference Guide


Note that this component is incompatible with JVM 1.8 and is therefore deprecated and hidden from the Palette by default. To use this component, you need to install JVM 1.7 or an earlier version. For information about how to show a hidden component on the Palette, see Talend Studio User Guide.

tDBInput properties

Component family

Databases/DB Generic

 

Function

tDBInput reads a database and extracts fields based on a query.

Note

To use this component, the ODBC drivers for the relevant DBMSs must be installed and the corresponding ODBC connections must be configured via the database connection configuration wizard.

Purpose

tDBInput executes a DB query whose fields are listed in a strictly defined order that must correspond to the schema definition. It then passes the field list on to the next component via a Main row link.

Note

For performance reasons, a specific Input component (for example, tMySQLInput for a MySQL database) should always be preferred to the generic component.

Basic settings

Property type

Either Built-in or Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

 

 

Built-in: No property data stored centrally.

 

 

Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved. 

 

Click this icon to open the database connection configuration wizard and store the database connection parameters you set in the component Basic settings view.

For more information about setting up and storing database connection parameters, see Talend Studio User Guide.

 

Database

Name of the data source defined via the database connection configuration wizard.

 

Username and Password

DB user authentication data.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

 

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. The schema is either Built-In or stored remotely in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Note

This component offers the advantage of the dynamic schema feature. This allows you to retrieve unknown columns from source files or to copy batches of columns from a source without mapping each column individually. For further information about dynamic schemas, see Talend Studio User Guide.

 

 

Built-In: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.

 

 

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: see Talend Studio User Guide.

  

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

 

Table Name

Name of the source table where changes made to data should be captured.

 

Query type

Either Built-in or Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

 

 

Built-in: Enter the query statement manually or build it graphically using SQLBuilder.

 

 

Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

 

Query

Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.
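
As an illustration of keeping the query fields in schema order, the following minimal Java sketch builds a SELECT statement from an ordered list of schema columns. The table name sales and the column names shop_code and sales_amount are hypothetical, and the helper is not part of the component; it only shows the ordering idea.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: table and column names are hypothetical.
final class QueryOrderSketch {

    // Builds a SELECT statement whose column list follows the schema order.
    static String buildQuery(List<String> schemaColumns, String table) {
        StringBuilder sql = new StringBuilder("SELECT ");
        for (int i = 0; i < schemaColumns.size(); i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append(schemaColumns.get(i));
        }
        return sql.append(" FROM ").append(table).toString();
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("shop_code", "sales_amount");
        System.out.println(buildQuery(schema, "sales"));
    }
}
```

Listing the columns explicitly, rather than relying on the asterisk, keeps the query aligned with the schema even if the table column order changes later.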

Advanced settings

Additional JDBC parameters

Specify additional connection properties for the database connection you are creating.

Note

You can set the encoding parameters through this field.

 

Trim all the String/Char columns

Select this check box to remove leading and trailing whitespace from all the String/Char columns.

 

Trim column

Remove leading and trailing whitespace from defined columns.
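
The effect of the two trim options can be pictured with the short Java sketch below. It assumes that trimming behaves like the standard String.trim() method, which this page does not confirm; the helper name trimColumn is hypothetical.

```java
// Sketch only: assumes the trim options behave like java.lang.String.trim().
final class TrimSketch {

    // Returns the value without leading and trailing whitespace; null stays null.
    static String trimColumn(String value) {
        return value == null ? null : value.trim();
    }

    public static void main(String[] args) {
        // Brackets make the removed whitespace visible.
        System.out.println("[" + trimColumn("  Shop A  ") + "]");
    }
}
```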

 

tStatCatcher Statistics

Select this check box to collect log data at the component level.

 

Enable parallel execution

Select this check box to perform high-speed data processing by handling multiple data flows simultaneously. Note that this feature depends on the ability of the database or the application to handle multiple inserts in parallel, as well as on the number of CPUs allocated. In the Number of parallel executions field, either:

  • Enter the number of parallel executions desired.

  • Press Ctrl + Space and select the appropriate context variable from the list. For further information, see Talend Studio User Guide.

Warning

  • The Action on table field is not available with the parallelization function. Therefore, you must use a tCreateTable component if you want to create a table.

  • When parallel execution is enabled, it is not possible to use global variables to retrieve return values in a subjob.

Global Variables 

NB_LINE: the number of rows processed. This is an After variable and it returns an integer.

QUERY: the SQL query statement being processed. This is a Flow variable and it returns a string.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
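
As a rough illustration of reading After variables in Java code, the sketch below simulates the variable store with a plain java.util.Map. The key format (component label followed by an underscore and the variable name) and the label tDBInput_1 are assumptions that this page does not state; check the variable list opened with Ctrl + Space for the exact names in your Job.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: globalMap is simulated here; the key format is an assumption.
final class GlobalVariablesSketch {

    // Reads the After variable NB_LINE for the given component, or -1 if it is not set.
    static int rowCount(Map<String, Object> globalMap, String component) {
        Object value = globalMap.get(component + "_NB_LINE");
        return value instanceof Integer ? ((Integer) value).intValue() : -1;
    }

    // Reads the After variable ERROR_MESSAGE, or an empty string if none was recorded.
    static String errorMessage(Map<String, Object> globalMap, String component) {
        Object value = globalMap.get(component + "_ERROR_MESSAGE");
        return value == null ? "" : value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<String, Object>();
        globalMap.put("tDBInput_1_NB_LINE", Integer.valueOf(3)); // hypothetical value
        System.out.println("rows: " + rowCount(globalMap, "tDBInput_1"));
        System.out.println("error: " + errorMessage(globalMap, "tDBInput_1"));
    }
}
```

Because NB_LINE and ERROR_MESSAGE are After variables, code like this is only meaningful in a component that runs once tDBInput has finished.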

Usage

This component offers the flexibility of the DB query and covers all possible SQL queries using a generic ODBC connection.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Scenario 1: Displaying selected data from DB table

The following scenario creates a two-component Job, reading data from a database using a DB query and outputting delimited data into the standard output (console).

Warning

As a prerequisite of this Job, the MySQL ODBC driver must have been installed and the corresponding ODBC connection must have been configured.

  1. Drop a tDBInput and a tLogRow component from the Palette to the design workspace.

  2. Connect the components using a Row > Main link.

  3. Double-click tDBInput to open its Basic settings view in the Component tab.

  4. Fill in the database name, the username and password in the corresponding fields.

  5. Click Edit Schema and create a 2-column description including shop code and sales.

  6. Enter the table name in the corresponding field.

  7. Type in the query making sure it includes all columns in the same order as defined in the Schema. In this case, as we'll select all columns of the schema, the asterisk symbol makes sense.

  8. Click on the second component to define it.

  9. Enter the field separator, in this case a pipe (|).

  10. Now go to the Run tab, and click on Run to execute the Job.

    The DB is queried, and the data extracted from the specified table is passed on to the Job log console. You can view the output directly on the console.

    For an example of the use of Dynamic Schemas in Input components, see Scenario 4: Writing dynamic columns from a MySQL database to an output file.
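
The console output of this scenario can be approximated with the Java sketch below, which joins the fields of each row with the pipe separator. The rows are hypothetical in-memory sample values standing in for the query result, and the exact layout printed by tLogRow may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: rows are hypothetical sample data, not read from a database.
final class PipeOutputSketch {

    // Joins the field values of one row with the given separator.
    static String formatRow(String[] fields, String separator) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                line.append(separator);
            }
            line.append(fields[i]);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<String[]>();
        rows.add(new String[] {"S01", "1200"}); // shop code, sales
        rows.add(new String[] {"S02", "850"});
        for (String[] row : rows) {
            System.out.println(formatRow(row, "|"));
        }
    }
}
```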

Scenario 2: Using StoreSQLQuery variable

StoreSQLQuery is a variable that can be used to debug a tDBInput scenario which does not operate correctly. It allows you to dynamically feed the SQL query set in your tDBInput component.

  1. Use the same scenario as scenario 1 above and add a third component, tJava.

  2. Connect tDBInput to tJava using a trigger connection of the OnComponentOk type. In this case, we want the tDBInput to run before the tJava component.

  3. Configure both the tDBInput and tLogRow components as in Scenario 1.

  4. Click anywhere on the design workspace to display the Contexts property panel.

  5. Create a new parameter named exactly StoreSQLQuery. Enter a default value of 1. This value of 1 means that StoreSQLQuery is "true", so that the query is stored in the QUERY global variable.

  6. Click the tJava component to display the Component view. Enter the System.out.println("") command to display the query content, press Ctrl+Space to access the variable list, and select the global variable QUERY.

  7. Go to your Run tab and execute the Job.

  8. The query entered in the tDBInput component is displayed at the end of the Job results, in the log.

    For an example of the use of dynamic schemas in Input components, see Scenario 4: Writing dynamic columns from a MySQL database to an output file.
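
The tJava step of this scenario can be sketched in standalone Java as follows. The variable store is simulated with a java.util.Map; the key tDBInput_1_QUERY and the sample query text are assumptions for illustration, not values taken from this page.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: globalMap is simulated; the key name and query text are assumptions.
final class QueryVariableSketch {

    // Returns the query stored for the given component, or null if none is stored.
    static String queryOf(Map<String, Object> globalMap, String component) {
        return (String) globalMap.get(component + "_QUERY");
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<String, Object>();
        globalMap.put("tDBInput_1_QUERY", "SELECT * FROM sales"); // hypothetical query
        System.out.println(queryOf(globalMap, "tDBInput_1"));
    }
}
```

Inside the tJava component itself, only the println line is needed, with the QUERY variable picked from the Ctrl+Space list in place of the simulated map.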