tSQLTemplateAggregate - 6.3

Talend Components Reference Guide

Function

tSQLTemplateAggregate collects data values from one or more columns with the intent to manage the collection as a single unit. This component has real-time capabilities since it runs the data transformation on the DBMS itself.

Purpose

Helps to provide a set of metrics based on values or calculations.

tSQLTemplateAggregate properties

Component family

ELT/SQLTemplate

 

Basic settings

Database Type

Select the database type you want to connect to from the list.

 

Component List

Select the relevant DB connection component in the list if you use more than one connection in the current Job.

 

Database name

Name of the database.

 

Source table name

Name of the table holding the data you want to collect values from.

 

Target table name

Name of the table you want to write the collected and transformed data in.

 

Schema and Edit schema

A schema is a row description, that is to say, it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the [Repository Content] window.

 

 

Built-in: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.

 

 

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job flowcharts. Related topic: see Talend Studio User Guide.

 

Operations

Select the type of operation along with the value to use for the calculation and the output field.

 

 

Output Column: Select the destination field in the list.

 

 

Function: Select any of the following operations to perform on data: count, min, max, avg, sum, and count (distinct).

 

 

Input column position: Select the input column from which you want to collect the values to be aggregated.

 

Group by

Define the aggregation sets, the values of which will be used for calculations.

 

 

Output Column: Select the column label in the list offered according to the schema structure you defined. You can add as many output columns as you wish to make more precise aggregations.

 

 

Input Column position: Match the input column label with your output columns, in case the output label of the aggregation set needs to be different.
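As an illustration of how the Operations and Group by settings fit together, here is a minimal sketch that assembles the kind of INSERT ... SELECT ... GROUP BY statement these settings describe. It is not the SQL template shipped with Talend. The function name build_aggregate_sql and the table name filtered_customers are hypothetical; the other names are taken from the scenario at the end of this section.

```python
# Sketch only: maps the Operations and Group by panels onto one SQL statement.
# build_aggregate_sql and "filtered_customers" are hypothetical names.

ALLOWED_FUNCTIONS = {"count", "min", "max", "avg", "sum", "count (distinct)"}


def build_aggregate_sql(source, target, operations, group_by):
    """Return an INSERT ... SELECT ... GROUP BY statement as a string.

    operations: list of (output_column, function, input_column)
    group_by:   list of (output_column, input_column)
    """
    select_parts = []
    output_columns = []

    # Group by panel: each input column is selected as-is and grouped on.
    for out_col, in_col in group_by:
        select_parts.append(in_col)
        output_columns.append(out_col)

    # Operations panel: each line becomes one aggregate expression.
    for out_col, func, in_col in operations:
        if func not in ALLOWED_FUNCTIONS:
            raise ValueError(f"unsupported function: {func}")
        if func == "count (distinct)":
            expr = f"COUNT(DISTINCT {in_col})"
        else:
            expr = f"{func.upper()}({in_col})"
        select_parts.append(expr)
        output_columns.append(out_col)

    sql = (
        f"INSERT INTO {target} ({', '.join(output_columns)}) "
        f"SELECT {', '.join(select_parts)} FROM {source}"
    )
    if group_by:
        sql += " GROUP BY " + ", ".join(in_col for _, in_col in group_by)
    return sql


sql = build_aggregate_sql(
    source="filtered_customers",
    target="aggregate_customers",
    operations=[("customers_number", "count", "id")],
    group_by=[("customers_status", "id_State")],
)
print(sql)
```

Each Group by line contributes a grouping column and each Operations line contributes one aggregate expression, so the output schema must contain one column per line across both panels.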

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

SQL Template

SQL Template List

To add a default system SQL template: Click the Add button to add the default system SQL template(s) in the SQL Template List.

Click in the SQL template field and then click the arrow to display the system SQL template list. Select the desired system SQL template provided by Talend.

Note: You can create your own SQL templates and add them to the SQL Template List.

To create a user-defined SQL template:

- Select a system template from the SQL Template list and click its code in the code box. The system prompts you to create a new template.

- Click Yes to open the SQL template wizard.

- Define your new SQL template in the corresponding fields and click Finish to close the wizard. An SQL template editor opens where you can enter the template code.

- Click the Add button to add the newly created template to the SQL Template list.

For more information, see Talend Studio User Guide.

Global Variables

NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.

QUERY: the query statement being processed. This is a Flow variable and it returns a string.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.

Usage

This component is used as an intermediate component with other relevant DB components, especially the DB connection and commit components.

Log4j

If you are using a subscription-based version of the Studio, the activity of this component can be logged using the log4j feature. For more information on this feature, see Talend Studio User Guide.

For more information on the log4j logging levels, see the Apache documentation at http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html.

Limitation

n/a

Scenario: Filtering and aggregating table columns directly on the DBMS

The following scenario creates a Job that opens a connection to a MySQL database and:

  • instantiates the schema from a database table, keeping only the columns whose names are specified in the filter,

  • filters a column in the same database table to have only the data that matches a WHERE clause,

  • collects data grouped by specific value(s) from the filtered column and writes aggregated data in a target database table.

To filter and aggregate database table columns:

  • Drop the following components from the Palette onto the design workspace: tMysqlConnection, tSQLTemplateFilterColumns, tSQLTemplateFilterRows, tSQLTemplateAggregate, tSQLTemplateCommit, and tSQLTemplateRollback.

  • Connect the first five components using OnComponentOk links.

  • Connect tSQLTemplateAggregate to tSQLTemplateRollback using an OnComponentError link.

  • In the design workspace, select tMysqlConnection and click the Component tab to define the basic settings for tMysqlConnection.

  • In the Basic settings view, set the database connection details manually or select Repository from the Property Type list and select your DB connection if it has already been defined and stored in the Metadata area of the Repository tree view.

For more information about Metadata, see Talend Studio User Guide.

  • In the design workspace, select tSQLTemplateFilterColumns and click the Component tab to define its basic settings.

  • On the Database type list, select the relevant database.

  • On the Component list, select the relevant database connection component if more than one connection is used.

  • Enter the names for the database, source table, and target table in the corresponding fields and click the three-dot buttons next to Edit schema to define the data structure in the source and target tables.

Note

When you define the data structure for the source table, column names automatically appear in the Column list in the Column filters panel.

In this scenario, the source table has five columns: id, First_Name, Last_Name, Address, and id_State.

  • In the Column filters panel, set the column filter by selecting the check boxes of the columns you want to write in the target table.

In this scenario, the tSQLTemplateFilterColumns component instantiates only three columns: id, First_Name, and id_State from the source table.
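The effect of this column filter can be pictured as a plain SELECT over the three chosen columns. The sketch below uses Python's built-in sqlite3 module as a stand-in for the MySQL connection. The table names customers and customers_columns and the sample rows are assumptions made for illustration, and the statement is not the template Talend generates.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL connection (assumption).
conn = sqlite3.connect(":memory:")

# Source table with the five columns named in the scenario.
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER, First_Name TEXT, Last_Name TEXT, Address TEXT, id_State INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Anna", "Smith", "1 Main St", 1),   # made-up sample rows
        (2, "Bob", "Jones", "2 Oak Ave", 2),
    ],
)

# Column filter: keep only id, First_Name and id_State in the target table.
conn.execute(
    "CREATE TABLE customers_columns AS "
    "SELECT id, First_Name, id_State FROM customers"
)

cursor = conn.execute("SELECT * FROM customers_columns ORDER BY id")
cols = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(cols)  # ['id', 'First_Name', 'id_State']
print(rows)  # [(1, 'Anna', 1), (2, 'Bob', 2)]
```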

Note

In the Component view, you can click the SQL Template tab and add system SQL templates or create your own and use them within your Job to carry out the coded operation. For more information, see tSQLTemplateFilterColumns Properties.

  • In the design workspace, select tSQLTemplateFilterRows and click the Component tab to define its basic settings.

  • On the Database type list, select the relevant database.

  • On the Component list, select the relevant database connection component if more than one connection is used.

  • Enter the names for the database, source table, and target table in the corresponding fields and click the three-dot buttons next to Edit schema to define the data structure in the source and target tables.

In this scenario, the source table has the three initially instantiated columns: id, First_Name, and id_State, and the target table has the same three-column schema.

  • In the Where condition field, enter a WHERE clause to extract only those records that fulfill the specified criterion.

In this scenario, the tSQLTemplateFilterRows component filters the First_Name column in the source table to extract only the first names that contain the "a" letter.
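A condition of this kind is typically written with LIKE. The sketch below assumes the condition First_Name LIKE '%a%' and runs it against an in-memory SQLite table in place of MySQL. The table names customers_columns and customers_rows and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL connection (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers_columns (id INTEGER, First_Name TEXT, id_State INTEGER)"
)
conn.executemany(
    "INSERT INTO customers_columns VALUES (?, ?, ?)",
    [(1, "Anna", 1), (2, "Bob", 2), (3, "Carla", 2), (4, "Tim", 1)],  # made-up rows
)

# Assumed content of the Where condition field.
where_condition = "First_Name LIKE '%a%'"

# Row filter: copy only the rows that satisfy the condition into the target table.
conn.execute(
    "CREATE TABLE customers_rows AS "
    f"SELECT id, First_Name, id_State FROM customers_columns WHERE {where_condition}"
)

names = [
    row[0]
    for row in conn.execute("SELECT First_Name FROM customers_rows ORDER BY id")
]
print(names)  # ['Anna', 'Carla']
```

Whether LIKE treats upper and lower case as equal depends on the database and its collation, so a name such as "Al" may or may not match on a given server.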

  • In the design workspace, select tSQLTemplateAggregate and click the Component tab to define its basic settings.

  • On the Database type list, select the relevant database.

  • On the Component list, select the relevant database connection component if more than one connection is used.

  • Enter the names for the database, source table, and target table in the corresponding fields and click the three-dot buttons next to Edit schema to define the data structure in the source and target tables.

The schema for the source table consists of three columns: id, First_Name, and id_State. The schema for the target table consists of two columns: customers_status and customers_number. In this scenario, we want to group customers by their marital status and count the number of customers in each group. To do that, we define the Operations and Group by panels accordingly.

  • In the Operations panel, click the plus button to add one or more lines and then click in the Output column line to select the output column that will hold the counted data.

  • Click in the Function line and select the operation to be carried out.

  • In the Group by panel, click the plus button to add one or more lines and then click in the Output column line to select the output column that will hold the aggregated data.

  • In the design workspace, select tSQLTemplateCommit and click the Component tab to define its basic settings.

  • On the Database type list, select the relevant database.

  • On the Component list, select the relevant database connection component if more than one connection is used.

  • Do the same for tSQLTemplateRollback.

  • Save your Job and press F6 to execute it.

A two-column table aggregate_customers is created in the database. It groups customers according to their marital status and counts the number of customers in each group.
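To make the final step concrete, here is a minimal sketch of the aggregation followed by a commit on success or a rollback on error, mirroring the OnComponentOk and OnComponentError links of the Job. It uses Python's built-in sqlite3 module in place of MySQL. The source table name customers_rows and the sample rows are hypothetical, and id_State is assumed to hold the status code used for grouping.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL connection (assumption).
conn = sqlite3.connect(":memory:")

# Filtered source table (three columns) and the two-column target table.
conn.execute(
    "CREATE TABLE customers_rows (id INTEGER, First_Name TEXT, id_State INTEGER)"
)
conn.execute(
    "CREATE TABLE aggregate_customers ("
    "customers_status INTEGER, customers_number INTEGER)"
)
conn.executemany(
    "INSERT INTO customers_rows VALUES (?, ?, ?)",
    [(1, "Anna", 1), (3, "Carla", 2), (5, "Maria", 1)],  # made-up rows
)
conn.commit()  # setup is committed so a later rollback only undoes the aggregation

try:
    # Aggregation: group by id_State and count the ids in each group.
    conn.execute(
        "INSERT INTO aggregate_customers (customers_status, customers_number) "
        "SELECT id_State, COUNT(id) FROM customers_rows GROUP BY id_State"
    )
    conn.commit()      # success path, like tSQLTemplateCommit
except sqlite3.Error:
    conn.rollback()    # error path, like tSQLTemplateRollback
    raise

result = conn.execute(
    "SELECT customers_status, customers_number "
    "FROM aggregate_customers ORDER BY customers_status"
).fetchall()
print(result)  # [(1, 2), (2, 1)]
```

Because the whole transformation is a single statement executed by the database, no rows travel through the client; only the statement and the commit or rollback are sent.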