tELTGreenplumMap - 6.3

Talend Components Reference Guide


The three ELT Greenplum components are closely related in terms of their operating conditions. Use them together to handle Greenplum DB schemas and to generate Insert statements, including clauses, which are executed on the defined DB output table.

Function

Helps you build the SQL statement graphically, using the tables provided as input.

Purpose

Uses the tables provided as input to feed the parameters of the built statement. The statement can include inner or outer joins between tables or between a table and its aliases.

tELTGreenplumMap properties

Component family

ELT/Map/Greenplum

Basic settings

Use an existing connection

Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.

Note

When a Job contains a parent Job and a child Job, and you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:

  1. In the parent level, register the database connection to be shared in the Basic settings view of the connection component which creates that very database connection.

  2. In the child level, use a dedicated connection component to read that registered database connection.

For an example of how to share a database connection across Job levels, see Talend Studio User Guide.

 

ELT Greenplum Map Editor

The ELT Map editor allows you to define the output schema and graphically build the SQL statement to be executed. The column names of the schema can be different from the column names in the database.

 

Style link

Select the way in which links are displayed.

Auto: By default, the links between the input and output schemas and the Web service parameters are in the form of curves.

Bezier curve: Links between the schema and the Web service parameters are in the form of curves.

Line: Links between the schema and the Web service parameters are in the form of straight lines. This option slightly optimizes performance.

 

Property type

Either Built-in or Repository.

Since version 5.6, both the Built-in mode and the Repository mode are available in all Talend solutions.

Built-in: No property data is stored centrally.

Repository: Select the Repository file where the properties are stored. The following fields are pre-filled using the fetched data.

 

Host

Database server IP address.

 

Port

Listening port number of the DB server.

 

Database

Name of the database.

 

Username and Password

DB user authentication data.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

Advanced settings

Additional JDBC parameters

Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
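As a rough illustration of what additional JDBC parameters amount to, the sketch below appends key/value properties to a connection URL. The PostgreSQL-style URL layout, the helper name build_jdbc_url, and the sample host, database and property values are assumptions made for this example and are not taken from this guide; check the documentation of your JDBC driver for the properties it accepts.

```python
from urllib.parse import urlencode


def build_jdbc_url(host, port, database, extra_params=None):
    """Return a JDBC-style URL, appending extra properties as a query string."""
    # Assumption: a PostgreSQL-style URL layout is used for the connection.
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    if extra_params:
        # Properties are appended as key=value pairs separated by '&'.
        url += "?" + urlencode(extra_params)
    return url


# Hypothetical host, database and properties.
url = build_jdbc_url(
    "gp-host.example.com", 5432, "sales",
    {"ssl": "true", "loginTimeout": "30"},
)
print(url)
```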

 

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Dynamic settings

Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independently of Talend Studio.

The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable.

For examples on using dynamic parameters, see Scenario 3: Reading data from MySQL databases through context-based dynamic connections and Scenario: Reading data from different MySQL databases using dynamically loaded connection parameters. For more information on Dynamic settings and context variables, see Talend Studio User Guide.
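The idea behind a dynamic parameter can be pictured as a lookup: a context variable carries the name of the connection to use, and the Job resolves it at run time. The sketch below only mimics that idea. The connections registry, the resolve_connection helper, the context key connection and all host and database names are hypothetical and do not reflect how Talend Studio implements the feature.

```python
# Hypothetical registry: connection component name -> connection details.
connections = {
    "tGreenplumConnection_1": {"host": "gp-dev.example.com", "database": "sales_dev"},
    "tGreenplumConnection_2": {"host": "gp-prod.example.com", "database": "sales_prod"},
}


def resolve_connection(context):
    """Pick the connection named by the context variable 'connection' (assumed name)."""
    name = context["connection"]
    if name not in connections:
        raise KeyError(f"No connection component named {name!r} in this Job")
    return connections[name]


# The same Job logic runs against a different database when the context changes.
selected = resolve_connection({"connection": "tGreenplumConnection_2"})
print(selected["database"])
```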

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use.

For further information about variables, see Talend Studio User Guide.

Usage

tELTGreenplumMap is used along with tELTGreenplumInput and tELTGreenplumOutput. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name.

Note

The ELT components do not handle the actual data flow but only schema information.

Connecting ELT components

The ELT components do not handle any data as such, only the table schema information that will be used to build the SQL query to execute. Therefore, the only connection required between these components is a simple link.

Note that the output name you give to the link when creating it must always be the exact name of the table to be accessed, as this name is used in the generated SQL statement.

For the related topic, see Talend Studio User Guide.
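To see why the link name has to match the table name exactly, the following sketch places a link name verbatim into a statement and runs it. It uses an in-memory SQLite database as a stand-in for Greenplum, and the select_all helper and the misspelled name employee_statecode are hypothetical; only the table name employee_by_statecode and its columns come from the scenario later in this section.

```python
import sqlite3

# SQLite stands in for Greenplum here; only the naming behaviour matters.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_by_statecode (id INTEGER, name TEXT, statecode INTEGER)"
)


def select_all(link_name):
    """Run a SELECT that uses the link name verbatim as the table name."""
    return conn.execute(f"SELECT id, name FROM {link_name}").fetchall()


ok = select_all("employee_by_statecode")  # link name equals the table name

try:
    select_all("employee_statecode")      # link name differs from the table name
    failed = False
except sqlite3.OperationalError:
    # The database reports that no such table exists.
    failed = True

print(ok, failed)
```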

Mapping and joining tables

In the ELT Mapper, you can select specific columns from the input schema and include them in the output schema.

  • As you would in the regular Map editor, simply drag and drop the columns from the input schema to the defined output table.

  • Use the Ctrl key to select non-contiguous table columns, or the Shift key to select contiguous ones.

You can implement explicit joins to retrieve various data from different tables.

  • Select the Explicit join check box for the relevant column of the input schema and then select a type of join from the join list in the upper right corner of the input schema.

  • Possible joins include: INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN and CROSS JOIN. By default, the INNER JOIN is selected.
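The effect of choosing a join type can be sketched as a small helper that assembles the FROM clause. This is a minimal illustration, not the code the map editor runs: the build_from_clause function is hypothetical, and the table and column names are borrowed from the scenario later in this section purely as sample input.

```python
JOIN_TYPES = {
    "INNER JOIN", "LEFT OUTER JOIN", "RIGHT OUTER JOIN",
    "FULL OUTER JOIN", "CROSS JOIN",
}


def build_from_clause(main_table, lookup_table, join_type="INNER JOIN", on=None):
    """Assemble a FROM clause with an explicit join between two tables."""
    if join_type not in JOIN_TYPES:
        raise ValueError(f"Unsupported join type: {join_type}")
    clause = f"FROM {main_table} {join_type} {lookup_table}"
    if join_type != "CROSS JOIN":
        # Every join except CROSS JOIN needs a join condition.
        if not on:
            raise ValueError(f"{join_type} requires a join condition")
        clause += f" ON ({on})"
    return clause


from_clause = build_from_clause(
    "employee_by_statecode", "statecode",
    join_type="LEFT OUTER JOIN",
    on="employee_by_statecode.statecode = statecode.statecode",
)
print(from_clause)
```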

You can also create alias tables to retrieve various data from the same table.

  • In the input area, click the [+] button in the upper left corner of the map editor to create a new alias.

  • Select the table on which the alias is based.

  • Type in a new name for the alias table, preferably different from the name of the main table.
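An alias lets one table play two roles in the same statement. The sketch below shows the kind of self-join this makes possible, using an in-memory SQLite database as a stand-in for Greenplum. The employee table, its manager_id column, the alias manager and the sample rows are all hypothetical.

```python
import sqlite3

# SQLite stands in for Greenplum; the table and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER, name TEXT, manager_id INTEGER);
    INSERT INTO employee VALUES (1, 'Ada', NULL), (2, 'Ben', 1), (3, 'Cy', 1);
""")

# 'manager' is an alias of the same employee table.
rows = conn.execute("""
    SELECT employee.name, manager.name
    FROM employee
    INNER JOIN employee AS manager ON employee.manager_id = manager.id
    ORDER BY employee.id
""").fetchall()
print(rows)
```

Giving the alias a name different from the main table keeps column references such as employee.name and manager.name unambiguous.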

Adding where and other clauses

You can also restrict the Select statement based on a Where clause and/or other clauses such as Group By, Order By, etc. by clicking the Add filter row button at the top of the output table in the map editor.

To add a restriction based on a Where clause, click the Add filter row button and select Add a WHERE clause from the pop-up menu.

To add a restriction based on Group By, Order By etc., click the Add filter row button and select Add an other(GROUP...) clause from the pop-up menu.

Make sure that all input components are linked correctly to the ELT Map component to be able to implement all inclusions, joins and clauses.
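How filter rows end up in the statement can be sketched as follows. The add_clauses helper is hypothetical, and combining several WHERE rows with AND is an assumption of this sketch rather than documented behaviour; the table and column names are borrowed from the scenario later in this section.

```python
def add_clauses(select_sql, where=None, other=None):
    """Append WHERE conditions, then other clauses, to a SELECT statement."""
    parts = [select_sql]
    if where:
        # Assumption: multiple WHERE rows are combined with AND.
        parts.append("WHERE " + " AND ".join(where))
    if other:
        # Other clauses (GROUP BY, ORDER BY, ...) are appended in the given order.
        parts.extend(other)
    return " ".join(parts)


sql = add_clauses(
    "SELECT statecode, COUNT(id) FROM employee_by_statecode",
    where=["statecode > 0"],
    other=["GROUP BY statecode", "ORDER BY statecode"],
)
print(sql)
```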

Generating the SQL statement

Mapping elements from the input schemas to the output schemas instantly creates the corresponding SELECT statement. The clauses are also included automatically.
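The relationship between a mapping and its SELECT statement can be pictured with a short sketch. The build_select helper, the use of column aliases and the customers and orders tables are hypothetical; the statement actually produced by the map editor may be laid out differently.

```python
def build_select(mappings, input_tables):
    """Build a SELECT from output-column -> input-expression mappings.

    Illustrative only: this is not the generator used by the map editor.
    """
    select_list = ", ".join(
        f"{expr} AS {out_col}" for out_col, expr in mappings.items()
    )
    return f"SELECT {select_list} FROM {', '.join(input_tables)}"


# Hypothetical mapping: each output column is fed by one input column.
select_sql = build_select(
    {"customer_name": "customers.name", "total": "orders.amount"},
    ["customers", "orders"],
)
print(select_sql)
```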

Scenario: Mapping data using a simple implicit join

In this scenario, a tELTGreenplumMap component is deployed to retrieve data from the source table employee_by_statecode, compare its statecode column against the table statecode, and then map the desired columns from the two tables to the output table employee_by_state.
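The kind of statement this scenario leads to can be tried out in miniature. The sketch below runs an Insert with an implicit join (tables listed with a comma, matched in the WHERE clause) against an in-memory SQLite database standing in for Greenplum. The input column names follow the schemas defined in the configuration steps of this scenario; the output columns (id, name, state) and all sample rows are assumptions made for the example.

```python
import sqlite3

# SQLite stands in for Greenplum; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_by_statecode (id INTEGER, name TEXT, statecode INTEGER);
    CREATE TABLE statecode (state TEXT, statecode INTEGER);
    -- Assumed output layout; the real columns are chosen in the map editor.
    CREATE TABLE employee_by_state (id INTEGER, name TEXT, state TEXT);
    INSERT INTO employee_by_statecode VALUES (1, 'Ann', 10), (2, 'Bob', 20);
    INSERT INTO statecode VALUES ('Alabama', 10), ('Alaska', 20);
""")

# Implicit join: both tables in FROM, join condition in WHERE.
conn.execute("""
    INSERT INTO employee_by_state (id, name, state)
    SELECT employee_by_statecode.id, employee_by_statecode.name, statecode.state
    FROM employee_by_statecode, statecode
    WHERE employee_by_statecode.statecode = statecode.statecode
""")

result = conn.execute(
    "SELECT id, name, state FROM employee_by_state ORDER BY id"
).fetchall()
print(result)
```

Because the work is done by a single statement inside the database, no rows travel through the Job itself, which is the point of the ELT approach.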

Before the Job execution, the three tables, employee_by_statecode, statecode and employee_by_state look like:

Dropping components

  1. Drop tGreenplumConnection, tELTGreenplumInput (two), tELTGreenplumMap, tELTGreenplumOutput, tGreenplumCommit, tGreenplumInput and tLogRow from the Palette onto the workspace.

  2. Rename tGreenplumConnection as connect_to_greenplum_host, the two tELTGreenplumInput components as employee+statecode and statecode, tELTGreenplumMap as match+map, tELTGreenplumOutput as map_data_output, tGreenplumCommit as commit_to_host, tGreenplumInput as read_map_output_table and tLogRow as show_map_data.

  3. Link tGreenplumConnection to tELTGreenplumMap using an OnSubjobOk trigger.

    Link tELTGreenplumMap to tGreenplumCommit using an OnSubjobOk trigger.

    Link tGreenplumCommit to tGreenplumInput using an OnSubjobOk trigger.

  4. Link tGreenplumInput to tLogRow using a Row > Main connection.

    The two tELTGreenplumInput components and tELTGreenplumOutput will be linked to tELTGreenplumMap later once the relevant tables have been defined.

Configuring the components

  1. Double-click tGreenplumConnection to open its Basic settings view in the Component tab.

    In the Host and Port fields, enter the context variables for the Greenplum server.

    In the Database field, enter the context variable for the Greenplum database.

    In the Username and Password fields, enter the context variables for the authentication credentials.

    For more information on context variables, see Talend Studio User Guide.

  2. Double-click employee+statecode to open its Basic settings view in the Component tab.

    In the Default table name field, enter the name of the source table, namely employee_by_statecode.

    Click the [...] button next to the Edit schema field to open the schema editor.

    Click the [+] button to add three columns, namely id, name and statecode, with the data types INT4, VARCHAR and INT4 respectively.

    Click OK to close the schema editor.

    Link employee+statecode to tELTGreenplumMap using the output employee_by_statecode.

  3. Double-click statecode to open its Basic settings view in the Component tab.

    In the Default table name field, enter the name of the lookup table, namely statecode.

  4. Click the [...] button next to the Edit schema field to open the schema editor.

    Click the [+] button to add two columns, namely state and statecode, with the data types VARCHAR and INT4 respectively.

    Click OK to close the schema editor.

    Link statecode to tELTGreenplumMap using the output statecode.

  5. Click tELTGreenplumMap to open its Basic settings view in the Component tab.