Scenario: Mapping data using a simple implicit join - 6.3

Talend Open Studio for Big Data Components Reference Guide

EnrichVersion
6.3
EnrichProdName
Talend Open Studio for Big Data
task
Data Governance
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

In this scenario, a tELTGreenplumMap component is deployed to retrieve the data from the source table employee_by_statecode, compares its statecode column against the table statecode, and then maps the desired columns from the two tables to the output table employee_by_state.

Before the Job execution, the three tables, employee_by_statecode, statecode and employee_by_state look like:

Dropping components

  1. Drop tGreenplumConnection, tELTGreenplumInput (two), tELTGreenplumMap, tELTGreenplumOutput, tGreenplumCommit, tGreenplumInput and tLogRow from the Palette onto the workspace.

  2. Rename tGreenplumConnection as connect_to_greenplum_host, two tELTGreenplumInput components as employee+statecode and statecode, tELTGreenplumMap as match+map, tELTGreenplumOutput as map_data_output, tGreenplumCommit as commit_to_host, tGreenplumInput as read_map_output_table and tLogRow as show_map_data.

  3. Link tGreenplumConnection to tELTGreenplumMap using an OnSubjobOk trigger.

    Link tELTGreenplumMap to tGreenplumCommit using an OnSubjobOk trigger.

    Link tGreenplumCommit to tGreenplumInput using an OnSubjobOk trigger.

  4. Link tGreenplumInput to tLogRow using a Row > Main connection.

    The two tELTGreenplumInput components and tELTGreenplumOutput will be linked to tELTGreenplumMap later once the relevant tables have been defined.

Configuring the components

  1. Double-click tGreenplumConnection to open its Basic settings view in the Component tab.

    In the Host and Port fields, enter the context variables for the Greenplum server.

    In the Database field, enter the context variable for the Greenplum database.

    In the Username and Password fields, enter the context variables for the authentication credentials.

    For more information on context variables, see Talend Studio User Guide.

  2. Double-click employee+statecode to open its Basic settings view in the Component tab.

    In the Default table name field, enter the name of the source table, namely employee_by_statecode.

    Click the [...] button next to the Edit schema field to open the schema editor.

    Click the [+] button to add three columns, namely id, name and statecode, with the data type as INT4, VARCHAR, and INT4 respectively.

    Click OK to close the schema editor.

    Link employee+statecode to tELTGreenplumMap using the output employee_by_statecode.

  3. Double-click statecode to open its Basic settings view in the Component tab.

    In the Default table name field, enter the name of the lookup table, namely statecode.

  4. Click the [...] button next to the Edit schema field to open the schema editor.

    Click the [+] button to add two columns, namely state and statecode, with the data type as VARCHAR and INT4 respectively.

    Click OK to close the schema editor.

    Link statecode to tELTGreenplumMap using the output statecode.

  5. Click tELTGreenplumMap to open its Basic settings view in the Component tab.

    Select the Use an existing connection check box.

  6. Click the [...] button next to the ELT Greenplum Map Editor field to open the map editor.

  7. Click the [+] button on the upper left corner to open the table selection box.

    Select tables employee_by_statecode and statecode in sequence and click Ok.

    The tables appear on the left panel of the editor.

  8. On the upper right corner, click the [+] button to add an output table, namely employee_by_state.

    Click Ok to close the map editor.

  9. Double-click tELTGreenplumOutput to open its Basic settings view in the Component tab.

    In the Default table name field, enter the name of the output table, namely employee_by_state.

  10. Click the [...] button next to the Edit schema field to open the schema editor.

    Click the [+] button to add three columns, namely id, name and state, with the data type as INT4, VARCHAR, and VARCHAR respectively.

    Click OK to close the schema editor.

    Link tELTGreenplumMap to tELTGreenplumOutput using the table output employee_by_state.

    Click OK on the pop-up window below to retrieve the schema of tELTGreenplumOutput.

    Now the map editor's output table employee_by_state shares the same schema as that of tELTGreenplumOutput.

  11. Double-click tELTGreenplumMap to open the map editor.

    Drop the column statecode from table employee_by_statecode to its counterpart of the table statecode, looking for the records in the two tables that have the same statecode values.

    Drop the columns id and name from table employee_by_statecode as well as the column statecode from table statecode to their counterparts in the output table employee_by_state.

    Click Ok to close the map editor.

  12. Double-click tGreenplumInput to open its Basic settings view in the Component tab.

    Select the Use an existing connection check box.

    In the Table name field, enter the name of the source table, namely employee_by_state.

    In the Query field, enter the query statement, namely "SELECT * FROM \"employee_by_state\"".

  13. Double-click tLogRow to open its Basic settings view in the Component tab.

    In the Mode area, select Table (print values in cells of a table for a better display.

Executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6 to run the Job.

    As shown above, the desired employee records have been written to the table employee_by_state, presenting clearer geographical information about the employees.