Scenario: Mapping data using a subquery - 6.1

Talend Open Studio for Big Data Components Reference Guide


This scenario describes a Job that maps the data from two input tables PreferredSubject and CourseScore to the output table TotalScoreOfPreferredSubject using a subquery.

The PreferredSubject table contains each student's preferred subject. To reproduce this scenario, you can load the data into the table from a CSV file like the following. For how to load data into a Teradata table, see Scenario: Loading data into a Teradata database.

SeqID;StuName;Subject;Detail
1;Amanda;art;Amanda prefers art.
2;Ford;science;Ford prefers science.
3;Kate;art;Kate prefers art.

The CourseScore table contains each student's score for every course, along with the subject each course belongs to. To reproduce this scenario, you can load the data into the table from a CSV file like the following. For how to load data into a Teradata table, see Scenario: Loading data into a Teradata database.

SeqID;StuName;Subject;Course;Score;Detail
1;Amanda;science;math;85;science score
2;Amanda;science;physics;75;science score
3;Amanda;science;chemistry;80;science score
4;Amanda;art;chinese;85;art score
5;Amanda;art;history;95;art score
6;Amanda;art;geography;80;art score
7;Ford;science;math;95;science score
8;Ford;science;physics;85;science score
9;Ford;science;chemistry;80;science score
10;Ford;art;chinese;75;art score
11;Ford;art;history;80;art score
12;Ford;art;geography;85;art score
13;Kate;science;math;65;science score
14;Kate;science;physics;75;science score
15;Kate;science;chemistry;80;science score
16;Kate;art;chinese;85;art score
17;Kate;art;history;80;art score
18;Kate;art;geography;95;art score
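
Both sample files use a semicolon as the field separator and carry a header row. As a rough illustration of that layout (not part of the Talend Job itself), the following Python sketch parses the PreferredSubject sample with the standard csv module. The helper name read_rows is hypothetical and was introduced only for this example.

```python
import csv
import io

# Sample content copied from the PreferredSubject file shown above.
SAMPLE = """SeqID;StuName;Subject;Detail
1;Amanda;art;Amanda prefers art.
2;Ford;science;Ford prefers science.
3;Kate;art;Kate prefers art.
"""


def read_rows(text):
    """Parse semicolon-delimited text into dicts, casting SeqID to int."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = []
    for record in reader:
        record["SeqID"] = int(record["SeqID"])  # SeqID is an INTEGER column
        rows.append(record)
    return rows


rows = read_rows(SAMPLE)
for row in rows:
    print(row)
```

The same function would read the CourseScore file, although the Score column would then also need to be cast to an integer.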

Before the Job is executed, the output table TotalScoreOfPreferredSubject is empty and has only the following columns:

SeqID;StuName;PreferredSubject;TotalScore

Dropping and renaming the components

  1. Create a new Job and add the following components by typing their names in the design workspace or dropping them from the Palette: two tELTTeradataInput components, two tELTTeradataMap components, and one tELTTeradataOutput component.

  2. Rename the two tELTTeradataInput components to PreferredSubject and CourseScore respectively, the two tELTTeradataMap components to ELTSubqueryMap and ELTMap respectively, and the tELTTeradataOutput component to TotalScoreOfPreferredSubject.

Configuring the input components

  1. Double-click PreferredSubject to open its Basic settings view.

  2. In the Default Table Name field, enter an input table name. In this example, it is PreferredSubject.

  3. Click the [...] button next to Edit schema to define the schema of the input table PreferredSubject in the schema editor.

    Click the [+] button to add four columns: SeqID with the DB Type set to INTEGER, and StuName, Subject, and Detail with the DB Type set to VARCHAR.

    Click OK to validate these changes and close the schema editor.

  4. Connect PreferredSubject to ELTMap using the Link > PreferredSubject (Table) link.

  5. Double-click CourseScore to open its Basic settings view.

  6. In the Default Table Name field, enter an input table name. In this example, it is CourseScore.

  7. Click the [...] button next to Edit schema to define the schema of the input table CourseScore in the schema editor.

    Click the [+] button to add six columns: SeqID and Score with the DB Type set to INTEGER, and StuName, Subject, Course, and Detail with the DB Type set to VARCHAR.

    Click OK to validate these changes and close the schema editor.

  8. Connect CourseScore to ELTSubqueryMap using the Link > CourseScore (Table) link.

Configuring the output component

  1. Double-click TotalScoreOfPreferredSubject to open its Basic settings view.

  2. In the Default Table Name field, enter an output table name. In this example, it is TotalScoreOfPreferredSubject.

  3. Click the [...] button next to Edit schema to define the schema of the output table in the schema editor.

    Click the [+] button to add four columns: SeqID and TotalScore with the DB Type set to INTEGER, and StuName and PreferredSubject with the DB Type set to VARCHAR.

    Click OK to validate these changes and close the schema editor.

Configuring data mapping to generate a subquery

  1. Click ELTSubqueryMap to open its Basic settings view.

    Note that you do not need to specify the Teradata database connection information in the ELTSubqueryMap component. The connection information will be specified in the ELTMap component.

  2. Click the [...] button next to ELT Teradata Map Editor to open its map editor.

  3. Add the input table CourseScore by clicking the [+] button in the upper left corner of the map editor and then selecting the relevant table name from the drop-down list in the pop-up dialog box.

  4. Add an output table by clicking the [+] button in the upper right corner of the map editor and then entering the table name TotalScore in the corresponding field in the pop-up dialog box.

  5. Drag the StuName, Subject, and Score columns from the input table and drop them onto the output table.

  6. Click the Add filter row button in the upper right corner of the output table and select Add an other(GROUP...) clause from the pop-up menu. Then in the Additional other clauses (GROUP/ORDER BY...) field displayed, enter the clause GROUP BY CourseScore.StuName, CourseScore.Subject.

    Add the aggregate function SUM for the column Score of the output table by changing the expression of this column to SUM(CourseScore.Score).

  7. Click the Generated SQL Select query for "table1" output tab at the bottom of the map editor to display the corresponding generated SQL statement.

    This SQL query will appear as a subquery in the SQL query generated by the ELTMap component.

  8. Click OK to validate these changes and close the map editor.

  9. Connect ELTSubqueryMap to ELTMap using the Link > TotalScore (table1) link. Note that the link is renamed automatically to TotalScore (Table_ref) since the output table TotalScore here is a reference table.
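
To make the effect of the GROUP BY clause and the SUM function concrete, the following Python sketch runs an equivalent aggregate query against the sample CourseScore data. It rests on several assumptions: an in-memory SQLite database stands in for Teradata, the SQL text is written by hand from the mapping described above rather than copied from the Studio (the statement actually generated may differ in formatting and aliasing), and the ORDER BY is appended only to make the output order predictable.

```python
import sqlite3

# Sample rows copied from the CourseScore file shown earlier.
COURSE_SCORE = [
    (1, "Amanda", "science", "math", 85, "science score"),
    (2, "Amanda", "science", "physics", 75, "science score"),
    (3, "Amanda", "science", "chemistry", 80, "science score"),
    (4, "Amanda", "art", "chinese", 85, "art score"),
    (5, "Amanda", "art", "history", 95, "art score"),
    (6, "Amanda", "art", "geography", 80, "art score"),
    (7, "Ford", "science", "math", 95, "science score"),
    (8, "Ford", "science", "physics", 85, "science score"),
    (9, "Ford", "science", "chemistry", 80, "science score"),
    (10, "Ford", "art", "chinese", 75, "art score"),
    (11, "Ford", "art", "history", 80, "art score"),
    (12, "Ford", "art", "geography", 85, "art score"),
    (13, "Kate", "science", "math", 65, "science score"),
    (14, "Kate", "science", "physics", 75, "science score"),
    (15, "Kate", "science", "chemistry", 80, "science score"),
    (16, "Kate", "art", "chinese", 85, "art score"),
    (17, "Kate", "art", "history", 80, "art score"),
    (18, "Kate", "art", "geography", 95, "art score"),
]

# Assumption: SQLite is used here only as a local stand-in for Teradata.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CourseScore (SeqID INTEGER, StuName VARCHAR, Subject VARCHAR,"
    " Course VARCHAR, Score INTEGER, Detail VARCHAR)"
)
conn.executemany("INSERT INTO CourseScore VALUES (?, ?, ?, ?, ?, ?)", COURSE_SCORE)

# Hand-written approximation of the query built in ELTSubqueryMap.
SUBQUERY = """
SELECT CourseScore.StuName AS StuName,
       CourseScore.Subject AS Subject,
       SUM(CourseScore.Score) AS Score
FROM CourseScore
GROUP BY CourseScore.StuName, CourseScore.Subject
"""

# ORDER BY is added only so that the rows print in a stable order.
totals = conn.execute(SUBQUERY + " ORDER BY StuName, Subject").fetchall()
for row in totals:
    print(row)
```

The query returns one row per student and subject, six rows in total for the sample data, each carrying the sum of the three course scores in that subject.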

Mapping the input and output schemas

  1. Right-click ELTMap, select Link > *New Output* (Table) from the contextual menu and click TotalScoreOfPreferredSubject. In the pop-up dialog box, click Yes to get the schema from the target component.

  2. Click ELTMap to open its Basic settings view.

    Fill in the Host, Database, Username, and Password fields with the Teradata database connection information.

  3. Click the [...] button next to ELT Teradata Map Editor to open its map editor.

  4. Add the input table PreferredSubject by clicking the [+] button in the upper left corner of the map editor and selecting the relevant table name from the drop-down list in the pop-up dialog box.

    Do the same to add another input table TotalScore.

  5. Drag the StuName column from the input table PreferredSubject and drop it onto the corresponding column in the input table TotalScore. Then select the Explicit join check box for the StuName column in the input table TotalScore.

    Do the same for the Subject column.

  6. Drag the SeqID column from the input table PreferredSubject and drop it onto the corresponding column in the output table.

    In the same way, drag the StuName and Subject columns from the input table PreferredSubject and the Score column from the input table TotalScore, and drop them onto the StuName, PreferredSubject, and TotalScore columns of the output table respectively.

  7. Click the Generated SQL Select query for "table2" output tab at the bottom of the map editor to display the corresponding generated SQL statement.

    The SQL query generated in the ELTSubqueryMap component appears as a subquery in the SQL query generated by this component. Aliases are automatically added to the columns selected in the subquery.

  8. Click OK to validate these changes and close the map editor.

Executing the Job

  1. Press Ctrl + S to save the Job.

  2. Press F6 to run the Job.

    The SELECT statement is generated and the mapped data is written to the output table.
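
As a way to check the result to expect, the following Python sketch approximates the whole Job: the aggregate query is joined as a subquery to PreferredSubject on StuName and Subject, and the result is inserted into the output table. The same caveats apply as for any illustration outside the Studio: an in-memory SQLite database stands in for Teradata, the SQL is hand-written from the mapping described in this scenario and is not the exact statement the components generate, and the SCORES dictionary is merely a compact, hypothetical way of rebuilding the CourseScore sample rows.

```python
import sqlite3

# Sample rows copied from the PreferredSubject file.
PREFERRED = [
    (1, "Amanda", "art", "Amanda prefers art."),
    (2, "Ford", "science", "Ford prefers science."),
    (3, "Kate", "art", "Kate prefers art."),
]

# Compact form of the CourseScore sample: (student, subject) -> courses.
SCORES = {
    ("Amanda", "science"): [("math", 85), ("physics", 75), ("chemistry", 80)],
    ("Amanda", "art"): [("chinese", 85), ("history", 95), ("geography", 80)],
    ("Ford", "science"): [("math", 95), ("physics", 85), ("chemistry", 80)],
    ("Ford", "art"): [("chinese", 75), ("history", 80), ("geography", 85)],
    ("Kate", "science"): [("math", 65), ("physics", 75), ("chemistry", 80)],
    ("Kate", "art"): [("chinese", 85), ("history", 80), ("geography", 95)],
}
course_rows = []
for (name, subject), courses in SCORES.items():
    for course, score in courses:
        seq_id = len(course_rows) + 1  # SeqID runs from 1 to 18
        course_rows.append((seq_id, name, subject, course, score, f"{subject} score"))

# Assumption: SQLite is used here only as a local stand-in for Teradata.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PreferredSubject (
    SeqID INTEGER, StuName VARCHAR, Subject VARCHAR, Detail VARCHAR);
CREATE TABLE CourseScore (
    SeqID INTEGER, StuName VARCHAR, Subject VARCHAR,
    Course VARCHAR, Score INTEGER, Detail VARCHAR);
CREATE TABLE TotalScoreOfPreferredSubject (
    SeqID INTEGER, StuName VARCHAR, PreferredSubject VARCHAR, TotalScore INTEGER);
""")
conn.executemany("INSERT INTO PreferredSubject VALUES (?, ?, ?, ?)", PREFERRED)
conn.executemany("INSERT INTO CourseScore VALUES (?, ?, ?, ?, ?, ?)", course_rows)

# Hand-written approximation of the ELT statement: the ELTSubqueryMap query
# is embedded as a derived table aliased TotalScore and joined explicitly.
conn.execute("""
INSERT INTO TotalScoreOfPreferredSubject (SeqID, StuName, PreferredSubject, TotalScore)
SELECT PreferredSubject.SeqID, PreferredSubject.StuName,
       PreferredSubject.Subject, TotalScore.Score
FROM PreferredSubject
INNER JOIN (
    SELECT CourseScore.StuName AS StuName,
           CourseScore.Subject AS Subject,
           SUM(CourseScore.Score) AS Score
    FROM CourseScore
    GROUP BY CourseScore.StuName, CourseScore.Subject
) AS TotalScore
  ON PreferredSubject.StuName = TotalScore.StuName
 AND PreferredSubject.Subject = TotalScore.Subject
""")

result = conn.execute(
    "SELECT SeqID, StuName, PreferredSubject, TotalScore"
    " FROM TotalScoreOfPreferredSubject ORDER BY SeqID"
).fetchall()
for row in result:
    print(row)
```

With the sample data, the output table ends up with one row per student, and each student's total score in the preferred subject happens to be 260.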