Scenario 1: Inserting data in mother/daughter tables - 6.3

Talend Open Studio for Big Data Components Reference Guide

Talend Open Studio for Big Data
Data Governance
Data Quality and Preparation
Design and Development
Talend Studio

The following Job is dedicated to advanced database users, who want to carry out multiple table insertions using a parent table id to feed a child table.

As a prerequisite to this Job, follow the steps described below to create the relevant tables using an engine such as innodb:

  1. In a command line editor, connect to your Mysql server.

  2. Once connected to the relevant database, type in the following command to create the parent table:

    create table f1090_mum(id int not null auto_increment, name varchar(10), primary key(id)) engine=innodb;
  3. Then create the second table:

    create table f1090_baby (id_baby int not null, years int) engine=innodb;

Back in Talend Studio, the Job requires seven components including tMysqlConnection and tMysqlCommit.

Linking the components

  1. Drag and drop the following components from the Palette: a tFileList, a tFileInputDelimited, a tMap, a tMysqlConnection, a tMysqlCommit and two tMysqlOutput.

  2. Connect tMysqlConnection to tFileList using an OnComponentOk link.

  3. Connect tFileList to tMysqlCommit using an OnComponentOk link.

  4. Connect the tFileList component to the input file component using an Iterate link as the name of the file to be processed will be dynamically filled in from the tFileList directory using a global variable.

  5. Connect the tFileInputDelimited component to the tMap and dispatch the flow between the two output Mysql DB components. Use a Row link for each for these connections representing the main data flow.

Configuring the components

  1. Set the tFileList component properties, such as the directory name where files will be fetched from.

  2. In the tMysqlConnection Component view, set the connection details manually or fetch them from the Repository if you centrally stored them as a Metadata DB connection entry. For more information about Metadata, see Talend Studio User Guide.

  3. On the tFileInputDelimited component's Basic settings panel, press Ctrl+Space bar to access the variable list. Set the File Name field to the global variable: tFileList_1.CURRENT_FILEPATH

  4. Set the rest of the fields as usual, defining the row and field separators according to your file structure.

  5. Then set the schema manually through the Edit schema feature or select the schema from the Repository. In Java version, make sure the data type is correctly set, in accordance with the nature of the data processed.

  6. In the tMap Output area, add two output tables, one called mum for the parent table, the second called baby, for the child table.

    Drag the Name column from the Input area, and drop it to the mum table.

    Drag the Years column from the Input area and drop it to the baby table.

  7. Make sure the mum table is on the top of the baby table as the order is determining for the flow sequence hence the DB insert to perform correctly.

    Connect the output row link to distribute correctly the flow to the relevant DB output component.

  8. In each of the tMysqlOutput components' Basic settings panel, select the Use an existing connection check box to retrieve the tMysqlConnection details.

  9. Set the Table name making sure it corresponds to the correct table, in this example either f1090_mum or f1090_baby.

    There is no action on the table as they are already created.

    Select Insert as Action on data for both output components.

    Click on Sync columns to retrieve the schema set in the tMap.

  10. In the Additional columns area of the DB output component corresponding to the child table (f1090_baby), set the id_baby column so that it reuses the id from the parent table.

  11. In the SQL expression field type in: "(Select Last_Insert_id())"

    The position is Before and the Reference column is years.

    In the Advanced settings panel, clear the Extend insert check box.

Executing the Job

  1. Press Ctrl + S to save your Job.

  2. Press F6 to execute it.

    The parent table id has been reused to feed the id_baby column.