Scenario: Importing data into an EXASolution database table from a local CSV file - 6.3

Talend Open Studio for Big Data Components Reference Guide


This scenario describes a Job that writes employee information into a CSV file, then loads the data from this local file into a newly created EXASolution database table using the tEXABulkExec component, and finally retrieves the data from the table and displays it on the console.

Dropping and linking the components

  1. Create a new Job and add the following components by typing their names in the design workspace or dropping them from the Palette: a tFixedFlowInput component, a tFileOutputDelimited component, a tEXABulkExec component, a tEXAInput component, and a tLogRow component.

  2. Connect the tFixedFlowInput component to the tFileOutputDelimited component using a Row > Main connection.

  3. Do the same to connect the tEXAInput component to the tLogRow component.

  4. Connect the tFixedFlowInput component to the tEXABulkExec component using a Trigger > On Subjob Ok connection.

  5. Do the same to connect the tEXABulkExec component to the tEXAInput component.
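The On Subjob Ok links above chain three subjobs so that each one starts only after the previous one has finished without error. The following Python sketch is only an illustration of that ordering, not code generated by the Studio; the `run_job` helper and the subjob names are hypothetical.

```python
def run_job(subjobs):
    """Run subjobs in order; an exception in one stops the chain,
    which mimics the On Subjob Ok trigger."""
    completed = []
    for name, action in subjobs:
        action()  # a failure here prevents the later subjobs from starting
        completed.append(name)
    return completed


log = []
order = run_job([
    ("write_csv", lambda: log.append("tFixedFlowInput -> tFileOutputDelimited")),
    ("bulk_load", lambda: log.append("tEXABulkExec")),
    ("read_back", lambda: log.append("tEXAInput -> tLogRow")),
])
print(order)
```

The CSV file is therefore complete before the bulk load begins, and the table is populated before it is queried.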

Configuring the components

Preparing the source data

  1. Double-click the tFixedFlowInput component to open its Basic settings view.

  2. Click the [...] button next to Edit schema to open the [Schema] dialog box.

  3. Click the [+] button to add six columns: EmployeeID of the Integer type, EmployeeName, OrgTeam and JobTitle of the String type, OnboardDate of the Date type with the yyyy-MM-dd date pattern, and MonthSalary of the Double type.

  4. Click OK to close the dialog box and accept schema propagation to the next component.

  5. In the Mode area, select Use Inline Content (delimited file) and enter the following employee data in the Content field.

    12000;James;Dev Team;Developer;2008-01-01;15000.01
    12001;Jimmy;Dev Team;Developer;2008-11-22;13000.11
    12002;Herbert;QA Team;Tester;2008-05-12;12000.22
    12003;Harry;Doc Team;Technical Writer;2009-03-10;12000.33
    12004;Ronald;QA Team;Tester;2009-06-20;12500.44
    12005;Mike;Dev Team;Developer;2009-10-15;14000.55
    12006;Jack;QA Team;Tester;2009-03-25;13500.66
    12007;Thomas;Dev Team;Developer;2010-02-20;16000.77
    12008;Michael;Dev Team;Developer;2010-07-15;14000.88
    12009;Peter;Doc Team;Technical Writer;2011-02-10;12500.99

  6. Double-click the tFileOutputDelimited component to open its Basic settings view.

  7. In the File Name field, specify the file into which the input data will be written. In this example, it is "E:/employee.csv".

  8. Click Advanced settings to open the Advanced settings view of the tFileOutputDelimited component.

  9. Select the Advanced separator (for numbers) check box and, in the Thousands separator and Decimal separator fields that are displayed, specify the thousands and decimal separators. In this example, the default values "," and "." are used.
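To make the effect of the number separators concrete, the following Python sketch writes the first two rows of the example data in the way the file is expected to look once numeric columns are formatted with "," and ".". It is an illustration under stated assumptions, not the component's implementation: the ";" field separator is assumed, and `format_row` and `write_delimited` are hypothetical helper names.

```python
import csv
import io

# First two rows of the example data, typed as in the schema.
ROWS = [
    (12000, "James", "Dev Team", "Developer", "2008-01-01", 15000.01),
    (12001, "Jimmy", "Dev Team", "Developer", "2008-11-22", 13000.11),
]


def format_row(row):
    """Render the numeric columns with ',' as thousands separator
    and '.' as decimal separator."""
    emp_id, name, team, title, onboard, salary = row
    return [f"{emp_id:,}", name, team, title, onboard, f"{salary:,.2f}"]


def write_delimited(rows, out):
    # Assumption: ';' is the field separator of the output file.
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    for row in rows:
        writer.writerow(format_row(row))


buffer = io.StringIO()
write_delimited(ROWS, buffer)
print(buffer.getvalue(), end="")
```

Because the numeric values now contain thousands separators, the component that loads the file needs to be told how to read them, which is what the Column Formats settings of the tEXABulkExec component are for.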

Loading the source data into a newly created EXASolution database table

  1. Double-click the tEXABulkExec component to open its Basic settings view.

  2. Fill in the Host, Port, Schema, User and Password fields with your EXASolution database connection details.

  3. In the Table field, enter the name of the table into which the source data will be written. In this example, the target database table is named "employee" and it does not exist.

  4. Select Create table from the Action on table list to create the specified table.

  5. In the Source area, select Local file as the source for the input data, and then specify the file that contains the source data. In this example, it is "E:/employee.csv".

  6. Click the [...] button next to Edit schema to open the [Schema] dialog box and define the schema, which should be the same as that of the tFixedFlowInput component.

    Then click OK to validate these changes and close the dialog box.

  7. Click Advanced settings to open the Advanced settings view of the tEXABulkExec component.

  8. In the Column Formats table, for the two numeric fields EmployeeID and MonthSalary, select the corresponding check boxes in the Has Thousand Delimiters column, and then define their format model strings in the corresponding fields of the Alternative Format column. In this example, the format models are "99G999" for EmployeeID and "99G999D99" for MonthSalary.

  9. Make sure that the Thousands Separator and Decimal Separator fields have values identical to those of the tFileOutputDelimited component and keep the default settings for the other options.
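In the format models above, 9 stands for a digit, G for the group (thousands) separator and D for the decimal separator. The Python sketch below shows why the separators must match those used when the file was written; it only illustrates the idea and is not how the database parses values. The `parse_number` function is a hypothetical helper.

```python
from decimal import Decimal


def parse_number(text, thousands_sep=",", decimal_sep="."):
    """Drop the group separators, normalize the decimal mark,
    then convert the text to an exact decimal value."""
    normalized = text.replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(normalized)


# Values as they would appear in the file with ',' and '.' separators.
print(parse_number("12,000"))     # matches the model 99G999
print(parse_number("15,000.01"))  # matches the model 99G999D99

# The same salary written with '.' for thousands and ',' for decimals.
print(parse_number("15.000,01", thousands_sep=".", decimal_sep=","))
```

If the two components disagreed on the separators, a value such as "15,000.01" would be misread or rejected, hence the check in the last step.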

Retrieving data from the EXASolution database table

  1. Double-click the tEXAInput component to open its Basic settings view.

  2. Fill in the Host name, Port, Schema name, Username and Password fields with your EXASolution database connection details.

  3. In the Table Name field, enter the name of the table from which the data will be retrieved. In this example, it is "employee".

  4. Click the [...] button next to Edit schema to open the [Schema] dialog box and define the schema, which should be the same as that of the tFixedFlowInput component.

    Then click OK to close the dialog box and accept schema propagation to the next component.

  5. Click the Guess Query button to fill the Query field with the following auto-generated SQL statement that will be executed on the specified table.

    SELECT employee.EmployeeID,
           employee.EmployeeName,
           employee.OrgTeam,
           employee.JobTitle,
           employee.OnboardDate,
           employee.MonthSalary
    FROM   employee

  6. Double-click the tLogRow component to open its Basic settings view.

  7. In the Mode area, select the Table (print values in cells of a table) option for better readability of the output.
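The retrieval steps can be tried outside the Studio with the sketch below, which runs the same query and prints the rows in a simple table. It rests on several assumptions: an in-memory SQLite database stands in for EXASolution, only two of the example rows are loaded, and `render_table` is a hypothetical helper whose layout is not the exact layout produced by tLogRow.

```python
import sqlite3

HEADERS = ["EmployeeID", "EmployeeName", "OrgTeam",
           "JobTitle", "OnboardDate", "MonthSalary"]

# Assumption: SQLite replaces the EXASolution database for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "EmployeeID INTEGER, EmployeeName TEXT, OrgTeam TEXT, "
    "JobTitle TEXT, OnboardDate TEXT, MonthSalary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?)",
    [
        (12000, "James", "Dev Team", "Developer", "2008-01-01", 15000.01),
        (12001, "Jimmy", "Dev Team", "Developer", "2008-11-22", 13000.11),
    ],
)

QUERY = """
SELECT employee.EmployeeID,
       employee.EmployeeName,
       employee.OrgTeam,
       employee.JobTitle,
       employee.OnboardDate,
       employee.MonthSalary
FROM   employee
"""


def render_table(headers, rows):
    """Pad every cell to its column width and frame the cells with '|'."""
    cells = [headers] + [[str(value) for value in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(headers))]
    return "\n".join(
        "|" + "|".join(c.ljust(w) for c, w in zip(r, widths)) + "|"
        for r in cells
    )


rows = conn.execute(QUERY).fetchall()
table = render_table(HEADERS, rows)
print(table)
```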

Saving and executing the Job

  1. Press Ctrl + S to save the Job.

  2. Press F6 to execute the Job.