Scenario 2: Using PreparedStatement objects to query data - 6.3

Talend Open Studio for Big Data Components Reference Guide

This scenario describes a four-component Job that links a table column with a client file. The MySQL table contains a list of all the American States along with their State IDs, while the file contains the customer information, including the ID of the State in which each customer lives. We want to retrieve the name of the State for each client using an SQL query. To process a large volume of data quickly, we use a PreparedStatement object, which means that the query is prepared only once rather than for each row in turn; the State ID of each row is then sent as a parameter when the query is executed. Note that a PreparedStatement object also helps to prevent SQL injection.
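
As a rough illustration of what "prepared once, parameter sent per row" means at the JDBC level, the sketch below contrasts a query built by string concatenation with a parameterized one. It is plain JDBC, not the code that Talend generates. The class and method names (`PreparedLookupSketch`, `unsafeQuery`, `lookupStates`) are made up for this example, and `lookupStates` assumes that the caller supplies an open `java.sql.Connection` to a database containing the `us_state` table.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: the names are hypothetical, not Talend-generated code.
class PreparedLookupSketch {

    // The statement text is fixed; "?" marks the single parameter (index 1).
    static final String QUERY = "SELECT LabelState FROM us_state WHERE idState=?";

    // Anti-pattern shown for contrast: the input becomes part of the SQL text.
    static String unsafeQuery(String idState) {
        return "SELECT LabelState FROM us_state WHERE idState=" + idState;
    }

    // Prepares the statement once, then binds and executes it for each ID.
    // Assumes the caller provides an open connection to the database.
    static Map<Integer, String> lookupStates(Connection conn, List<Integer> stateIds)
            throws SQLException {
        Map<Integer, String> names = new LinkedHashMap<>();
        try (PreparedStatement stmt = conn.prepareStatement(QUERY)) {
            for (int id : stateIds) {
                stmt.setInt(1, id); // bound as an integer value, not spliced in as SQL text
                try (ResultSet rs = stmt.executeQuery()) {
                    if (rs.next()) {
                        names.put(id, rs.getString("LabelState"));
                    }
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // With concatenation, crafted input changes the meaning of the query.
        System.out.println(unsafeQuery("1 OR 1=1"));
        // With a placeholder, the query text never changes.
        System.out.println(QUERY);
    }
}
```

Running `main` needs no database: it only prints the two query strings, which shows how concatenated input can alter the WHERE clause while the parameterized text stays the same.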

For this scenario, we use a file and a database whose connection and properties are already stored in the Repository metadata. For further information about creating metadata for delimited files, creating database connection metadata and using metadata, see the Talend Studio User Guide.

Linking the components

  1. In the Repository, expand the Metadata and File delimited nodes.

  2. Select the metadata that corresponds to the client file and drag it onto the workspace. Here, we are using the customers metadata.

  3. Double-click tFileInputDelimited in the [Components] dialog box to add tFileInputDelimited to the workspace, with the relevant fields filled by the metadata file.

  4. Drop tMysqlRow, tParseRecordSet and tFileOutputDelimited onto the workspace.

  5. Link tFileInputDelimited to tMysqlRow using a Row > Main connection.

  6. Link tMysqlRow to tParseRecordSet using a Row > Main connection.

  7. Link tParseRecordSet to tFileOutputDelimited using a Row > Main connection.

Configuring the components

  1. Double-click tFileInputDelimited to open its Basic settings view.

  2. In the Schema list, select Built-in so that you can modify the component's schema. Then click on [...] next to the Edit schema field to add a column into which the name of the State will be inserted.

  3. Click on the [+] button to add a column to the schema. Rename this column LabelStateRecordSet and select Object from the Type list. Click OK to save your modifications.

    If you have not already done so, select the tMysqlRow, tParseRecordSet and tFileOutputDelimited components from the Palette and drop them onto the workspace.

  4. Double-click tMysqlRow to set its properties in the Basic settings tab of the Component view.

  5. In the Property Type list, select Repository and click on the [...] button to select a database connection from the metadata in the Repository. The DB Version, Host, Port, Database, Username and Password fields are completed automatically. If you are using the Built-in mode, complete these fields manually.

  6. From the Schema list, select Built-in to set the schema properties manually and add the LabelStateRecordSet column, or click the Sync columns button to retrieve the schema from the preceding component.

  7. In the Query field, enter the SQL query you want to use. Here, we want to retrieve the names of the American States from the LabelState column of the MySQL table, us_state: "SELECT LabelState FROM us_state WHERE idState=?".

    The question mark, "?", represents the parameter to be set in the Advanced settings tab.

  8. Click Advanced settings to set the component's advanced properties.

  9. Select the Propagate QUERY's recordset check box and select the LabelStateRecordSet column from the use column list to insert the query results in that column.

    Select the Use PreparedStatement check box and define the parameter used in the query in the Set PreparedStatement Parameters table.

    Click on the [+] button to add a parameter.

    In the Parameter Index cell, enter the parameter position in the SQL instruction. Enter "1" as we are only using one parameter in this example.

    In the Parameter Type cell, select the type of the parameter. Here, the parameter is an integer, so select Int from the list.

    In the Parameter Value cell, enter the parameter value. Here, we want to retrieve the name of the State based on the State ID for every client in the input file. Hence, enter "row1.idState".

  10. Double-click tParseRecordSet to set its properties in the Basic settings tab of the Component view.

  11. From the Prev. Comp. Column list, select the preceding component's column for analysis. In this example, select LabelStateRecordSet.

    Click on the Sync columns button to retrieve the schema from the preceding component. The Attribute table is automatically completed with the schema columns.

    In the Attribute table, in the Value field which corresponds to the LabelStateRecordSet, enter the name of the column containing the State names to be retrieved and matched with each client, within double quotation marks. In this example, enter "LabelState".

  12. Double-click tFileOutputDelimited to set its properties in the Basic settings tab of the Component view.

  13. In the File Name field, enter the access path and name of the output file.

    Click Sync columns to retrieve the schema from the preceding component.
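
The configuration above can be pictured roughly as follows: tMysqlRow stores the whole query result in the Object-typed LabelStateRecordSet column, and tParseRecordSet then reads the LabelState attribute out of that object. The sketch below imitates this in plain Java, with an in-memory map standing in for the us_state table and a list of maps standing in for the record set. It is not Talend-generated code; the `name` customer field, the sample State rows, and all class and method names are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative simulation; in a real Job the propagated object is a JDBC record set.
class RecordSetSketch {

    // Stand-in for the us_state table (the sample rows are invented).
    static final Map<Integer, String> US_STATE = Map.of(1, "Alabama", 2, "Alaska");

    // A customer row with the extra Object column added to the schema.
    static class Row {
        final String name;          // hypothetical customer field
        final int idState;
        Object labelStateRecordSet; // filled by the query stage

        Row(String name, int idState) {
            this.name = name;
            this.idState = idState;
        }
    }

    // Imitates tMysqlRow with "Propagate QUERY's recordset": runs the lookup
    // for row.idState and stores the whole result in the Object column.
    static void propagateRecordSet(Row row) {
        List<Map<String, Object>> result = new ArrayList<>();
        String label = US_STATE.get(row.idState);
        if (label != null) {
            result.add(Map.of("LabelState", label));
        }
        row.labelStateRecordSet = result;
    }

    // Imitates tParseRecordSet: reads the named attribute from each record.
    @SuppressWarnings("unchecked")
    static List<String> parseRecordSet(Row row, String attribute) {
        List<String> values = new ArrayList<>();
        for (Map<String, Object> rec : (List<Map<String, Object>>) row.labelStateRecordSet) {
            values.add(String.valueOf(rec.get(attribute)));
        }
        return values;
    }

    public static void main(String[] args) {
        Row row = new Row("Ada", 2);
        propagateRecordSet(row);
        System.out.println(row.name + " -> " + parseRecordSet(row, "LabelState"));
    }
}
```

The point of the two-step design is that the query stage does not need to know which columns of the result are wanted; it passes the result along as an opaque object, and the parsing stage picks out the attribute by name.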

Executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6 to run it.

    A column containing the name of the American State corresponding to each client is added to the file.
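
To give an idea of the shape of the result, the sketch below formats an enriched row as one delimited line. The semicolon separator, the column order, and the sample values are assumptions for illustration only; the actual output depends on your schema and on the settings of tFileOutputDelimited.

```java
import java.util.List;

// Illustrative only: shows the shape of the enriched output, not real Job output.
class OutputSketch {

    // Joins the original customer fields and the retrieved State name
    // into one delimited line.
    static String toLine(List<String> customerFields, String stateName, String separator) {
        return String.join(separator, customerFields) + separator + stateName;
    }

    public static void main(String[] args) {
        // Hypothetical customer row: ID, name, idState.
        System.out.println(toLine(List.of("1", "Ada", "2"), "Alaska", ";"));
    }
}
```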