Translating the scenario into a Job - 6.1

Talend Open Studio for Big Data User Guide

In order to implement this scenario, break down the Job into four steps:

  1. Create the Job, define the schema for the input data, and read the input file according to the defined schema.

  2. Set the command to enable the output stream feature.

  3. Map the data using the tMap component.

  4. Output the selected data stream.

The complete Job looks like the one shown in the following image. For detailed instructions on designing the Job, read the following sections.

Step 1: Reading input data from a local file

We will use the tFileInputDelimited component to read the file customers.csv for the input data. This component can be found in the File/Input group of the Palette.

  1. Drop a tFileInputDelimited component onto the design workspace, and double-click it to open the Basic settings view to set its properties.

  2. Click the three-dot button next to the File name/Stream field to browse to the path of the input data file. You can also type in the path of the input data file manually.

  3. Click Edit schema to open a dialog box to configure the file structure of the input file.

  4. Click the plus button to add six columns, and set the Type and column names as listed in the following:

  5. Click OK to close the dialog box.
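What the component does with the schema can be sketched in plain Java: each line of the file is split on the field separator and checked against the expected number of columns. This is only a minimal illustration, not the code the Studio generates. The semicolon separator, the strict column check, the class name DelimitedReaderSketch and the sample lines are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class DelimitedReaderSketch {

    // Assumptions for this sketch: semicolon-separated fields, six columns per row.
    static final String FIELD_SEPARATOR = ";";
    static final int COLUMN_COUNT = 6;

    /** Splits one line into fields and checks the count against the schema. */
    static String[] parseLine(String line) {
        // Pattern.quote makes the separator literal; -1 keeps trailing empty fields.
        String[] fields = line.split(Pattern.quote(FIELD_SEPARATOR), -1);
        if (fields.length != COLUMN_COUNT) {
            throw new IllegalArgumentException(
                    "Expected " + COLUMN_COUNT + " fields but found " + fields.length);
        }
        return fields;
    }

    /** Parses every non-empty line; the caller supplies the lines of the file. */
    static List<String[]> parseLines(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (!line.isEmpty()) {
                rows.add(parseLine(line));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample lines; in practice they would come from customers.csv,
        // for example through java.nio.file.Files.readAllLines.
        List<String> lines = Arrays.asList("1;a;b;c;d;e", "", "2;f;g;h;i;");
        for (String[] row : parseLines(lines)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Rejecting a line with the wrong number of fields is a choice made for this sketch, so that a schema mismatch is visible immediately.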

Step 2: Setting the command to enable the output stream feature

Now we will make use of tJava to set the command for creating an output file and a directory that contains the output file.

To do so:

  1. Drop a tJava component onto the design workspace, and double-click it to open the Basic settings view to set its properties.

  2. Fill in the Code area with the following command:

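    // Create the target directory, including any missing parent directories.
    new java.io.File("C:/myFolder").mkdirs();
    // Open the output file (false = overwrite, not append) and share the stream through globalMap.
    globalMap.put("out_file", new java.io.FileOutputStream("C:/myFolder/customerselection.txt", false));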

    Note

    The command typed in this step creates a new directory C:/myFolder to hold the output file customerselection.txt and opens an output stream to that file. You can customize the command to suit your environment.

  3. Connect tJava to tFileInputDelimited using a Trigger > On Subjob Ok connection. This triggers the subjob that starts with tFileInputDelimited once tJava has run successfully, so the output stream exists before any data is read.
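The pattern behind this step, opening a stream once and sharing it under a key for a later component to pick up, can be tried outside the Studio. In the sketch below a plain HashMap stands in for globalMap, and the class name SharedStreamSketch, the helper openSharedStream and the use of the system temporary directory instead of C:/myFolder are assumptions made so that the example runs on any machine.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class SharedStreamSketch {

    // Stand-in for globalMap; inside a Job the Studio provides this object.
    static final Map<String, Object> globalMap = new HashMap<>();

    /** Mirrors the tJava code: create the folder, open the file, share the stream. */
    static void openSharedStream(File folder, String fileName) {
        // mkdirs also creates missing parents and does nothing if the folder exists.
        folder.mkdirs();
        try {
            // The second argument false means overwrite rather than append.
            globalMap.put("out_file",
                    new FileOutputStream(new File(folder, fileName), false));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File folder = new File(System.getProperty("java.io.tmpdir"), "myFolder");
        openSharedStream(folder, "customerselection.txt");

        // A later step retrieves the stream by key and casts it back.
        try (OutputStream out = (OutputStream) globalMap.get("out_file")) {
            out.write("demo line\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Wrote to " + new File(folder, "customerselection.txt"));
    }
}
```

Because the map stores values as Object, the consumer has to cast the value back to java.io.OutputStream, which is why the cast appears in the Output Stream field later in this scenario.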

Step 3: Mapping the data using the tMap component

  1. Drop a tMap component onto the design workspace, and double-click it to open the Basic settings view to set its properties.

  2. Click the three-dot button next to Map Editor to open a dialog box to set the mapping.

  3. Click the plus button on the left to add six columns for the schema of the incoming data. These columns should be the same as the following:

  4. Click the plus button on the right to add a schema of the outgoing data flow.

  5. Select New output and click OK to save the output schema. For the time being, the output schema is still empty.

  6. Click the plus button beneath the out1 table to add three columns for the output data.

  7. Drop the id, CustomerName and CustomerAge columns onto their respective lines on the right.

  8. Click OK to save the settings.
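In effect, the mapping defined above is a projection: each six-column input row becomes a three-column out1 row. A rough Java equivalent is shown below. The class name ColumnSelectionSketch, the assumption that id, CustomerName and CustomerAge are the first three input columns, and the sample row are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnSelectionSketch {

    // Assumed positions of the selected columns in the six-column input row.
    static final int ID = 0;
    static final int CUSTOMER_NAME = 1;
    static final int CUSTOMER_AGE = 2;

    /** Builds one out1 row from one input row by copying the three selected columns. */
    static String[] toOut1(String[] inputRow) {
        return new String[] {
            inputRow[ID], inputRow[CUSTOMER_NAME], inputRow[CUSTOMER_AGE]
        };
    }

    /** Applies the projection to every input row, preserving row order. */
    static List<String[]> mapAll(List<String[]> inputRows) {
        List<String[]> out1 = new ArrayList<>();
        for (String[] row : inputRows) {
            out1.add(toOut1(row));
        }
        return out1;
    }

    public static void main(String[] args) {
        // Made-up input row; the last three values stand for the unselected columns.
        List<String[]> input = new ArrayList<>();
        input.add(new String[] {"1", "Some Name", "30", "x", "y", "z"});
        for (String[] row : mapAll(input)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```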

Step 4: Outputting the selected data stream

  1. Drop a tFileOutputDelimited component onto the design workspace, and double-click it to open the Basic settings view to set its component properties.

  2. Select the Use Output Stream check box to enable the Output Stream field and fill the Output Stream field with the following command:

    (java.io.OutputStream)globalMap.get("out_file")

    Note

    You can customize the command in the Output Stream field by pressing CTRL+SPACE to select a built-in command from the list, or by typing the command into the field manually. In this scenario, the command retrieves the stream stored under the out_file key in the Code area of tJava and casts it to java.io.OutputStream, so that the filtered data is written to the local file defined there.

  3. Connect tFileInputDelimited to tMap using a Row > Main connection, and connect tMap to tFileOutputDelimited using a Row > out1 connection, which is defined in the Map Editor of tMap.

  4. Click Sync columns to retrieve the schema defined in the preceding component.
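With Use Output Stream selected, the component writes its rows to the stream it is given instead of opening a file by path. The sketch below shows that idea in plain Java by writing delimited rows to any java.io.OutputStream. The class name StreamWriterSketch, the helper writeRows, the semicolon separator, the line feed row ending and the sample row are assumptions, and a ByteArrayOutputStream replaces the file stream so that the result can be inspected directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class StreamWriterSketch {

    /** Writes each row as one delimited line to the given stream. */
    static void writeRows(OutputStream out, List<String[]> rows, String separator) {
        PrintWriter writer =
                new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        for (String[] row : rows) {
            writer.print(String.join(separator, row));
            writer.print('\n');
        }
        // Flush but do not close: the stream was opened elsewhere and its owner closes it.
        writer.flush();
    }

    public static void main(String[] args) {
        // Made-up out1 row with the three selected columns.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "Some Name", "30"});

        // Any OutputStream works here, including the one stored under "out_file".
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeRows(buffer, rows, ";");
        System.out.print(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Leaving the stream open after writing is deliberate in this sketch, since the code that created the stream is the natural place to close it.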

To output the selected data to the console:

  1. Drop a tLogRow component onto the design workspace, and double-click it to open its Basic settings view.

  2. Select the Table radio button in the Mode area.

  3. Connect tFileOutputDelimited to tLogRow using a Row > Main connection.

  4. Click Sync columns to retrieve the schema defined in the preceding component.

    This Job is now ready to be executed.

  5. Press CTRL+S to save your Job and press F6 to execute it.

    The content of the selected data is displayed on the console.

    The selected data is also output to the specified local file customerselection.txt.

For an example of a Job using this feature, see Scenario: Utilizing Output Stream in saving filtered data to a local file of tFileOutputDelimited in the Talend Open Studio for Big Data Components Reference Guide.

For the principles behind the Use Output Stream feature, see How to use the Use Output Stream feature.