Scenario: Creating a line chart to ease trend analysis - 6.1

Talend Components Reference Guide

This scenario describes a Job that reads data from a CSV file and transforms the data into a line chart to facilitate trend analysis. The input file records how long (in minutes) per week a person watches different TV channels over ten weeks, as shown below:

Week;TV_A;TV_B;TV_C
1;327;286;244
2;326;285;243
3;325;283;245
4;323;282;246
5;322;285;248
6;321;288;247
7;322;291;245
8;321;292;244
9;320;293;243
10;319;294;242

Because the input file has a different structure than required by the tLineChart component, this use case uses the tMap component to adapt the source data to the three-column schema of tLineChart so that a temporary CSV file can be created as the input to the tLineChart component.

Note

You will usually use the tMap component to adjust the input schema in accordance with the schema structure of the tLineChart component. For more information about how to use the tMap component, see Talend Studio User Guide and tMap.

To ensure that the temporary input file is generated correctly, a pre-treatment subjob deletes the temporary file if it already exists before the main Job is executed. Because this temporary file serves this Job only, a post-treatment subjob deletes it after the main Job is executed.

Dropping and linking components

  1. Drop the following components from the Palette to the design workspace: two tFileDelete components, two tFileInputDelimited components, a tMap, three tFileOutputDelimited components, and a tLineChart.

  2. Connect the first tFileInputDelimited to the tMap component using a Row > Main connection.

  3. Connect the tMap component to the first tFileOutputDelimited component using a Row > Main connection, and name the connection TV_A.

  4. Repeat the step above to connect the tMap component to the other two tFileOutputDelimited components using Row > Main connections, and name the connections TV_B and TV_C respectively.

  5. Connect the second tFileInputDelimited to the tLineChart component using a Row > Main connection. When asked whether to get the schema from the target component, click Yes.

  6. Connect the first tFileInputDelimited component to the second tFileInputDelimited component using a Trigger > On Subjob Ok connection.

  7. Connect the first tFileDelete component to the first tFileInputDelimited component, and then the second tFileInputDelimited component to the second tFileDelete component, using Trigger > On Subjob Ok connections.

  8. Relabel the components to best describe their functionality.

Reading the source data

  1. Double-click the first tFileInputDelimited component, which is labelled Source_data, to display its Basic settings view.

  2. Fill in the File name field by browsing to the input file.

  3. Specify the header row. In this use case, the first row of the input file is the header row. Leave the other parameters as they are.

  4. Click Edit schema to describe the data structure of the input file. In this use case, the input schema is made of four columns: Week, Mins_TVA, Mins_TVB, and Mins_TVC. After defining the column names and data types, click OK to close the schema dialog box.
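
As a rough illustration of what these settings amount to, the following Python sketch parses the sample data with one header row skipped and applies the four-column schema. It is not what the Studio generates: the `read_source` function, the `SCHEMA` list, the semicolon separator setting, and the choice of integer for every column are assumptions made for this example.

```python
import csv
import io

# The sample data from this scenario, inlined so the sketch runs without a file.
SOURCE = """Week;TV_A;TV_B;TV_C
1;327;286;244
2;326;285;243
3;325;283;245
4;323;282;246
5;322;285;248
6;321;288;247
7;322;291;245
8;321;292;244
9;320;293;243
10;319;294;242
"""

# Column names from the schema defined above; they replace the header text.
SCHEMA = ["Week", "Mins_TVA", "Mins_TVB", "Mins_TVC"]


def read_source(text, header_rows=1):
    """Parse semicolon-delimited text, skip the header rows, type fields as int."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    rows = list(reader)[header_rows:]
    return [dict(zip(SCHEMA, map(int, row))) for row in rows if row]


rows = read_source(SOURCE)
print(rows[0])  # {'Week': 1, 'Mins_TVA': 327, 'Mins_TVB': 286, 'Mins_TVC': 244}
```

Skipping the header is what allows the schema column names (Mins_TVA and so on) to differ from the names in the file (TV_A and so on).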

Adapting the source data to the tLineChart schema

  1. Double-click the tMap to open the Map Editor.

    You can see an input table on the input panel, row1 in this example, and three empty output tables, named TV_A, TV_B, and TV_C on the output panel.

  2. Use the Schema editor to add three columns to each output table: series (string), x (integer), and y (integer).

  3. In the Expression field of the series column of each output table, enter the text to be presented in the legend area of the line chart: TV A, TV B, and TV C respectively in this example.

  4. Drop the Week column of the input table onto the x column of each output table.

  5. Drop the Mins_TVA column of the input table onto the y column of the TV_A table.

  6. Drop the Mins_TVB column of the input table onto the y column of the TV_B table.

  7. Drop the Mins_TVC column of the input table onto the y column of the TV_C table.

  8. Click OK to save the mappings, close the Map Editor, and propagate the output schemas to the output components.
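
The mappings above can be sketched in Python as follows. This is only an illustration of the data transformation, not the code the tMap produces: the `MAPPINGS` dictionary and the `map_row` function are hypothetical names introduced for this example.

```python
# Hypothetical stand-in for the mappings: output table -> (legend text, source column).
MAPPINGS = {
    "TV_A": ("TV A", "Mins_TVA"),
    "TV_B": ("TV B", "Mins_TVB"),
    "TV_C": ("TV C", "Mins_TVC"),
}


def map_row(row):
    """Return one (series, x, y) tuple per output table for a single input row."""
    return {
        table: (legend, row["Week"], row[column])
        for table, (legend, column) in MAPPINGS.items()
    }


# First data row of the input file, using the input schema column names.
row1 = {"Week": 1, "Mins_TVA": 327, "Mins_TVB": 286, "Mins_TVC": 244}
flows = map_row(row1)
print(flows["TV_B"])  # ('TV B', 1, 286)
```

Each input row therefore yields three output rows, one per channel, all sharing the same x value (the week number).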

Generating the temporary input file

  1. Double-click the first tFileOutputDelimited component to display its Basic settings view.

  2. In the File Name field, define a temporary CSV file to send the mapped data flows to. In this use case, we name this file Temp.csv. This file will be used as the input to the tLineChart component.

  3. Select the Append check box.

  4. Repeat the steps above to define the properties of the other two tFileOutputDelimited components, using exactly the same settings as in the first tFileOutputDelimited component.

    Note

    The order of the output flows from the tMap component is not necessarily the order in which data is written to the target file. To ensure that the target file is generated correctly, any existing file of the same name must be deleted before Job execution, and the Append check box must be selected in all the tFileOutputDelimited components in this step.
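
The following Python sketch illustrates why the delete-then-append combination gives a correct file regardless of write order. It is a simplification under stated assumptions: the `write_flow` and `build_temp_file` functions are hypothetical, each flow is written in one piece, and a semicolon is assumed as the field separator.

```python
import os
import tempfile


def write_flow(path, rows):
    """Append one output flow to the shared file (Append check box selected)."""
    with open(path, "a", newline="") as f:
        for series, x, y in rows:
            f.write(f"{series};{x};{y}\n")


def build_temp_file(path, flows):
    """Delete any stale file, then append every flow in the order given."""
    if os.path.exists(path):
        os.remove(path)
    for rows in flows:
        write_flow(path, rows)
    with open(path) as f:
        return f.read().splitlines()


flow_a = [("TV A", 1, 327), ("TV A", 2, 326)]
flow_b = [("TV B", 1, 286), ("TV B", 2, 285)]

path = os.path.join(tempfile.mkdtemp(), "Temp.csv")
first = build_temp_file(path, [flow_a, flow_b])
second = build_temp_file(path, [flow_b, flow_a])  # other write order, same path

# Same set of rows either way, and the rerun did not accumulate old rows.
print(sorted(first) == sorted(second), len(second))  # True 4
```

Without the initial deletion, the second run would append to the four rows left by the first run; without append mode, each flow would overwrite the previous one.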

Configuring line chart generation

  1. Double-click the second tFileInputDelimited component, which is labelled Temp_Input, to display its Basic settings view.

  2. Fill in the File name field with the path to the temporary input file generated by the tFileOutputDelimited components. In this use case, the temporary input file to the tLineChart is Temp.csv.

  3. Double-click the tLineChart component to display its Basic settings view.

  4. Click Edit schema to open the schema dialog box.

  5. Check that the input and output schemas are synchronized. If needed, copy all the columns from the output schema to the input schema by clicking the left-pointing double arrow button. Then, click OK to close the schema dialog box.

  6. In the Generated image path field, define the path of the image file to be generated.

  7. In the Chart title field, define a title for the line chart. In this use case, enter Average Weekly Viewing (per person) as the chart title.

  8. Define the domain (X) and range (Y) axis labels. In this use case, enter Week and Minutes respectively as the axis labels.

  9. Define the image size, the moving average period, the lower and upper bounds, the chart background color, and the background color of the plot area, as you prefer.

    In this use case, we set the image size to 450 by 450, set the lower and upper bounds to 210 and 340 respectively, select light gray as the chart background color, and keep the other settings as they are.
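
To make the role of the three-column input and of the bounds more concrete, here is a small Python sketch. It does not reproduce how tLineChart draws the chart: `group_series` and `within_bounds` are hypothetical helpers that only show how series, x, and y rows define one line per series, and that the bounds 210 and 340 enclose the sample values.

```python
from collections import defaultdict

# A few rows in the series, x, y layout of the temporary file (values from this scenario).
TEMP_ROWS = [
    ("TV A", 1, 327), ("TV B", 1, 286), ("TV C", 1, 244),
    ("TV A", 10, 319), ("TV B", 10, 294), ("TV C", 10, 242),
]


def group_series(rows):
    """Collect (x, y) points per series, sorted by x, ready to be drawn as lines."""
    series = defaultdict(list)
    for name, x, y in rows:
        series[name].append((x, y))
    return {name: sorted(points) for name, points in series.items()}


def within_bounds(rows, lower, upper):
    """Return True if every y value lies inside the range-axis bounds."""
    return all(lower <= y <= upper for _, _, y in rows)


lines = group_series(TEMP_ROWS)
print(lines["TV A"])                       # [(1, 327), (10, 319)]
print(within_bounds(TEMP_ROWS, 210, 340))  # True
```

Because rows are grouped by the series value and sorted by x, the order in which rows were appended to the temporary file does not affect the points on each line in this sketch.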

Deleting the temporary file

  1. Double-click the first tFileDelete component to display its Basic settings view.

  2. Fill in the File name field with the path to the temporary input file, and clear the Fail on error check box so that the main Job can still be executed if the file to delete does not exist.

  3. Specify the same file path in the other tFileDelete component.
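
The effect of clearing Fail on error can be sketched in Python as follows. The `delete_file` function and its `fail_on_error` parameter are hypothetical names that mirror the check box for illustration; they are not part of any Talend API.

```python
import os
import tempfile


def delete_file(path, fail_on_error=False):
    """Delete path and return True if a file was removed.

    With fail_on_error=False, a missing file is ignored so the caller can continue.
    """
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        if fail_on_error:
            raise
        return False


path = os.path.join(tempfile.mkdtemp(), "Temp.csv")

print(delete_file(path))   # False: nothing to delete, but no exception
open(path, "w").close()    # stands in for the main Job creating the file
print(delete_file(path))   # True: the cleanup removed the file
```

The first call corresponds to the pre-treatment subjob on a first run, when no temporary file exists yet; the second corresponds to the post-treatment cleanup.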

Executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6 to launch the Job.

    A line chart is generated as defined, showing a graphical comparison of the average weekly viewing time and the viewing trends of different TV channels over the past ten weeks.