How to extract data from specific Excel cells

author
Shicong Hong
EnrichVersion
6.5
EnrichProdName
Talend Big Data Platform
Talend Big Data
Talend Data Integration
Talend Data Services Platform
Talend ESB
Talend Open Studio for Data Integration
Talend Data Fabric
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
Talend MDM Platform
Talend Data Management Platform
Talend Open Studio for Big Data
task
Design and Development > Designing Jobs
EnrichPlatform
Talend Studio

How to extract data from specific Excel cells

This article explains how to extract data from specific Excel cells.
In this example:
  • The tFileInputExcel component reads an Excel file and extracts data.
  • The tFixedFlowInput component generates one input flow including the data of the specific cells, using the context variables.
  • The tJavaRow component allows you to enter customized code.
  • The tLogRow component displays data in the Run console.

Dropping and linking the components

Procedure

  1. Create a new Job and drop the following components from the Palette onto the design workspace: tFileInputExcel, tFixedFlowInput, tJavaRow and tLogRow.
  2. Connect tFileInputExcel to tFixedFlowInput using a Trigger > On Subjob Ok connection.
  3. Connect tFileInputExcel to tJavaRow and tFixedFlowInput to tLogRow using Row > Main links.

Configuring the input component - tFileInputExcel

Before you begin

For this example, you must have an Excel file you will use in the Job, as follows.

In this example, you will extract data from cells A4, B1, C9 and D6.

Procedure

  1. Double-click the tFileInputExcel component to open its Basic settings view.
  2. Click the [...] button next to the File Name/Stream field and browse to the source file, in this example example__extract_specific_cells.xlsx.
  3. In Sheet list, click the [+] button and enter the name of the Excel sheet to be processed between double quotation marks, in this example Sheet1.
  4. In the Header field, enter the number of rows to be skipped in the beginning of file, in this example 1.
  5. Click the [...] button next to Edit Schema to define the schema of the input file.

    In this example, the source file holds four columns: lastname, firstname, city and zip.

  6. Click OK to validate your changes and close the dialog box.

Creating context variables

For this example, you will define four variables: a4, b1, c9 and d6. These variables will set the name of the cells the tFixedFlowinput component will extract from the Excel file.

Procedure

  1. Select the Contexts tab view of your Job and click the [+] button to add four variables, a4, b1, c9 and d6.
  2. In the Value field, under the Default context, enter the variable values, as follows.
    Note: The syntax to call a variable is context.VariableName.
  3. Press Ctrl+S to save your changes.

Configuring the tFixedFlowInput component

Procedure

  1. Double-click tFixedFlowInput to display its Basic settings view.
  2. Click the [...] button next to Edit Schema to open the schema editor.
  3. Click the [+] button to add four columns, namely a4, b1, c9 and d6.
  4. Click Ok to close the schema editor and accept propagating the changes when prompted by the system.
  5. In the Mode area, select the Use Single Table option.
    The four defined columns display in the Values panel of the Basic settings view of tFixedFlowInput.
  6. In the Value cell of each column, press Ctrl+Space to access the global variable list.
  7. From the global variable list, select the context variable you created for each column, such as context.a4 for the a4 column.

Confguring the tJavaRow component

Procedure

  1. Double-click tJavaRow to open its Basic settings view.
  2. Click Sync columns to make sure that the schema is correctly retrieved from the preceding component.
  3. In the Code field, enter the following code to display the data extracted from the specific Excel cells.
    int seq=Numeric.sequence("s1",1,1);
    if (seq == 4) {
    context.a4 = input_row.c1;
    }
    if (seq == 1) {
    context.b1 = input_row.c2;
    }
    if (seq == 9) {
    context.c9 = input_row.c3;
    }
    if (seq == 6) {
    context.d6 = input_row.c4;
    }
    Note: In the Code field, input_row corresponds to the links to and from tJavaRow.
  4. Double-click tLogRow to open its Basic settings view and select the Table (print values in cells of a table) option for better display effect.

Executing the job

Procedure

  1. Press Ctrl+S to save your Job.
  2. Click Run or press F6 to run the Job.

    The data extracted from the specific Excel cells is displayed on the console.