Sorting a file in Talend Studio - 7.3

In this tutorial, learn how to use a tSortRow component to sort data from a file with Talend Studio.

This tutorial makes use of a .csv file. If you do not have a .csv file, click the Downloads tab and save customers_unordered.csv.

Creating a Talend Studio project

Creating a project is the first step to using Talend Studio. Projects allow you to better organize your work.

Procedure

  1. Select Create a new project.
  2. Enter a name for your project.

    Example

    TalendDemo
  3. Click Create.
  4. Click Finish.

Results

Your project opens. You are ready to work in Talend Studio.

Creating a Job to sort a delimited file

Talend Studio projects contain Jobs. In Jobs, you can build workflows through components, which allow you to complete specific actions.

Before you begin

Select the Integration perspective (Window > Perspective > Integration).

Procedure

  1. In Repository, right-click Job Designs.
    1. Click Create Standard Job.
  2. In the Name field, enter a name.

    Example

    SortCSVfile
  3. Optional: In the Purpose field, enter a purpose.

    Example

    Sort a .csv file
  4. Optional: In the Description field, enter a description.

    Example

    Sort a .csv file according to a defined column
    Tip: Enter a Purpose and Description to stay organized.
  5. Click Finish.

Results

The Designer opens an empty Job.

Configuring a component to read a delimited file

Talend Studio components allow you to complete specific actions. You can add them to Jobs. You can use the tFileInputDelimited component to read a delimited file, for example.

Before you begin

This tutorial makes use of a .csv file. If you do not have a .csv file, click the Downloads tab and save customers_unordered.csv.

Procedure

  1. Click inside the Designer.
  2. Enter tFileInputDelimited and select the component of the same name.
  3. In the Designer, double-click the tFileInputDelimited component.
    1. Click the […] button next to the File Name/Stream field.
    2. Select the file of your choice in the File Explorer.
    3. Optional: Check your file's Field Separator and change it, if needed.
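      Note: The default Field Separator in the component is a semicolon (;). If your file uses another character, such as a comma, enter that character instead.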

Results

You have added a tFileInputDelimited component and selected a file to be read.
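
To make the role of the Field Separator concrete, the following is a minimal Python sketch of reading delimited text. It is an analogy only, not the code Talend Studio generates for the component. The function name read_delimited and the sample rows are hypothetical and do not reproduce the contents of customers_unordered.csv.

```python
import csv
import io

# Hypothetical sample data standing in for a delimited file.
SAMPLE = "First;Last;City\nAda;Lovelace;London\nAlan;Turing;Wilmslow\n"


def read_delimited(stream, field_separator=";"):
    """Return every row of a delimited text stream as a list of strings."""
    return list(csv.reader(stream, delimiter=field_separator))


rows = read_delimited(io.StringIO(SAMPLE))
print(rows[0])  # the header row
print(rows[1])  # the first data row
```

If the separator does not match the file, each line comes back as a single field, which is why it is worth checking the Field Separator before going further.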

Defining a component schema to read a delimited file

Defining the component schema of your delimited file helps you parse the data you are working with.

Before you begin

You must have added and configured a tFileInputDelimited component (see Configuring a component to read a delimited file).

Procedure

  1. In the Designer, double-click the tFileInputDelimited component.
  2. Click the […] button next to Edit schema.
    The Schema wizard opens.
  3. Click the plus button to add a Column.
    1. Add as many columns as there are headers in your .csv file.
      Note: Headers are the column names in the first row of a .csv file.
    2. Enter the name of each Column.
      Column names must be identical to header names.

      Example

      • First
      • Last
      • Number
      • Street
      • City
      • State
    3. Select the Type of each Column.
      Tip: Select the String Type for a postcode. A postcode is an identifier, not a quantity: it is never used in arithmetic, and a numeric Type would drop any leading zeros.
  4. Click OK.

Results

You have defined the schema of your file.
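
As a rough illustration of what a schema does, the Python sketch below pairs each column name from the example above with a type and converts one row of raw text accordingly. The types chosen here (in particular, an integer for Number) are assumptions, and SCHEMA, apply_schema, and the sample row are hypothetical names and data, not part of Talend Studio.

```python
# Column names follow the example above; the types are assumptions.
SCHEMA = [
    ("First", str),
    ("Last", str),
    ("Number", int),
    ("Street", str),
    ("City", str),
    ("State", str),
]


def apply_schema(values, schema=SCHEMA):
    """Convert one row of raw strings into a dict of typed values."""
    if len(values) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(values)}")
    return {name: cast(value) for (name, cast), value in zip(schema, values)}


# Hypothetical row, for illustration only.
record = apply_schema(["Ada", "Lovelace", "12", "Main Street", "Springfield", "IL"])
print(record["Number"] + 1)  # Number is now an integer, so arithmetic works

# Why a postcode is better kept as text: a numeric type loses leading zeros.
postcode_as_number = int("02134")
postcode_as_text = "02134"
```

A row with the wrong number of fields raises an error in this sketch, which mirrors the reason the procedure asks for as many columns as there are headers.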

Sorting your data

Through the tSortRow component, Talend Studio allows you to sort your data.

Sorting a delimited file

You can sort a delimited file by linking its input component to a tSortRow component. The tSortRow component sorts input data based on one or more columns, each with its own sort type and order.

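Before you begin

You must have defined the schema of the tFileInputDelimited component (see Defining a component schema to read a delimited file).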

Procedure

  1. In the Designer, add a tSortRow component.
  2. Right-click the tFileInputDelimited component.
    1. Select Row > Main.
    2. Click on the tSortRow component to link the two.
  3. Double-click the tSortRow component.
  4. Click the […] button next to Edit schema.
    Because they are linked, the tSortRow component inherits the schema of the tFileInputDelimited component.
  5. In the Criteria table, click the plus button to add a sorting rule.

    Example

    1. In Schema column, select City.
    2. In sort num or alpha?, select alpha.
  6. Optional: Click the plus button to add another rule.

    Example

    1. In Schema column, select Street.
    2. In sort num or alpha?, select alpha.

Results

You have configured the Job to sort the data from the delimited file.
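
The two rules in the example (City first, then Street, both alpha) can be sketched in Python as a multi-key sort. This is an analogy under stated assumptions, not the component's implementation: sort_rows and the sample records are hypothetical, and the comparison shown is Python's own string and number ordering.

```python
# Hypothetical records, for illustration only.
records = [
    {"First": "A", "Street": "Oak Avenue", "City": "Springfield"},
    {"First": "B", "Street": "Elm Street", "City": "Springfield"},
    {"First": "C", "Street": "Main Street", "City": "Albany"},
]


def sort_rows(rows, criteria):
    """Sort rows by (column, descending) criteria; the first criterion wins."""
    result = list(rows)
    # Python's sort is stable, so sorting from the least to the most
    # significant criterion yields a correct multi-column ordering.
    for column, descending in reversed(criteria):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


sorted_records = sort_rows(records, [("City", False), ("Street", False)])
for row in sorted_records:
    print(row["City"], row["Street"])

# Text ordering versus numeric ordering of the same values.
house_numbers = ["10", "9", "2"]
as_text = sorted(house_numbers)
as_numbers = sorted(house_numbers, key=int)
```

The second rule only matters for rows that tie on the first one: the two Springfield rows are ordered by Street, while the Albany row comes first regardless of its street. The last three lines show why the choice between a text and a numeric comparison matters for columns that hold digits.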

Displaying the results of sorting a delimited file

You can display the result of a workflow with a link to a tLogRow component. The tLogRow component displays data in the Run console.

Before you begin

You must have added and configured a tSortRow component (see Sorting a delimited file).

Procedure

  1. In the Designer, add a tLogRow component.
  2. Right-click the tSortRow component.
    1. Select Row > Main.
    2. Click on the tLogRow component to link the two.
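
For comparison, printing each sorted row as one line of text can be sketched in Python as follows. The "|" separator, the function name format_rows, and the sample rows are assumptions made for illustration; the actual layout in the Run console depends on how the tLogRow component is configured.

```python
def format_rows(rows, separator="|"):
    """Join the values of each row into a single line of text."""
    return [separator.join(str(value) for value in row) for row in rows]


# Hypothetical sorted rows, for illustration only.
sorted_rows = [
    ["C", "Main Street", "Albany"],
    ["B", "Elm Street", "Springfield"],
]

for line in format_rows(sorted_rows):
    print(line)
```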