Scenario: Handling data with Cassandra - 6.1

Talend Components Reference Guide

This scenario describes a simple Job that reads employee data from a CSV file, writes the data to a Cassandra keyspace, then extracts the personal information of some employees and displays it on the console.

This scenario requires six components:

  • tCassandraConnection: opens a connection to the Cassandra server.

  • tFileInputDelimited: reads the input file, defines the data structure and sends it to the next component.

  • tCassandraOutput: writes the data it receives from the preceding component into a Cassandra keyspace.

  • tCassandraInput: reads the data from the Cassandra keyspace.

  • tLogRow: displays the data it receives from the preceding component on the console.

  • tCassandraClose: closes the connection to the Cassandra server.

Dropping and linking the components

  1. Drop the following components from the Palette onto the design workspace: tCassandraConnection, tFileInputDelimited, tCassandraOutput, tCassandraInput, tLogRow and tCassandraClose.

  2. Connect tFileInputDelimited to tCassandraOutput using a Row > Main link.

  3. Do the same to connect tCassandraInput to tLogRow.

  4. Connect tCassandraConnection to tFileInputDelimited using a Trigger > OnSubjobOk link.

  5. Do the same to connect tFileInputDelimited to tCassandraInput and tCassandraInput to tCassandraClose.

  6. Label the components to better identify their functions.
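The links above produce four subjobs chained by OnSubjobOk triggers, so each subjob starts only after the previous one has finished without error. The following is a minimal Python sketch of that ordering, not the code Talend generates; `run_chain`, `noop` and `failing` are hypothetical names used only for illustration.

```python
def run_chain(subjobs):
    """Run (name, action) pairs in order, as if linked by OnSubjobOk triggers.

    A subjob starts only when the previous one finished without raising.
    """
    completed = []
    for name, action in subjobs:
        try:
            action()
        except Exception as error:
            print(f"{name} failed ({error}); remaining subjobs are skipped")
            break
        completed.append(name)
    return completed


def noop():
    """Placeholder for the real work of a subjob."""


def failing():
    """Placeholder for a subjob that ends in error."""
    raise RuntimeError("simulated write error")


# The four subjobs of this scenario, in trigger order.
job = [
    ("tCassandraConnection", noop),
    ("tFileInputDelimited -> tCassandraOutput", noop),
    ("tCassandraInput -> tLogRow", noop),
    ("tCassandraClose", noop),
]

print(run_chain(job))

# If the write subjob fails, the read and close subjobs are never triggered.
broken_job = [job[0], ("tFileInputDelimited -> tCassandraOutput", failing), job[2], job[3]]
print(run_chain(broken_job))
```

In this model a failure in the write subjob also prevents tCassandraClose from running, which is a consequence of chaining every subjob with OnSubjobOk.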

Configuring the components

Opening a Cassandra connection

  1. Double-click the tCassandraConnection component to open its Basic settings view in the Component tab.

  2. Select the Cassandra version that you are using from the DB Version list. In this example, it is Cassandra 1.1.2.

  3. In the Server field, type in the hostname or IP address of the Cassandra server. In this example, it is localhost.

  4. In the Port field, type in the listening port number of the Cassandra server.

  5. If required, type in the authentication information for the Cassandra connection: Username and Password.

Reading the input data

  1. Double-click the tFileInputDelimited component to open its Component view.

  2. Click the [...] button next to the File Name/Stream field to browse to the file that you want to read data from. In this scenario, the file is D:/Input/Employees.csv. The CSV file contains four columns: id, age, name and ManagerID.

    id;age;name;ManagerID
    1;20;Alex;1
    2;40;Peter;1
    3;25;Mark;1
    4;26;Michael;1
    5;30;Christophe;2
    6;26;Stephane;3
    7;37;Cedric;3
    8;52;Bill;4
    9;43;Jack;2
    10;28;Andrews;4

  3. In the Header field, enter 1 so that the first row in the CSV file will be skipped.

  4. Click Edit schema to define the data to pass on to the tCassandraOutput component.
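As a rough illustration of what these settings do, the sketch below parses the sample data in plain Python. It is not the code Talend generates. The semicolon delimiter matches the sample file; the `read_employees` helper and the column types (integers for id, age and ManagerID) are assumptions made for this example.

```python
import csv
import io

# Sample content of Employees.csv, copied from the scenario.
SAMPLE = """id;age;name;ManagerID
1;20;Alex;1
2;40;Peter;1
3;25;Mark;1
4;26;Michael;1
5;30;Christophe;2
6;26;Stephane;3
7;37;Cedric;3
8;52;Bill;4
9;43;Jack;2
10;28;Andrews;4
"""


def read_employees(stream, header_rows=1, delimiter=";"):
    """Yield one dict per data row, skipping the first `header_rows` lines."""
    reader = csv.reader(stream, delimiter=delimiter)
    for index, row in enumerate(reader):
        if index < header_rows:
            continue  # same effect as entering 1 in the Header field
        emp_id, age, name, manager_id = row
        # Assumed schema: three integer columns and one string column.
        yield {
            "id": int(emp_id),
            "age": int(age),
            "name": name,
            "ManagerID": int(manager_id),
        }


rows = list(read_employees(io.StringIO(SAMPLE)))
print(len(rows), rows[0])
```

To read the real file instead of the inline sample, the stream could be opened with `open(path, newline="")`, which is the form the Python `csv` module recommends.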

Writing data to a Cassandra keyspace

  1. Double-click the tCassandraOutput component to open its Basic settings view in the Component tab.

  2. Type in the required connection information, or reuse the connection you configured earlier. In this scenario, the Use existing connection check box is selected.

  3. In the Keyspace configuration area, type in the name of the keyspace: Employee in this example, and select Drop keyspace if exists and create from the Action on keyspace list.

  4. In the Column family configuration area, type in the name of the column family: Employee_Info in this example, and select Drop column family if exists and create from the Action on column family list.

    The Define column family structure check box appears. In this example, clear this check box.

  5. In the Action on data list, select the action you want to carry out, Upsert in this example.

  6. Click Sync columns to retrieve the schema from the preceding component.

  7. Select the key column of the column family from the Key column list. In this example, it is id.

    If needed, select the Include key in columns check box.
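Upsert means that a row is inserted when its key is new and updated when the key already exists. The sketch below models that behaviour with a plain Python dictionary keyed by id. It does not talk to Cassandra, and the `upsert` helper and its `include_key` flag (standing in for the Include key in columns check box) are hypothetical names used only for illustration.

```python
def upsert(column_family, row, key_column="id", include_key=False):
    """Insert or update `row` in a dict that stands in for a column family."""
    key = row[key_column]
    columns = {
        name: value
        for name, value in row.items()
        if include_key or name != key_column
    }
    # Columns already stored under this key are kept;
    # incoming columns overwrite or extend them.
    column_family.setdefault(key, {}).update(columns)
    return column_family


employee_info = {}
upsert(employee_info, {"id": 1, "age": 20, "name": "Alex", "ManagerID": 1})
upsert(employee_info, {"id": 1, "age": 21})  # same key: only age is overwritten
upsert(employee_info, {"id": 2, "age": 40, "name": "Peter", "ManagerID": 1})
print(employee_info)
```

Running the Job twice with the same input therefore leaves one row per id rather than duplicates, which is the main reason to choose Upsert for a reloadable data set.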

Reading data from the Cassandra keyspace

  1. Double-click the tCassandraInput component to open its Component view.

  2. Type in the required connection information, or reuse the connection you configured earlier. In this scenario, the Use existing connection check box is selected.

  3. In the Keyspace configuration area, type in the name of the keyspace: Employee in this example.

  4. In the Column family configuration area, type in the name of the column family: Employee_Info in this example.

  5. Click Edit schema to define the data structure to be read from the Cassandra keyspace. In this example, three columns are defined: id, name and age.

  6. If needed, select the Include key in output columns check box, and then select the key column of the column family you want to include from the Key column list.

  7. From the Row key type list, select Integer because id is of integer type in this example.

    Keep the Default option for the row key Cassandra type: the key is then mapped automatically to the corresponding Cassandra type, Int32.

  8. In the Query configuration area, select the Specify row keys check box and specify the row keys directly. In this example, three rows will be read. Next, select the Specify columns check box and specify the column names of the column family directly. This scenario will read three columns from the keyspace: id, name and age.

  9. If needed, the Key start and the Key end fields allow you to define the range of rows, and the Key limit field allows you to specify the number of rows within the range of rows to be read. Similarly, the Columns range start and the Columns range end fields allow you to define the range of columns of the column family, and the Columns range limit field allows you to specify the number of columns within the range of columns to be read.
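The two query styles described in steps 8 and 9 can be pictured with the following sketch, which uses a dictionary in place of the column family. The helper names `read_by_keys` and `read_key_range`, the example keys 1, 3 and 5, and the treatment of a key range as inclusive and ordered by integer key are all assumptions for illustration. A real cluster orders rows according to its partitioner, so range results there may differ.

```python
# A few rows from the scenario, keyed by id, standing in for Employee_Info.
employee_info = {
    1: {"id": 1, "age": 20, "name": "Alex", "ManagerID": 1},
    2: {"id": 2, "age": 40, "name": "Peter", "ManagerID": 1},
    3: {"id": 3, "age": 25, "name": "Mark", "ManagerID": 1},
    4: {"id": 4, "age": 26, "name": "Michael", "ManagerID": 1},
    5: {"id": 5, "age": 30, "name": "Christophe", "ManagerID": 2},
}


def read_by_keys(column_family, row_keys, columns):
    """Return the requested columns for each requested row key that exists."""
    return [
        {c: column_family[k][c] for c in columns if c in column_family[k]}
        for k in row_keys
        if k in column_family
    ]


def read_key_range(column_family, key_start, key_end, key_limit, columns):
    """Return at most `key_limit` rows whose key lies in [key_start, key_end]."""
    keys = [k for k in sorted(column_family) if key_start <= k <= key_end]
    return read_by_keys(column_family, keys[:key_limit], columns)


wanted = ["id", "name", "age"]
by_keys = read_by_keys(employee_info, [1, 3, 5], wanted)   # Specify row keys
by_range = read_key_range(employee_info, 2, 5, 2, wanted)  # Key start/end/limit
print(by_keys)
print(by_range)
```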

Displaying the information of interest

  1. Double-click the tLogRow component to open its Component view.

  2. In the Mode area, select Table (print values in cells of a table).
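Table mode prints each value in its own cell. The sketch below shows one generic way to render rows as a bordered text table in Python. It does not reproduce the exact layout printed by tLogRow, and `format_table` is a hypothetical helper written for this example.

```python
def format_table(rows, columns):
    """Render rows (a list of dicts) as a bordered text table."""
    # Each column is as wide as its longest cell, header included.
    widths = [
        max([len(col)] + [len(str(row[col])) for row in rows])
        for col in columns
    ]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(values):
        cells = (f" {str(v).ljust(w)} " for v, w in zip(values, widths))
        return "|" + "|".join(cells) + "|"

    out = [border, line(columns), border]
    out += [line([row[col] for col in columns]) for row in rows]
    out.append(border)
    return "\n".join(out)


table = format_table([{"id": 1, "name": "Alex", "age": 20}], ["id", "name", "age"])
print(table)
```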

Closing the Cassandra connection

  1. Double-click the tCassandraClose component to open its Component view.

  2. Select the connection to be closed from the Component List.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Execute the Job by pressing F6 or clicking Run on the Run tab.

    The personal information of three employees is displayed on the console.