Scenario 1: Writing data to a Neo4j database and reading specific data from it - 6.1

Talend Open Studio for Big Data Components Reference Guide

This basic scenario describes a Job composed of two subjobs. The first subjob reads employee data from a CSV file, writes it to a Neo4j database, and then triggers the second subjob. The second subjob reads the employee data that matches certain query conditions from the Neo4j database and displays it on the Run console.

Adding and linking components

  1. Create a Job and add the following components to the Job by typing their names in the design workspace or dropping them from the Palette:

    • a tFileInputDelimited component to read the employee data from a CSV file,

    • a tNeo4jOutput component to write the employee data to a Neo4j database,

    • a tNeo4jInput component to read the employee data from the Neo4j database based on given conditions, and

    • a tLogRow component to display the data on the Run console.

  2. Link the tFileInputDelimited component to the tNeo4jOutput component using a Row > Main connection.

  3. Link the tNeo4jInput component to the tLogRow component using a Row > Main connection.

  4. Link the tFileInputDelimited component to the tNeo4jInput component using a Trigger > On Subjob Ok connection.

  5. Label the components to better identify their roles in the Job.

Configuring the components

Importing data to the Neo4j database

  1. Double-click the tFileInputDelimited component to open its Basic settings view on the Components tab.

  2. In the File name/Stream field, specify the path to the CSV file that contains the employee data to read.

    The input CSV file used in this example is as follows:

    employeeID;employeeName;age;hireDate;salary;managerID
    1;Rutherford Roosevelt;38;06-10-2008;13336.58;m5
    2;Warren Adams;43;05-22-2008;11626.68;m6
    3;Andrew Roosevelt;55;04-01-2007;10052.95;m4
    4;Herbert Quincy;54;06-14-2007;10694.71;m6
    5;Woodrow Polk;33;08-14-2007;13751.50;m4
    6;Theodore Johnson;47;01-26-2008;12426.87;m6
    7;Benjamin Adams;32;02-25-2008;10438.65;m4
    8;Woodrow Harrison;51;10-11-2008;11188.27;m5
    9;George Truman;40;04-28-2008;14254.49;m5
    10;Harry Jackson;38;04-01-2008;12798.78;m6

  3. In the Header field, specify the number of rows to skip as header rows. In this example, only the first row of the CSV file is a header row, so enter 1.

  4. Click the [...] button next to Edit schema to open the [Schema] dialog box, and define the input schema based on the structure of the input file. In this example, the input schema is composed of six columns: employeeID (Integer), employeeName (String), age (Integer), hireDate (Date), salary (Float), and managerID (String).

    When done, click OK to close the [Schema] dialog box and propagate the schema to the next component.

  5. Click the tNeo4jOutput component and select the Component tab to open its Basic settings view.

  6. Define a Neo4j database connection. In this example, the Neo4j database is accessible in REST mode, so select the Remote server check box and enter the URL of the Neo4j server, "http://localhost:7474/db/data", in the Server URL field.

  7. If needed, click the Sync columns button to ensure the component has the same schema as the preceding component.

    Keep the rest of the parameters as they are.
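As a rough illustration of what the input settings above describe, the following Java sketch parses the semicolon-delimited sample file into typed values that mirror the six schema columns. It is not the code Talend generates. The class name EmployeeCsvSketch, the Employee holder, and the helper methods are hypothetical, and the MM-dd-yyyy date pattern is an assumption inferred from sample values such as 05-22-2008.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmployeeCsvSketch {

    /** Hypothetical holder mirroring the six schema columns of the scenario. */
    static final class Employee {
        final int employeeID;
        final String employeeName;
        final int age;
        final LocalDate hireDate;
        final float salary;
        final String managerID;

        Employee(int employeeID, String employeeName, int age,
                 LocalDate hireDate, float salary, String managerID) {
            this.employeeID = employeeID;
            this.employeeName = employeeName;
            this.age = age;
            this.hireDate = hireDate;
            this.salary = salary;
            this.managerID = managerID;
        }
    }

    // Assumption: month-day-year, inferred from values such as "05-22-2008".
    static final DateTimeFormatter HIRE_DATE = DateTimeFormatter.ofPattern("MM-dd-yyyy");

    /** Splits one data row on ';' and converts each field to its schema type. */
    static Employee parseLine(String line) {
        String[] f = line.split(";");
        return new Employee(
                Integer.parseInt(f[0]),
                f[1],
                Integer.parseInt(f[2]),
                LocalDate.parse(f[3], HIRE_DATE),
                Float.parseFloat(f[4]),
                f[5]);
    }

    /** Skips the first headerRows lines, as the Header field does, and parses the rest. */
    static List<Employee> parse(List<String> lines, int headerRows) {
        List<Employee> employees = new ArrayList<>();
        for (int i = headerRows; i < lines.size(); i++) {
            employees.add(parseLine(lines.get(i)));
        }
        return employees;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "employeeID;employeeName;age;hireDate;salary;managerID",
                "1;Rutherford Roosevelt;38;06-10-2008;13336.58;m5",
                "2;Warren Adams;43;05-22-2008;11626.68;m6");
        for (Employee e : parse(lines, 1)) {
            System.out.println(e.employeeID + " | " + e.employeeName + " | " + e.age
                    + " | " + e.hireDate + " | " + e.salary + " | " + e.managerID);
        }
    }
}
```

With a header count of 1, the column-name row is skipped and only the data rows are converted, which is the behavior the Header field controls.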

Reading data from the Neo4j database

  1. Double-click the tNeo4jInput component to open its Basic settings view.

  2. As in the tNeo4jOutput component, specify the URL of the Neo4j server to connect to, "http://localhost:7474/db/data" in this example.

  3. Click the [...] button next to Edit schema and define the schema used to display the employee information. When done, click OK to close the [Schema] dialog box and propagate the schema to the next component.

    The defined schema columns automatically appear in the Mapping table.

  4. In the Query field, type in the Cypher query that matches the data to read from the Neo4j database. In this example, use the following Cypher query to find employees who are more than 40 years old and whose manager is m6.

    "MATCH (n) WHERE n.age > 40 AND n.managerID = 'm6' RETURN n;"

  5. Fill the Return parameter field of each schema column with a return parameter, enclosed in double quotes, to map the node properties in the Neo4j database to the schema columns.

  6. Double-click the tLogRow component to open its Basic settings view, and select the Table (print values in cells of a table) option to display the retrieved information in a table.
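To see which employees the query conditions should select, the following Java sketch applies the same two conditions (age greater than 40 and managerID equal to m6) to the sample rows of the input file. It only mirrors the WHERE clause in plain Java and does not connect to Neo4j. The class name CypherFilterSketch and the method matching are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CypherFilterSketch {

    // Sample rows from the input CSV file: {employeeID, employeeName, age, managerID}.
    static final String[][] ROWS = {
        {"1", "Rutherford Roosevelt", "38", "m5"},
        {"2", "Warren Adams", "43", "m6"},
        {"3", "Andrew Roosevelt", "55", "m4"},
        {"4", "Herbert Quincy", "54", "m6"},
        {"5", "Woodrow Polk", "33", "m4"},
        {"6", "Theodore Johnson", "47", "m6"},
        {"7", "Benjamin Adams", "32", "m4"},
        {"8", "Woodrow Harrison", "51", "m5"},
        {"9", "George Truman", "40", "m5"},
        {"10", "Harry Jackson", "38", "m6"},
    };

    /** Returns the names of rows whose age exceeds minAge and whose manager is managerId. */
    static List<String> matching(String[][] rows, int minAge, String managerId) {
        List<String> names = new ArrayList<>();
        for (String[] row : rows) {
            int age = Integer.parseInt(row[2]);
            // Same predicate as: WHERE n.age > 40 AND n.managerID = 'm6'
            if (age > minAge && managerId.equals(row[3])) {
                names.add(row[1]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [Warren Adams, Herbert Quincy, Theodore Johnson]
        System.out.println(matching(ROWS, 40, "m6"));
    }
}
```

Harry Jackson reports to m6 but is 38, and George Truman is exactly 40, so neither satisfies the strict greater-than comparison.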

Executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6 or click Run on the Run tab to run the Job.

    The employee data from the CSV file is written to the Neo4j database. The employees that match the query conditions are then retrieved from the Neo4j database and displayed on the console.