Scenario 2: Importing data from a CSV file to Neo4j using a Cypher query - 6.3

Talend Open Studio for Big Data Components Reference Guide

This scenario describes a Job that first imports employee data from a CSV file into a Neo4j database using a Cypher query, and then reads the data back from the database and displays it on the console.

Adding and linking components

  1. Create a Job and add the following components to the Job by typing their names in the design workspace or dropping them from the Palette:

    • a tNeo4jConnection component, to open a connection to a Neo4j database,

    • a tFileInputDelimited component, to read the source data from a CSV file,

    • a tNeo4jRow component, to write the employee data to the Neo4j database with a Cypher query,

    • a tNeo4jInput component, to read the employee data from the Neo4j database,

    • a tLogRow component, to display the data on the Run console, and

    • a tNeo4jClose component, to close the Neo4j database connection opened by the tNeo4jConnection component.

  2. Link the tNeo4jConnection component to the tFileInputDelimited component using a Trigger > On Subjob Ok connection.

  3. Link the tFileInputDelimited component to the tNeo4jRow component using a Row > Main connection.

  4. Link the tFileInputDelimited component to the tNeo4jInput component using a Trigger > On Subjob Ok connection.

  5. Link the tNeo4jInput component to the tLogRow component using a Row > Main connection.

  6. Link the tNeo4jInput component to the tNeo4jClose component using a Trigger > On Subjob Ok connection.

  7. Label the components to better identify their roles in the Job.
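
The trigger chain above can be pictured as four subjobs that run one after another. The following Python sketch is a hypothetical model, not Talend-generated code: the subjob names ("connect", "import", "read", "close") and the execution_order() helper are illustrative assumptions. It shows how On Subjob Ok connections sequence the Job: each trigger fires only when the preceding subjob completes without error.

```python
# Hypothetical model of the Job above (illustrative only, not generated code).
# Each subjob lists the components joined by Row > Main connections.
SUBJOBS = {
    "connect": ["tNeo4jConnection"],
    "import": ["tFileInputDelimited", "tNeo4jRow"],
    "read": ["tNeo4jInput", "tLogRow"],
    "close": ["tNeo4jClose"],
}

# Trigger > On Subjob Ok connections: source subjob -> next subjob.
ON_SUBJOB_OK = {"connect": "import", "import": "read", "read": "close"}


def execution_order(start="connect", failed=frozenset()):
    """Return the components in run order, stopping after a failed subjob."""
    order = []
    current = start
    while current is not None:
        order.extend(SUBJOBS[current])
        if current in failed:
            break  # On Subjob Ok does not fire when the subjob fails
        current = ON_SUBJOB_OK.get(current)
    return order


if __name__ == "__main__":
    print(" -> ".join(execution_order()))
    print(" -> ".join(execution_order(failed={"import"})))
```

One consequence of chaining everything with On Subjob Ok is that, if an earlier subjob fails, the subjob containing tNeo4jClose is skipped as well.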

Configuring the components

Configuring a Neo4j database connection

  1. Double-click the tNeo4jConnection component to open its Basic settings view on the Component tab.

  2. From the DB Version list, select Neo4J 2.X.X.

  3. Select the Use a remote server check box and specify the URL of the Neo4j server in the Server URL field, "http://localhost:7474/db/data" in this example.

    In this example, Neo4j is used in REST mode. To use Neo4j in embedded mode instead, clear the Use a remote server check box and specify the Neo4j data file directory in the Database path field.
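
In REST mode, Cypher statements travel to the server over HTTP. As an illustration only, the sketch below builds a request for the transactional Cypher endpoint that Neo4j 2.x exposes under the data URL (/transaction/commit); it does not claim that tNeo4jConnection uses this exact endpoint internally. The build_cypher_request() helper is hypothetical, and the request is built but not sent, so no running server is needed.

```python
import json
import urllib.request

# Value entered in the Server URL field in this example.
SERVER_URL = "http://localhost:7474/db/data"


def build_cypher_request(statement, parameters):
    """Build (but do not send) a POST for the Neo4j 2.x transactional endpoint."""
    body = {"statements": [{"statement": statement, "parameters": parameters}]}
    return urllib.request.Request(
        url=SERVER_URL.rstrip("/") + "/transaction/commit",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json; charset=UTF-8",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_cypher_request(
        "CREATE (n:Employees{ID:{id}, Name:{name}})",
        {"id": 1, "name": "Rutherford Roosevelt"},
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually send the request, pass it to urllib.request.urlopen() while a Neo4j 2.x server is listening on the configured URL.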

Configuring data import

  1. Double-click the tFileInputDelimited component to open its Basic settings view on the Component tab.

  2. In the File name/Stream field, specify the path to the CSV file that contains the employee data to read.

    The input CSV file used in this example is as follows:

    employeeID;employeeName;age;hireDate;salary;managerID
    1;Rutherford Roosevelt;38;06-10-2008;13336.58;m5
    2;Warren Adams;43;05-22-2008;11626.68;m6
    3;Andrew Roosevelt;55;04-01-2007;10052.95;m4
    4;Herbert Quincy;54;06-14-2007;10694.71;m6
    5;Woodrow Polk;33;08-14-2007;13751.50;m4
    6;Theodore Johnson;47;01-26-2008;12426.87;m6
    7;Benjamin Adams;32;02-25-2008;10438.65;m4
    8;Woodrow Harrison;51;10-11-2008;11188.27;m5
    9;George Truman;40;04-28-2008;14254.49;m5
    10;Harry Jackson;38;04-01-2008;12798.78;m6

  3. In the Header field, specify the number of rows to skip as header rows. In this example, the first row of the CSV file is the header row, so enter 1.

  4. Click the [...] button next to Edit schema to open the [Schema] dialog box, and define the input schema based on the structure of the input file. In this example, the input schema is composed of six columns: employeeID (Integer), employeeName (String), age (Integer), hireDate (Date, with a date pattern matching the file, "MM-dd-yyyy" in this example), salary (Double), and managerID (String).

    When done, click OK to close the [Schema] dialog box and propagate the schema to the next component.

  5. Double-click the tNeo4jRow component to open its Basic settings view on the Component tab.

  6. Select the Use an existing connection check box to reuse the Neo4j database connection opened by the tNeo4jConnection component, which is the only connection component used in this example.

  7. In the Query field, type in the Cypher query to be executed by the component.

    In this example, type in the following query to create nodes with the label Employees and six properties, to hold the data from the input flow:

    • ID, which will take the value of the variable parameter id,

    • Name, which will take the value of the variable parameter name,

    • Age, which will take the value of the variable parameter age,

    • HireDate, which will take the value of the variable parameter hire_date,

    • Salary, which will take the value of the variable parameter salary, and

    • ManagerID, which will take the value of the variable parameter manager_id.

    "CREATE (n:Employees{ID:{id}, Name:{name}, Age:{age}, HireDate:{hire_date}, Salary:{salary}, ManagerID:{manager_id}})"

  8. In the Parameters table, type each variable parameter used in your Cypher query in the Parameter field, and map it to an input schema column by selecting that column from the Parameter value list.
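
The read-and-map work done by tFileInputDelimited and tNeo4jRow can be sketched in Python as follows. This is a hypothetical illustration, not the components' code: it skips one header row, converts each field according to the schema defined above, and builds the parameter map that the Cypher query receives for each row. The MM-dd-yyyy date pattern is an assumption inferred from the sample values, and SCHEMA, PARAMETERS, read_rows() and to_parameters() are illustrative names. How the component serializes the Date value for Neo4j is not covered here.

```python
import csv
import io
from datetime import datetime

# First two data rows of the input file shown above.
SAMPLE = """employeeID;employeeName;age;hireDate;salary;managerID
1;Rutherford Roosevelt;38;06-10-2008;13336.58;m5
2;Warren Adams;43;05-22-2008;11626.68;m6
"""

# Input schema: column -> converter (date pattern is an assumption).
SCHEMA = {
    "employeeID": int,
    "employeeName": str,
    "age": int,
    "hireDate": lambda s: datetime.strptime(s, "%m-%d-%Y").date(),
    "salary": float,
    "managerID": str,
}

# Parameters table: Cypher parameter -> input schema column.
PARAMETERS = {
    "id": "employeeID",
    "name": "employeeName",
    "age": "age",
    "hire_date": "hireDate",
    "salary": "salary",
    "manager_id": "managerID",
}


def read_rows(text, header=1):
    """Yield one typed dict per data row, skipping `header` leading rows."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    for index, fields in enumerate(reader):
        if index < header or not fields:
            continue  # skip header rows and blank lines
        yield {
            column: convert(value)
            for (column, convert), value in zip(SCHEMA.items(), fields)
        }


def to_parameters(row):
    """Map one typed row to the parameters of the CREATE query."""
    return {param: row[column] for param, column in PARAMETERS.items()}


if __name__ == "__main__":
    for row in read_rows(SAMPLE):
        print(to_parameters(row))
```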

Configuring data retrieval and display

  1. Double-click the tNeo4jInput component to open its Basic settings view.

  2. Select the Use an existing connection check box to reuse the connection opened by the tNeo4jConnection component.

  3. Click the [...] button next to Edit schema and define the schema corresponding to the node properties you want to retrieve and display.

    When done, click OK to close the [Schema] dialog box and propagate the schema to the next component.

    The defined schema columns automatically appear in the Mapping table.

  4. In the Query field, type in the Cypher query to match the data to read from the Neo4j database. In this example, use the following Cypher query to retrieve all the properties of all the nodes with the label Employees.

    "MATCH (n:Employees) RETURN *;"

  5. In the Mapping table, fill the Return parameter field of each schema column with a return parameter, in double quotes, to map the node properties in the Neo4j database to the schema columns.

  6. Double-click the tLogRow component to open its Basic settings view, and select the Table (print values in cells of a table) option to display the retrieved information in a table.
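
The mapping and display steps can be sketched as follows. The sketch assumes, hypothetically, that each matched node is available as a property map and that each return parameter names a node property (ID, Name, Age and ManagerID, as created by the earlier query). NODES, MAPPING, map_nodes() and format_table() are illustrative, only four columns are shown for brevity, and the output does not reproduce the exact layout that tLogRow prints in Table mode.

```python
# Hypothetical property maps of two matched Employees nodes.
NODES = [
    {"ID": 1, "Name": "Rutherford Roosevelt", "Age": 38, "ManagerID": "m5"},
    {"ID": 2, "Name": "Warren Adams", "Age": 43, "ManagerID": "m6"},
]

# Mapping table: schema column -> return parameter (node property name).
MAPPING = {
    "employeeID": "ID",
    "employeeName": "Name",
    "age": "Age",
    "managerID": "ManagerID",
}


def map_nodes(nodes, mapping):
    """Turn node property maps into rows keyed by schema column."""
    return [{col: node.get(prop) for col, prop in mapping.items()} for node in nodes]


def format_table(rows, columns):
    """Render rows as a simple bordered text table."""
    widths = {c: max([len(c)] + [len(str(r[c])) for r in rows]) for c in columns}
    border = "+" + "+".join("-" * widths[c] for c in columns) + "+"

    def line(values):
        cells = (str(v).ljust(widths[c]) for c, v in zip(columns, values))
        return "|" + "|".join(cells) + "|"

    out = [border, line(columns), border]
    out += [line([r[c] for c in columns]) for r in rows]
    out.append(border)
    return "\n".join(out)


if __name__ == "__main__":
    print(format_table(map_nodes(NODES, MAPPING), list(MAPPING)))
```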

Executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6 or click Run on the Run tab to run the Job.