Scenario 3: Importing data from a CSV file to Neo4j and creating relationships using a single Cypher query - 6.1

Talend Open Studio for Big Data Components Reference Guide

This scenario describes a Job that imports family information from a CSV file into a remote Neo4j database and creates relationships between persons and families using a single Cypher query run through a tNeo4jRow component.

The CSV file imported in this example contains the following data:

Name;Gender;Age;Family
Jenny;Female;24;the Johnsons
Jack;Male;26;the Johnsons
Richard;Male;35;the Blacks
Anne;Female;36;the Whites
Helen;Female;28;the Blacks
Tom;Male;38;the Whites
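
To get a feel for the input before building the Job, the file can be inspected outside Talend. The following is a minimal Python sketch, not part of the Talend Job: it assumes the sample data above, and the names CSV_TEXT and read_rows are hypothetical. It parses the semicolon-delimited rows and lists the distinct families, which correspond to the Family nodes the Job is expected to create.

```python
import csv
import io

# Sample data copied from the scenario; a real Job reads it from families.csv.
CSV_TEXT = """Name;Gender;Age;Family
Jenny;Female;24;the Johnsons
Jack;Male;26;the Johnsons
Richard;Male;35;the Blacks
Anne;Female;36;the Whites
Helen;Female;28;the Blacks
Tom;Male;38;the Whites
"""


def read_rows(text):
    """Parse semicolon-delimited text into dicts keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


rows = read_rows(CSV_TEXT)
# Distinct family names: each one should end up as a single Family node.
families = sorted({row["Family"] for row in rows})
print(len(rows), "persons in", len(families), "families:", families)
```

Note that every value, including Age, is read as a string, which is why the import query later converts the age explicitly.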

Because this example uses MERGE with LOAD CSV, an additional tNeo4jRow component first creates an index on the property being merged, so that the Cypher query executes efficiently.

Adding and linking components

  1. Create a Job and add two tNeo4jRow components to the Job by typing the component name in the design workspace or dropping them from the Palette.

  2. Link the components using a Trigger > On Sub Job Ok connection.

  3. Label the components to better identify their roles in the Job.

Configuring the components

Creating an index

  1. Double-click the first tNeo4jRow component to open its Basic settings view on the Component tab.

  2. From the DB Version list, select Neo4J 2.X.X.

  3. Select the Remote server check box and specify the URL of the Neo4j server in the Server URL field, "http://localhost:7474/db/data" in this example.

  4. In the Query field, type in the following query to create an index on the property you are going to merge on, which is the name property of the Family nodes in this example:

    "CREATE INDEX ON :Family(name)"

Importing data and creating relationships

  1. Double-click the second tNeo4jRow component to open its Basic settings view on the Component tab.

  2. From the DB Version list, select Neo4J 2.X.X.

  3. Select the Remote server check box and specify the URL of the Neo4j server in the Server URL field, "http://localhost:7474/db/data" in this example.

  4. In the Query field, type in the following Cypher query to import family data from the CSV file, create relevant Person and Family nodes, and create relationships between persons and families:

    "LOAD CSV WITH HEADERS FROM 'file:E:/Talend/Data/Input/families.csv' AS csvLine FIELDTERMINATOR ';' 
    MERGE (family:Family { name: csvLine.Family })
    CREATE (person:Person { name: csvLine.Name, gender: csvLine.Gender, age: toInt(csvLine.Age)})
    CREATE (person)-[:From]->(family)"
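
The per-row behavior of this query can be modeled in plain Python. The sketch below is an in-memory illustration under stated assumptions, not how Neo4j executes the query: the function import_rows and the dictionary-based node representation are hypothetical. MERGE is modeled as get-or-create keyed on the family name, while each CREATE always adds a new person and a new relationship. The dictionary lookup plays the role that the index on :Family(name) plays in the database.

```python
# Rows as LOAD CSV WITH HEADERS exposes them: one map per line, all values strings.
ROWS = [
    {"Name": "Jenny", "Gender": "Female", "Age": "24", "Family": "the Johnsons"},
    {"Name": "Jack", "Gender": "Male", "Age": "26", "Family": "the Johnsons"},
    {"Name": "Richard", "Gender": "Male", "Age": "35", "Family": "the Blacks"},
    {"Name": "Anne", "Gender": "Female", "Age": "36", "Family": "the Whites"},
    {"Name": "Helen", "Female": "Female", "Gender": "Female", "Age": "28", "Family": "the Blacks"},
    {"Name": "Tom", "Gender": "Male", "Age": "38", "Family": "the Whites"},
]


def import_rows(rows):
    """Model the query: one MERGE and two CREATE clauses per CSV line."""
    families = {}       # family name -> Family node; stands in for the index
    persons = []        # Person nodes, one per CSV line
    relationships = []  # (person name, relationship type, family name)
    for csv_line in rows:
        # MERGE: reuse the Family node if it exists, otherwise create it.
        family = families.setdefault(
            csv_line["Family"], {"label": "Family", "name": csv_line["Family"]}
        )
        # CREATE: always add a new Person node; toInt(...) becomes int(...).
        person = {
            "label": "Person",
            "name": csv_line["Name"],
            "gender": csv_line["Gender"],
            "age": int(csv_line["Age"]),
        }
        persons.append(person)
        # CREATE: link the new person to the merged family.
        relationships.append((person["name"], "From", family["name"]))
    return families, persons, relationships


families, persons, relationships = import_rows(ROWS)
print(len(families), len(persons), len(relationships))
```

Because only the Family nodes are merged, running the Job a second time would reuse the existing families but create duplicate Person nodes and relationships.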

Executing the Job and checking the result

  1. Press Ctrl+S to save the Job, and press F6 or click Run on the Run tab to run the Job.

  2. In the address bar of your Web browser, enter the URL of the Neo4j database browser, http://localhost:7474/ in this example, and enter the following Cypher query in the command line to view the Person and Family nodes linked via the relationship From:

    MATCH (a:Person)-[:`From`]->(b:Family) RETURN a,b;

    As shown in the graphic view, nodes labeled Family and Person have been created and the nodes of persons from the same families are linked with the relevant Family nodes via the relationship From.
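
The same check could also be scripted instead of run in the browser. The sketch below is an assumption-laden illustration rather than part of the Talend scenario: it assumes the Neo4j 2.x server exposes its transactional Cypher endpoint at /transaction/commit under the Server URL used above, and that authentication is disabled or handled separately. The names build_request and run_query are hypothetical.

```python
import json
import urllib.request

# Same server URL as in the tNeo4jRow settings (assumed to be reachable locally).
SERVER_URL = "http://localhost:7474/db/data"
CHECK_QUERY = "MATCH (a:Person)-[:`From`]->(b:Family) RETURN a,b"


def build_request(statement, server_url=SERVER_URL):
    """Build a POST request that submits one Cypher statement for execution."""
    payload = {"statements": [{"statement": statement}]}
    return urllib.request.Request(
        server_url.rstrip("/") + "/transaction/commit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json; charset=UTF-8",
        },
        method="POST",
    )


def run_query(statement):
    """Send the statement to the server and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(statement)) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not contact the server; call run_query(CHECK_QUERY)
# against a running Neo4j instance to actually execute the check.
request = build_request(CHECK_QUERY)
print(request.get_method(), request.full_url)
```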