Scenario 2: Writing family information to Neo4j and creating relationships - 6.3

Talend Open Studio for Big Data Components Reference Guide


This scenario describes a Job that writes family information to labeled nodes in a remote Neo4j database and creates relationships based on the family names.

Adding and linking components

  1. Create a Job and add the following components to the Job by typing their names in the design workspace or dropping them from the Palette:

    • a tFileInputDelimited component, to read the family data from a CSV file,

    • a tNeo4jOutput component to write the family data to a Neo4j database and create relationships between husband and wife.

  2. Link the tFileInputDelimited component to the tNeo4jOutput component using a Row > Main connection.

  3. Label the components to better identify their roles in the Job.

Configuring the components

Configuring the data source

  1. Double-click the tFileInputDelimited component to open its Basic settings view on the Components tab.

  2. In the File name/Stream field, specify the path to the CSV file that contains the family data to read.

    The input CSV file used in this example is as follows:

    Name;Gender;Age;Family
    Jenny;Female;24;the Johnsons
    Jack;Male;26;the Johnsons
    Richard;Male;35;the Blacks
    Anne;Female;36;the Whites
    Helen;Female;28;the Blacks
    Tom;Male;38;the Whites

  3. In the Header field, specify the number of rows to skip as header rows. In this example, the first row of the CSV file is the header row, so enter 1.

  4. Click the [...] button next to Edit schema to open the [Schema] dialog box, and define the input schema based on the structure of the input file. In this example, the input schema is composed of four columns: name (String), gender (String), age (Integer), and family (String).

    When done, click OK to close the [Schema] dialog box and propagate the schema to the next component.
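As a rough illustration of what the data source settings above amount to, the following Python sketch reads the same semicolon-delimited data, skips one header row, and applies the four-column schema. It is not how tFileInputDelimited is implemented; the function name `read_family_rows` and the `header_rows` parameter are hypothetical names chosen for this example.

```python
import csv
import io

# Sample data copied from the input file used in this scenario.
SAMPLE = """Name;Gender;Age;Family
Jenny;Female;24;the Johnsons
Jack;Male;26;the Johnsons
Richard;Male;35;the Blacks
Anne;Female;36;the Whites
Helen;Female;28;the Blacks
Tom;Male;38;the Whites
"""


def read_family_rows(text, header_rows=1):
    """Parse semicolon-delimited text into typed rows (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    rows = []
    for position, record in enumerate(reader):
        # Skip the header rows (the Header field) and any empty lines.
        if position < header_rows or not record:
            continue
        name, gender, age, family = record
        # Apply the schema: name, gender and family are strings, age is an integer.
        rows.append(
            {"name": name, "gender": gender, "age": int(age), "family": family}
        )
    return rows


rows = read_family_rows(SAMPLE)
print(len(rows), "rows read; first row:", rows[0])
```

Converting age with `int()` mirrors declaring the column as Integer in the schema: a non-numeric value would fail at read time instead of reaching the database.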

Writing data to Neo4j and creating indexes and relationships

  1. Click the tNeo4jOutput component and select the Component tab to open its Basic settings view.

  2. From the DB Version list, select Neo4J 2.X.X to enable node labeling.

  3. Define a Neo4j database connection. In this example, the Neo4j database is accessible in REST mode, so select the Remote server check box and specify the URL of the Neo4j server in the Server URL field, "http://localhost:7474/db/data" in this example.

  4. Double-click the tNeo4jOutput component or click the Mapping button on the component's Basic settings view to open the index and relationship mapping editor.

  5. With the name column selected from the schema panel, click the Index creation tab, click the [+] button to add a row in the table, and create an index named first_name on this column:

    • In the Name field, enter first_name between double quotation marks.

    • In the Key field, enter first_name between double quotation marks to give the index a key.

    Then click in the schema panel to validate your index creation.

  6. With the family column selected from the schema panel, click the Index creation tab, click the [+] button to add a row in the table, and create an index named family on this column:

    • In the Name field, enter family between double quotation marks.

    • In the Key field, enter family_name between double quotation marks to give the index a key.

    Then click in the schema panel to validate your index creation.

  7. With the family column selected from the schema panel, click the Relationship creation tab, click the [+] button to add a row in the table, and create a relationship named Spouse on this column based on the index named family:

    • In the Type field, enter Spouse between double quotation marks.

    • From the Direction list, select either Outgoing or Incoming.

    • In the Index Name field, enter family between double quotation marks.

    • In the Index Key field, enter family_name between double quotation marks.

    Then click in the schema panel to validate your relationship creation, and click OK to close the mapping editor.

  8. Select the Use label (Neo4j > 2.0) check box and enter Families between double quotation marks in the Label name field so that the nodes to be created will be labeled Families.

  9. From the Data action list, select Insert or update, and set a reference key in the Index area that appears:

    • In the Index name field, enter first_name between double quotation marks.

    • In the Index key field, enter first_name between double quotation marks.

    • From the Index value list, select name. Because the Value field was left blank when the index was created, the index value will be the value of the name column for each row.

    This way, when the Job is executed, nodes will be inserted or updated in the Neo4j database based on the first_name index: for each data row, if a node containing the same first name already exists in the database, the node will be updated; otherwise, a new node will be created.
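The insert-or-update behavior and the index-based relationship creation configured above can be pictured with a small in-memory model. This is only a conceptual sketch in Python, not the component's implementation: the function `upsert_nodes`, the dictionaries standing in for the first_name and family indexes, and the choice to link each new node to the nodes already found in the family index are all assumptions made for illustration.

```python
# Rows as they would arrive from the input component (same data as the CSV file).
rows = [
    {"name": "Jenny", "gender": "Female", "age": 24, "family": "the Johnsons"},
    {"name": "Jack", "gender": "Male", "age": 26, "family": "the Johnsons"},
    {"name": "Richard", "gender": "Male", "age": 35, "family": "the Blacks"},
    {"name": "Anne", "gender": "Female", "age": 36, "family": "the Whites"},
    {"name": "Helen", "gender": "Female", "age": 28, "family": "the Blacks"},
    {"name": "Tom", "gender": "Male", "age": 38, "family": "the Whites"},
]


def upsert_nodes(input_rows):
    """Model insert-or-update keyed on first_name (hypothetical helper)."""
    nodes = {}          # stands in for the first_name index: name -> properties
    family_index = {}   # stands in for the family index: family name -> node keys
    relationships = []  # (start node, relationship type, end node)
    for row in input_rows:
        key = row["name"]
        is_new = key not in nodes
        # Insert a new node, or update the properties of the existing one.
        nodes.setdefault(key, {}).update(row)
        if is_new:
            # Assumption: a new node is linked to nodes already indexed
            # under the same family name.
            members = family_index.setdefault(row["family"], [])
            for other in members:
                relationships.append((key, "Spouse", other))
            members.append(key)
    return nodes, relationships


nodes, relationships = upsert_nodes(rows)
print(len(nodes), "nodes,", len(relationships), "relationships")
```

In this model, processing the same rows a second time updates the existing entries instead of adding new ones, which is the point of choosing Insert or update with a reference key.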

Executing the Job and checking the result

  1. Press Ctrl+S to save the Job, and press F6 or click Run on the Run tab to run the Job.

  2. In the address bar of your Web browser, enter the URL of the Neo4j database browser, http://localhost:7474/ in this example, and enter the following Cypher query in the command line to view the nodes:

    MATCH (n:`Families`) RETURN n;

    The graph view shows that three pairs of nodes labeled Families have been created, and that nodes with the same family name are linked by the Spouse relationship.
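The same check could also be scripted instead of typed into the browser. The Python sketch below only builds the HTTP request and does not send it. It assumes that the server exposes the legacy transactional Cypher endpoint (`/transaction/commit` under the Server URL used in this scenario) and that authentication is disabled; both are assumptions about your installation, and `build_cypher_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_cypher_request(server_url, statement):
    """Build a POST request for one Cypher statement (hypothetical helper)."""
    # Assumption: the legacy transactional endpoint is available under the REST root.
    endpoint = server_url.rstrip("/") + "/transaction/commit"
    payload = json.dumps({"statements": [{"statement": statement}]}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_cypher_request(
    "http://localhost:7474/db/data",
    "MATCH (n:`Families`) RETURN n",
)
print(request.get_method(), request.full_url)

# Sending the request requires a running Neo4j server, for example:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

If your server requires credentials, an Authorization header would have to be added to the request before sending it.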