Centralizing Neo4j metadata - 6.3

Talend Data Fabric Studio User Guide

If you often work with data in a Neo4j database, you may want to centralize the connection to the database and its schema details in the Metadata folder in the Repository tree view.

The Neo4j metadata setup procedure consists of two separate but closely related tasks:

  1. Create a connection to a Neo4j database.

  2. Retrieve Neo4j schemas of interest.

Prerequisites:

  • All the required external modules that are missing in Talend Studio due to license restrictions have been installed. For more information, see the Talend Installation Guide.

  • You are familiar with Cypher queries for reading data in Neo4j.

  • The Neo4j server is up and running if you need to connect to the Neo4j database in Remote mode.

Creating a connection to a Neo4j database

  1. In the Repository tree view, expand the Metadata node, right-click NoSQL Connection, and select Create Connection from the contextual menu. The connection wizard opens up.

  2. In the connection wizard, fill in the general properties of the connection you need to create, such as Name, Purpose and Description.

    The text you enter in the Description field appears as a tooltip when you move your mouse pointer over the connection.

    When done, click Next to proceed to the next step.

  3. Select Neo4j from the DB Type list, and specify the connection details:

    • To connect to a Neo4j database in Local mode, also known as embedded mode, select the Local option and specify the directory holding your Neo4j data files.

    • To connect to a Neo4j database in Remote mode, also known as REST mode, select the Remote option and enter the URL of the Neo4j server.

    In this example, the Neo4j database is accessible in Remote mode, and the Neo4j server URL is the default URL proposed by the wizard.

  4. Click the Check button to make sure that the connection works.

  5. Click Finish to validate the settings.

    The newly created Neo4j database connection appears under the NoSQL Connection node in the Repository tree view. You can now drop it onto your design workspace as a Neo4j component, but you still need to define the schema information where needed.

    Next, you need to retrieve one or more schemas of interest for your connection.

Retrieving a schema

In this step, we will retrieve the schema of interest from the connected Neo4j database.

  1. In the Repository view, right-click the newly created connection and select Retrieve Schema from the contextual menu.

    The wizard opens a new view for schema generation based on a Cypher query.

  2. In the Cypher field, enter your Cypher query to match the nodes and retrieve the properties of interest.

    Warning

    If your Cypher query includes strings, enclose them in single quotation marks. Double quotation marks will cause errors in Neo4j components dropped from your centralized metadata.

    In this example, the following query is used to match nodes labelled Employees and retrieve their properties ID, Name, HireDate, Salary, and ManagerID as schema columns:

    MATCH (n:Employees) RETURN n.ID, n.Name, n.HireDate, n.Salary, n.ManagerID;

    If you want to retrieve all the properties of nodes labelled Employees in this example, you can enter a query like this:

    MATCH (n:Employees) RETURN n;

    or:

    MATCH (n:Employees) RETURN *;

  3. Click Next to proceed to the next step of the wizard where you can edit the generated schema.

    Modify the schema if needed. You can rename the schema, and customize the schema structure according to your needs in the Schema area.

    The toolbar allows you to add, remove or move columns in your schema, or replace the schema with the schema defined in an XML file.

    To add a new schema, click the Add Schema button in the Schema panel, which creates an empty schema for you to define.

    To remove a schema, select the schema name in the Schema panel and click the Remove Schema button.

  4. Click Finish to complete the schema creation. The resulting schema appears under your Neo4j connection in the Repository view. You can now drop the connection or any schema node under it onto your design workspace as a Neo4j component, with all the metadata information automatically filled in.

    If you need to further edit a schema, right-click the schema and select Edit Schema from the contextual menu to open this wizard again and make your modifications.

    Warning

    If you modify the schemas, ensure that the data type in the Type column is correctly defined.
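
As an illustration of the quoting rule given for the Cypher field, the following query filters the Employees nodes on a string value enclosed in single quotation marks. This is only a sketch: the value Smith is a hypothetical example and not data from this guide, while the label and property names are those used in the example query of the schema retrieval procedure. The comment line is explanatory and can be left out when you enter the query in the wizard.

```cypher
// 'Smith' is a hypothetical value; the string is enclosed in single quotation marks.
MATCH (n:Employees) WHERE n.Name = 'Smith' RETURN n.ID, n.Name, n.HireDate;
```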