Setting up JSON metadata for an input file - 6.2

Talend Real-time Big Data Platform Studio User Guide

This section describes how to define a file connection and upload a JSON schema for an input file. To define an output JSON file connection and schema, see Setting up JSON metadata for an output file.

Defining the general properties

  1. In the wizard, fill in the general information in the relevant fields to identify the JSON file metadata, including Name, Purpose and Description.

    The Name field is required, and the information you provide in the Description field will appear as a tooltip when you move your mouse pointer over the file connection.


    In this step, it is advisable to enter information that will help you distinguish between your input and output connections, which will be defined in the next step.

  2. If needed, set the version and status in the Version and Status fields respectively.

    You can also manage the version and status of a repository item in the [Project Settings] dialog box. For more information, see Version management and Status management respectively.

  3. If needed, click the Select button next to the Path field to select a folder under the File Json node to hold your newly created file connection.

  4. Click Next to select the type of metadata.

Setting the type of metadata and loading the input file

  1. In the dialog box, select Input Json and click Next to proceed to the next step of the wizard to load the input file.

  2. From the Read By list box, select the type of query to read the source JSON file.

    • JsonPath: read the JSON data based on a JsonPath query.

      This is the default and recommended query type: it performs better and avoids problems that you may encounter when reading JSON data with an XPath query.

    • Xpath: read the JSON data based on an XPath query.

  3. Click Browse... and browse your directory to the JSON file to be uploaded. Alternatively, enter the full path to the file or the URL that links to the JSON file.

    In this example, the input JSON file has the following content:

    {"movieCollection": [
        {
            "type": "Action Movie",
            "name": "Brave Heart",
            "details": {
                "release": "1995",
                "rating": "5",
                "starring": "Mel Gibson"
            }
        },
        {
            "type": "Action Movie",
            "name": "Edge of Darkness",
            "details": {
                "release": "2010",
                "rating": "5",
                "starring": "Mel Gibson"
            }
        }
    ]}

    The Schema Viewer area displays a preview of the JSON structure. You can expand and visualize every level of the file's JSON tree structure.

  4. Enter the Encoding type in the corresponding field if the system does not detect it automatically.

  5. In the Limit field, enter the number of columns on which the JsonPath or XPath query is to be executed, or 0 if you want to run it against all of the columns.

  6. Click Next to define the schema parameters.
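
To get a feel for what the Schema Viewer preview shows and what a JsonPath expression for each node looks like, the following minimal Python sketch walks a well-formed copy of the sample content and lists one JsonPath-style path per node. It is only an illustration run outside Talend Studio: the `list_paths` helper is a hypothetical name, and collapsing array elements into a single `[*]` step is a simplifying assumption, not a description of how the wizard works internally.

```python
import json

# Well-formed copy of the sample input used in this section.
SAMPLE = """
{"movieCollection": [
    {"type": "Action Movie", "name": "Brave Heart",
     "details": {"release": "1995", "rating": "5", "starring": "Mel Gibson"}},
    {"type": "Action Movie", "name": "Edge of Darkness",
     "details": {"release": "2010", "rating": "5", "starring": "Mel Gibson"}}
]}
"""


def list_paths(node, path="$"):
    """Return a JsonPath-style path for every node, like expanding a tree view."""
    paths = [path]
    if isinstance(node, dict):
        for key, value in node.items():
            paths.extend(list_paths(value, f"{path}.{key}"))
    elif isinstance(node, list) and node:
        # Assumption: all array elements share one structure, so only the
        # first element is inspected and represented by a wildcard step.
        paths.extend(list_paths(node[0], f"{path}[*]"))
    return paths


if __name__ == "__main__":
    for p in list_paths(json.loads(SAMPLE)):
        print(p)
```

In this sketch, the path `$.movieCollection[*]` corresponds to the repeating movie element, which is the kind of node you would typically choose as the loop element in the next step.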

Defining the schema

In this step you will set the schema parameters.

The schema definition window is composed of four views:

    • Source Schema: tree view of the JSON file.

    • Target Schema: extraction and iteration information.

    • Preview: preview of the target schema, together with the input data of the selected columns displayed in the defined order.

    • File Viewer: preview of the JSON file's data.

  1. Populate the Path loop expression field with the absolute JsonPath or XPath expression, depending on the type of query you have selected, for the node to be iterated upon. You can do this in either of two ways:

    • enter the absolute JsonPath or XPath expression for the node to be iterated upon (enter the full expression or press Ctrl+Space to use the autocompletion list), or

    • drag the loop element node from the tree view under Source Schema into the Absolute path expression field of the Path loop expression table.

      An orange arrow links the node to the corresponding expression.


    The Path loop expression definition is mandatory.

  2. In the Loop limit field, specify the maximum number of times the selected node can be iterated.

  3. Define the fields to be extracted by dragging the nodes from the Source Schema tree into the Relative or absolute path expression fields of the Fields to extract table.


    You can select several nodes to drop onto the table by pressing Ctrl or Shift and clicking the nodes of interest.

  4. If needed, you can add as many columns to be extracted as necessary, delete columns or change the column order using the toolbar:

    • Add or delete a column using the [+] and [x] buttons.

    • Change the order of the columns using the up and down arrow buttons.

  5. If you want your file schema to have different column names than those retrieved from the input file, enter new names in the corresponding Column name fields.

  6. Click Refresh Preview to preview the target schema. The fields are then displayed in the schema in the defined order.

  7. Click Next to finalize the schema.
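
The relationship between the loop expression, the loop limit, and the fields to extract can be sketched in plain Python. The example below is a rough approximation under stated assumptions, not Talend's implementation: it uses a hand-written dotted-path resolver instead of a real JsonPath engine, the names `resolve`, `extract_rows`, and `FIELDS` are hypothetical, the renamed column `title` is invented for illustration, and treating `loop_limit=None` as "no limit" is an assumption of this sketch only.

```python
import json
from itertools import islice

# Well-formed copy of the sample input used in this section.
SAMPLE = """
{"movieCollection": [
    {"type": "Action Movie", "name": "Brave Heart",
     "details": {"release": "1995", "rating": "5", "starring": "Mel Gibson"}},
    {"type": "Action Movie", "name": "Edge of Darkness",
     "details": {"release": "2010", "rating": "5", "starring": "Mel Gibson"}}
]}
"""

# Target column name -> path relative to the loop element.
# Insertion order plays the role of the column order in the table.
FIELDS = {
    "type": "type",
    "title": "name",                 # column renamed from "name" (hypothetical)
    "release": "details.release",
    "starring": "details.starring",
}


def resolve(node, dotted_path):
    """Follow a dot-separated relative path; return None if any step is missing."""
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def extract_rows(document, loop_key, fields, loop_limit=None):
    """Build one row per element of document[loop_key], up to loop_limit rows."""
    elements = document.get(loop_key, [])
    return [
        {column: resolve(element, path) for column, path in fields.items()}
        for element in islice(elements, loop_limit)  # None means iterate over all
    ]


if __name__ == "__main__":
    for row in extract_rows(json.loads(SAMPLE), "movieCollection", FIELDS):
        print(row)
```

Each element of `movieCollection` yields one row, and each relative path yields one column, which mirrors how the loop element and the Fields to extract table combine to form the target schema.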

Finalizing the schema

The last step of the wizard shows the end schema generated and allows you to customize the schema according to your needs.

  1. If needed, rename the schema (by default, metadata) and leave a comment.

    Customize the schema if needed: add, remove or move schema columns, export the schema to an XML file, or replace the schema by importing a schema definition XML file using the toolbar.

    Make sure the data type in the Type column is correctly defined.

    For more information regarding Java data types, including date pattern, see Java API Specification.

    Below are the commonly used Talend data types:

    • Object: a generic Talend data type that allows processing data without regard to its content. For example, a data file not otherwise supported can be processed with a tFileInputRaw component by specifying that it has a data type of Object.

    • List: a space-separated list of primitive type elements in an XML Schema definition, defined using the xsd:list element.

    • Dynamic: a data type that can be set for a single column at the end of a schema to allow processing fields as VARCHAR(100) columns named either 'Column<X>' or, if the input includes a header, after the column names appearing in the header. For more information, see Dynamic schema.

    • Document: a data type that allows processing an entire XML document without regard to its content.

  2. If the JSON file on which the schema is based has changed, click the Guess button to generate the schema again. Note that if you have customized the schema, the Guess feature does not retain these changes.

  3. Click Finish. The new file connection, along with its schema, is displayed under the relevant File Json metadata node in the Repository tree view.
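
To illustrate why the Type column matters, note that the sample file stores `release` and `rating` as quoted strings. The Python sketch below shows, under simplifying assumptions, what it means to apply a typed schema to an extracted row: values that fit the declared type convert cleanly, while values that do not raise an error. The `SCHEMA` list and the `apply_schema` helper are hypothetical names, and Python's `str` and `int` merely stand in for the corresponding schema types; this is not how Talend performs conversions.

```python
# Hypothetical schema: (column name, type used to convert the extracted value).
SCHEMA = [
    ("name", str),
    ("release", int),
    ("rating", int),
]


def apply_schema(row, schema):
    """Convert each extracted value to its declared type; missing values stay None."""
    typed = {}
    for column, column_type in schema:
        value = row.get(column)
        typed[column] = None if value is None else column_type(value)
    return typed


if __name__ == "__main__":
    row = {"name": "Brave Heart", "release": "1995", "rating": "5"}
    print(apply_schema(row, SCHEMA))
    # {'name': 'Brave Heart', 'release': 1995, 'rating': 5}

    try:
        apply_schema({"name": "Unknown", "release": "n/a", "rating": "5"}, SCHEMA)
    except ValueError as error:
        # A value that does not match the declared type cannot be converted.
        print("conversion failed:", error)
```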

Now you can drag and drop the file connection or its schema from the Repository tree view onto the design workspace as a new tFileInputJSON or tExtractJSONFields component, or onto an existing component to reuse the metadata. For further information about how to use the centralized metadata in a Job, see How to use centralized metadata in a Job and How to set a repository schema.

To modify an existing file connection, right-click it in the Repository tree view and select Edit JSON to open the file metadata setup wizard.

To add a new schema to an existing file connection, right-click the connection in the Repository tree view and select Retrieve Schema from the contextual menu.

To edit an existing file schema, right-click the schema in the Repository tree view and select Edit Schema from the contextual menu.