Scenario 1: Extracting JSON data from a file using JSONPath without setting a loop node - 6.3

Talend Open Studio for Big Data Components Reference Guide

EnrichVersion
6.3
EnrichProdName
Talend Open Studio for Big Data
task
Data Governance
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

This scenario describes a two-component Job that extracts data from the JSON file Store.json by specifying the complete JSON path for each node of interest and displays the flat data extracted on the console.

The JSON file Store.json contains information about a department store and the content of the file is as follows:

{"store": {
    "name": "Sunshine Department Store",
    "address": "Wangfujing Street",
    "goods": {
        "book": [
            {
                "category": "Reference",
                "title": "Sayings of the Century",
                "author": "Nigel Rees",
                "price": 8.88
            },
            {
                "category": "Fiction",
                "title": "Sword of Honour",
                "author": "Evelyn Waugh",
                "price": 12.66
            }
        ],
        "bicycle": {
            "type": "GIANT OCR2600",
            "color": "White",
            "price": 276
        }
    }
}}

In the following example, we will extract the store name, the store address, and the bicycle information from this file.

Adding and linking the components

  1. Create a new Job and add a tFileInputJSON component and a tLogRow component by typing their names in the design workspace or dropping them from the Palette.

  2. Link the tFileInputJSON component to the tLogRow component using a Row > Main connection.

Configuring the components

  1. Double-click the tFileInputJSON component to open its Basic settings view.

  2. Select JsonPath without loop from the Read By drop-down list. With this option, you need to specify the complete JSON path for each node of interest in the JSONPath query fields of the Mapping table.

  3. Click the [...] button next to Edit schema to open the schema editor.

  4. Click the [+] button to add five columns, store_name, store_address, bicycle_type, and bicycle_color of String type, and bicycle_price of Double type.

    Click OK to close the schema editor. In the pop-up dialog box, click Yes to propagate the schema to the subsequent component.

  5. In the Filename field, specify the path to the JSON file that contains the data to be extracted. In this example, it is "E:/Store.json".

  6. In the Mapping table, the Column fields are automatically filled with the schema columns you have defined.

    In the JSONPath query fields, enter the JSONPath query expressions between double quotation marks to specify the nodes that hold the desired data.

    • For the columns store_name and store_address, enter the JSONPath query expressions "$.store.name" and "$.store.address" relative to the nodes name and address respectively.

    • For the columns bicycle_type, bicycle_color, and bicycle_price, enter the JSONPath query expressions "$.store.goods.bicycle.type", "$.store.goods.bicycle.color", and "$.store.goods.bicycle.price" relative to the child nodes type, color, and price of the bicycle node respectively.

  7. Double-click the tLogRow component to display its Basic settings view.

  8. In the Mode area, select Table (print values in cells of a table) for better readability of the result.

Executing the Job

  1. Press Ctrl+S to save the Job.

  2. Press F6 to execute the Job.

    As shown above, the store name, the store address, and the bicycle information are extracted from the source JSON data and displayed in a flat table on the console.