Scenario 4: Extracting JSON data from a URL - 6.3

Talend Components Reference Guide

In this scenario, tFileInputJSON retrieves the friends node from facebook.json, a JSON file hosted on the Web that contains the data of a Facebook user, and tExtractJSONFields then extracts the data from the friends node for flat data output.

The JSON file facebook.json is deployed on a Tomcat server, in the folder <tomcat path>/webapps/docs, and its content is as follows:

{"user": {
    "id": "9999912398",
    "name": "Kelly Clarkson",
    "friends": [
        {
            "name": "Tom Cruise",
            "id": "55555555555555",
            "likes": {"data": [
                {
                    "category": "Movie",
                    "name": "The Shawshank Redemption",
                    "id": "103636093053996",
                    "created_time": "2012-11-20T15:52:07+0000"
                },
                {
                    "category": "Community",
                    "name": "Positiveretribution",
                    "id": "471389562899413",
                    "created_time": "2012-12-16T21:13:26+0000"
                }
            ]}
        },
        {
            "name": "Tom Hanks",
            "id": "88888888888888",
            "likes": {"data": [
                {
                    "category": "Journalist",
                    "name": "Janelle Wang",
                    "id": "136009823148851",
                    "created_time": "2013-01-01T08:22:17+0000"
                },
                {
                    "category": "Tv show",
                    "name": "Now With Alex Wagner",
                    "id": "305948749433410",
                    "created_time": "2012-11-20T06:14:10+0000"
                }
            ]}
        }
    ]
}}
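
Before building the Job, it can be useful to confirm that the file is reachable and has the expected shape. The following Python sketch is not part of the Talend Job. It assumes the file is served over HTTP as described above, and the helper names fetch_json and describe_user are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download a JSON document and parse it into Python objects."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def describe_user(doc):
    """Return (user name, number of friends, number of likes per friend)."""
    user = doc["user"]
    likes_per_friend = {
        friend["name"]: len(friend["likes"]["data"])
        for friend in user["friends"]
    }
    return user["name"], len(user["friends"]), likes_per_friend
```

Calling describe_user(fetch_json(url)) with the URL of the deployed file gives a quick summary of the user, the friends, and how many likes each friend has.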

Adding and linking the components

  1. Create a new Job and add a tFileInputJSON component, a tExtractJSONFields component, and two tLogRow components by typing their names in the design workspace or dropping them from the Palette.

  2. Link the tFileInputJSON component to the first tLogRow component using a Row > Main connection.

  3. Link the first tLogRow component to the tExtractJSONFields component using a Row > Main connection.

  4. Link the tExtractJSONFields component to the second tLogRow component using a Row > Main connection.
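
The resulting flow is a linear chain: tFileInputJSON, first tLogRow, tExtractJSONFields, second tLogRow. As a rough analogy only, and not a description of the code Talend generates, the chain can be pictured as Python generators in which each log step prints a row and hands it on unchanged. All function names and labels below are hypothetical.

```python
def log_rows(rows, label, sink=print):
    """Print each row, then pass it to the next stage unchanged."""
    for row in rows:
        sink(f"[{label}] {row}")
        yield row


def run_chain(source_rows, extract, sink=print):
    """source -> log -> extract -> log, mirroring the four linked components."""
    logged_input = log_rows(source_rows, "tLogRow_1", sink)
    # One input row may produce several output rows.
    extracted = (out for row in logged_input for out in extract(row))
    return list(log_rows(extracted, "tLogRow_2", sink))
```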

Configuring the components

  1. Double-click the tFileInputJSON component to open its Basic settings view.

  2. Select JsonPath without loop from the Read By drop-down list. Then select the Use Url check box and, in the URL field that appears, enter the URL of the file facebook.json from which the data will be retrieved. In this example, it is http://localhost:8080/docs/facebook.json.

  3. Click the [...] button next to Edit schema and, in the [Schema] dialog box, define the schema by adding one column, friends, of String type.

    Click OK to close the dialog box and accept the propagation prompted by the pop-up dialog box.

  4. In the Mapping table, enter the JSONPath query "$.user.friends[*]" next to the friends column to retrieve the entire friends node from the source file.

  5. Double-click tExtractJSONFields to open its Basic settings view.

  6. Select Xpath from the Read By drop-down list.

  7. In the Loop XPath query field, enter the XPath expression between double quotation marks to specify the node on which the loop is based. In this example, it is "/likes/data".

  8. Click the [...] button next to Edit schema and, in the [Schema] dialog box, define the schema by adding five columns of String type: id, name, like_id, like_name, and like_category. These columns will hold the data of the relevant nodes under the JSON field friends.

    Click OK to close the dialog box and accept the propagation prompted by the pop-up dialog box.

  9. In the XPath query fields of the Mapping table, enter the XPath query expressions between double quotation marks to specify the JSON nodes that hold the desired data. In this example, enter:

    • "../../id" (querying the "/friends/id" node) for the column id,

    • "../../name" (querying the "/friends/name" node) for the column name,

    • "id" for the column like_id,

    • "name" for the column like_name, and

    • "category" for the column like_category.

  10. Double-click the second tLogRow component to open its Basic settings view.

    In the Mode area, select Table (print values in cells of a table) for better readability of the result.
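
Taken together, the two components first select the friends array and then loop over each friend's likes/data entries, reaching back up to the parent friend for id and name. The Python sketch below imitates that mapping with plain dictionaries. It is an illustration of the logic, not Talend's implementation, and it assumes the friends value is processed one friend object at a time. The function names and the trimmed sample data (created_time omitted) are choices made for this example.

```python
import json

# Trimmed copy of facebook.json (created_time fields left out for brevity).
SAMPLE = json.loads("""
{"user": {"id": "9999912398", "name": "Kelly Clarkson", "friends": [
  {"name": "Tom Cruise", "id": "55555555555555", "likes": {"data": [
    {"category": "Movie", "name": "The Shawshank Redemption", "id": "103636093053996"},
    {"category": "Community", "name": "Positiveretribution", "id": "471389562899413"}]}},
  {"name": "Tom Hanks", "id": "88888888888888", "likes": {"data": [
    {"category": "Journalist", "name": "Janelle Wang", "id": "136009823148851"},
    {"category": "Tv show", "name": "Now With Alex Wagner", "id": "305948749433410"}]}}
]}}
""")


def select_friends(doc):
    """Counterpart of the JSONPath query $.user.friends[*]."""
    return list(doc["user"]["friends"])


def extract_like_rows(friends):
    """Loop on likes/data; id and name are taken from the enclosing friend."""
    rows = []
    for friend in friends:
        for like in friend["likes"]["data"]:
            rows.append({
                "id": friend["id"],                  # "../../id"
                "name": friend["name"],              # "../../name"
                "like_id": like["id"],               # "id"
                "like_name": like["name"],           # "name"
                "like_category": like["category"],   # "category"
            })
    return rows


rows = extract_like_rows(select_friends(SAMPLE))
for row in rows:
    print("|".join(row.values()))
```

For the sample file, this yields four rows, one per like, each carrying the id and name of the friend it belongs to.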

Executing the Job

  1. Press Ctrl + S to save the Job.

  2. Press F6 to execute the Job.