Scenario 2: Collecting data from your favorite online social network - 6.1

Talend Components Reference Guide

In this scenario, tFileInputJSON retrieves the friends node from a JSON file that contains the data of a Facebook user, and tExtractJSONFields extracts the data from the friends node to produce flat output.

Linking the components

  1. Drop the following components from the Palette onto the design workspace: tFileInputJSON, tExtractJSONFields and tLogRow.

  2. Link tFileInputJSON and tExtractJSONFields using a Row > Main connection.

  3. Link tExtractJSONFields and tLogRow using a Row > Main connection.

Configuring the components

  1. Double-click tFileInputJSON to display its Basic settings view.

  2. Click Edit schema to open the schema editor.

    Click the [+] button to add one column, namely friends, of the String type.

    Click OK to close the editor.

  3. Click the [...] button to browse for the JSON file, facebook.json in this case:

    { "user": { "id": "9999912398",
                "name": "Kelly Clarkson",
                "friends": [
                     { "name": "Tom Cruise",
                       "id": "55555555555555",
                       "likes": {
                           "data": [
                                { "category": "Movie",
                                  "name": "The Shawshank Redemption",
                                  "id": "103636093053996",
                                  "created_time": "2012-11-20T15:52:07+0000"
                                },
                                { "category": "Community",
                                  "name": "Positiveretribution",
                                  "id": "471389562899413",
                                  "created_time": "2012-12-16T21:13:26+0000"
                                }
                                    ]
                                }
                     },
                     { "name": "Tom Hanks",
                       "id": "88888888888888",
                       "likes": {
                            "data": [
                                { "category": "Journalist",
                                  "name": "Janelle Wang",
                                  "id": "136009823148851",
                                  "created_time": "2013-01-01T08:22:17+0000"
                                },
                                { "category": "Tv show",
                                  "name": "Now With Alex Wagner",
                                  "id": "305948749433410",
                                  "created_time": "2012-11-20T06:14:10+0000"
                                }
                                ]
                               }
                      }
                            ]
              }
    }
    
  4. Clear the Read by XPath check box.

    In the Mapping table, enter the JSONPath query "$.user.friends[*]" next to the friends column to retrieve the entire friends node from the source file.

  5. Double-click tExtractJSONFields to display its Basic settings view.

  6. Click Edit schema to open the schema editor.

  7. Click the [+] button in the right panel to add five columns, namely id, name, like_id, like_name and like_category, which will hold the data of relevant nodes in the JSON field friends.

    Click OK to close the editor.

  8. In the pop-up Propagate box, click Yes to propagate the schema to the subsequent components.

  9. In the Loop XPath query field, enter "/likes/data".

  10. In the Mapping area, type in the queries of the JSON nodes in the XPath query column. The data of those nodes will be extracted and passed to their counterpart columns defined in the output schema.

  11. Specifically, define the XPath query "../../id" (querying the "/friends/id" node) for the column id, "../../name" (querying the "/friends/name" node) for the column name, "id" for the column like_id, "name" for the column like_name, and "category" for the column like_category.

  12. Double-click tLogRow to display its Basic settings view.

  13. Select Table (print values in cells of a table) for a better display of the results.
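
The configuration above can be approximated outside the Studio to check what rows to expect. The following Python sketch is only an illustration and is not how the Talend components are implemented: the function names read_friends and extract_fields are hypothetical, the sample data is the content of facebook.json embedded as a string, and plain dictionary access stands in for the JSONPath and XPath queries. It assumes that the loop on "/likes/data" yields one output row per like, with "../../id" and "../../name" resolved against the enclosing friend.

```python
import json

# Content of facebook.json from step 3, embedded so the sketch is self-contained.
FACEBOOK_JSON = """
{"user": {"id": "9999912398", "name": "Kelly Clarkson", "friends": [
  {"name": "Tom Cruise", "id": "55555555555555", "likes": {"data": [
    {"category": "Movie", "name": "The Shawshank Redemption",
     "id": "103636093053996", "created_time": "2012-11-20T15:52:07+0000"},
    {"category": "Community", "name": "Positiveretribution",
     "id": "471389562899413", "created_time": "2012-12-16T21:13:26+0000"}]}},
  {"name": "Tom Hanks", "id": "88888888888888", "likes": {"data": [
    {"category": "Journalist", "name": "Janelle Wang",
     "id": "136009823148851", "created_time": "2013-01-01T08:22:17+0000"},
    {"category": "Tv show", "name": "Now With Alex Wagner",
     "id": "305948749433410", "created_time": "2012-11-20T06:14:10+0000"}]}}
]}}
"""


def read_friends(text):
    """Hypothetical stand-in for tFileInputJSON with "$.user.friends[*]"."""
    return json.loads(text)["user"]["friends"]


def extract_fields(friends):
    """Hypothetical stand-in for tExtractJSONFields looping on "/likes/data"."""
    rows = []
    for friend in friends:
        for like in friend["likes"]["data"]:
            rows.append({
                "id": friend["id"],                 # "../../id"
                "name": friend["name"],             # "../../name"
                "like_id": like["id"],              # "id"
                "like_name": like["name"],          # "name"
                "like_category": like["category"],  # "category"
            })
    return rows


rows = extract_fields(read_friends(FACEBOOK_JSON))
for row in rows:
    print(row)
```

With the sample file, this sketch yields four rows, two for each friend, and each row repeats the friend's id and name next to the data of one like.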

Executing the Job

  1. Press Ctrl + S to save the Job.

  2. Press F6 to execute the Job.

    The Run console shows that the friends data of the Facebook user Kelly Clarkson is extracted correctly.