Scenario 1: Retrieving error messages while extracting data from JSON fields - 6.3

Talend Open Studio for Big Data Components Reference Guide


In this scenario, tWriteJSONField wraps the incoming data into JSON fields, and tExtractJSONFields then extracts the data from those fields. The error messages generated when extraction fails, which include the JSON fields concerned and the errors, are retrieved via a Row > Reject link.

Linking the components

  1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tWriteJSONField, tExtractJSONFields, and two tLogRow components. Rename the two tLogRow components data_extracted and reject_info.

  2. Link tFixedFlowInput and tWriteJSONField using a Row > Main connection.

  3. Link tWriteJSONField and tExtractJSONFields using a Row > Main connection.

  4. Link tExtractJSONFields and data_extracted using a Row > Main connection.

  5. Link tExtractJSONFields and reject_info using a Row > Reject connection.

Configuring the components

Setting up the tFixedFlowInput

  1. Double-click tFixedFlowInput to display its Basic settings view.

  2. Click Edit schema to open the schema editor.

    Click the [+] button to add three columns, namely firstname, lastname and dept, all of type String.

    Click OK to close the editor.

  3. Select Use Inline Content and enter the data below in the Content box:

    Andrew;Wallace;Doc
    John;Smith;R&D
    Christian;Dior;Sales
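To make the shape of this input concrete, the following minimal Python sketch (an illustration, not code generated by the Studio) parses the same inline content into rows keyed by the three schema columns. The semicolon field separator is an assumption based on the sample data, and the function name parse_inline_content is hypothetical.

```python
# Illustrative only: mimics how the inline content maps onto the schema.
COLUMNS = ["firstname", "lastname", "dept"]

CONTENT = """Andrew;Wallace;Doc
John;Smith;R&D
Christian;Dior;Sales"""


def parse_inline_content(content, columns, field_sep=";"):
    """Split each non-empty line on field_sep and pair values with column names."""
    rows = []
    for line in content.splitlines():
        if not line.strip():
            continue  # skip blank lines
        values = line.split(field_sep)
        rows.append(dict(zip(columns, values)))
    return rows


rows = parse_inline_content(CONTENT, COLUMNS)
for row in rows:
    print(row)
```

Each resulting row carries one value per schema column, which is the form the next component receives.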

Setting up the tWriteJSONField

  1. Double-click tWriteJSONField to display its Basic settings view.

  2. Click Configure JSON Tree to open the XML tree editor.

    The schema of tFixedFlowInput appears in the Linker source panel.

  3. In the Linker target panel, click the default rootTag and type in staff, which is the root node of the JSON field to be generated.

  4. Right-click staff and select Add Sub-element from the context menu.

  5. In the pop-up box, enter the sub-node name, namely firstname.

    Repeat the steps to add two more sub-nodes, namely lastname and dept.

  6. Right-click firstname and select Set As Loop Element from the context menu.

  7. Drop firstname from the Linker source panel to its counterpart in the Linker target panel.

    In the pop-up dialog box, select Add linker to target node.

    Click OK to close the dialog box.

  8. Repeat the steps to link the two other items.

    Click OK to close the XML tree editor.

  9. Click Edit schema to open the schema editor.

  10. Click the [+] button in the right panel to add one column, namely staff, which will hold the JSON data generated.

    Click OK to close the editor.
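As a rough illustration of what the staff column holds after this configuration, the Python sketch below nests one input row under a staff root node. The exact JSON layout that tWriteJSONField emits is an assumption here, and wrap_row is a hypothetical helper, not a Talend API.

```python
import json


def wrap_row(row, root="staff"):
    """Nest the row's columns under a single root node and serialize to JSON."""
    return json.dumps({root: row})


# One row from the sample input, keyed by the schema columns.
row = {"firstname": "Andrew", "lastname": "Wallace", "dept": "Doc"}
staff = wrap_row(row)
print(staff)
```

The point of the sketch is only that the three source columns become sub-nodes of staff and that the whole structure travels downstream as a single string column.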

Setting up the tExtractJSONFields

  1. Double-click tExtractJSONFields to display its Basic settings view.

  2. Click Edit schema to open the schema editor.

  3. Click the [+] button in the right panel to add three columns, namely firstname, lastname and dept, which will hold the data of their counterpart nodes in the JSON field staff.

    Click OK to close the editor.

  4. In the pop-up Propagate box, click Yes to propagate the schema to the subsequent components.

  5. In the Loop XPath query field, enter "/staff", which is the root node of the JSON data.

  6. In the Mapping area, enter the node name of the JSON data in the XPath query column for each output column. The data of those nodes will be extracted and passed to their counterpart columns defined in the output schema.

  7. Specifically, define the XPath query "firstname" for the column firstname, "lastname" for the column lastname, and "" for the column dept. Note that "" is not a valid XPath query and will lead to execution errors, which is what this scenario relies on to produce reject rows.
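The effect of this mapping can be sketched in Python as follows. This is a simplified model under stated assumptions, not the component's implementation: an empty query is treated as an extraction error, any row with an error is routed to the reject output only, and the key names errorJSONField and errorMessage as well as the function extract are hypothetical.

```python
import json

# Column -> query, mirroring the mapping described above ("" is the invalid one).
MAPPING = {"firstname": "firstname", "lastname": "lastname", "dept": ""}


def extract(staff_json, loop_node="staff", mapping=MAPPING):
    """Return (main_row, reject_row); exactly one of the two is None."""
    node = json.loads(staff_json)[loop_node]
    row = {}
    errors = []
    for column, query in mapping.items():
        if not query:
            # Assumption: an empty query counts as an extraction failure.
            row[column] = None
            errors.append(f"invalid query for column '{column}'")
        else:
            row[column] = node.get(query)
    if errors:
        # Hypothetical reject layout: extracted data, source field, cause.
        reject = dict(row, errorJSONField=staff_json,
                      errorMessage="; ".join(errors))
        return None, reject
    return row, None


staff = '{"staff": {"firstname": "John", "lastname": "Smith", "dept": "R&D"}}'
main_row, reject_row = extract(staff)
print("main:", main_row)
print("reject:", reject_row)
```

In this model the reject row still carries the values that were extracted successfully, alongside the original JSON field and a description of the failure.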

Setting up the tLogRow components

  1. Double-click data_extracted to display its Basic settings view.

  2. Select Table (print values in cells of a table) for a better display of the results.

  3. Perform the same setup on the other tLogRow component, namely reject_info.
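For readers unfamiliar with the table mode, the Python sketch below prints rows as a bordered text table with one value per cell. It only approximates the idea; the actual layout produced by tLogRow may differ, and format_table is a hypothetical helper.

```python
def format_table(rows, columns):
    """Render rows as a bordered text table with one cell per value."""
    cells = [[str(row.get(col, "")) for col in columns] for row in rows]
    # Each column is as wide as its longest value or its header.
    widths = [
        max([len(col)] + [len(line[i]) for line in cells])
        for i, col in enumerate(columns)
    ]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def fmt(values):
        padded = (v.ljust(w) for v, w in zip(values, widths))
        return "| " + " | ".join(padded) + " |"

    lines = [border, fmt(columns), border]
    lines += [fmt(line) for line in cells]
    lines.append(border)
    return "\n".join(lines)


COLUMNS = ["firstname", "lastname", "dept"]
ROWS = [
    {"firstname": "Andrew", "lastname": "Wallace", "dept": "Doc"},
    {"firstname": "John", "lastname": "Smith", "dept": "R&D"},
]
table = format_table(ROWS, COLUMNS)
print(table)
```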

Executing the Job

  1. Press Ctrl + S to save the Job.

  2. Press F6 to execute the Job.

    In the execution results, the reject row provides details such as the data extracted, the JSON fields whose data could not be extracted, and the cause of the extraction failure.