
Extracting the hashtag field from the raw Tweet data

Procedure

  1. Double-click tExtractJSONFields to open its Component view.
    As described at https://dev.twitter.com/overview/api/entities-in-twitter-objects#hashtags, raw Tweet data is in JSON format.
  2. Click Sync columns to retrieve the schema from the preceding component. This is the read-only schema of tKafkaInput, because tWindow does not change the schema.
  3. Click the [...] button next to Edit schema to open the schema editor.
  4. Rename the single column of the output schema to hashtag. This column is used to carry the hashtag field extracted from the Tweet JSON data.
  5. Click OK to validate these changes.
  6. From the Read by list, select JsonPath.
  7. From the JSON field list, select the column of the input schema from which you need to extract fields. In this scenario, it is payload.
  8. In the Loop Jsonpath query field, enter the JSON path pointing to the element over which the extraction loops. Based on the JSON structure of a Tweet described in the Twitter documentation, enter $.entities.hashtags to loop over the hashtags entity.
  9. In the Mapping table, where the hashtag column of the output schema has been filled in automatically, enter the element from which the value is extracted. In this example, this is the text attribute of each hashtags entity, so enter "text", including the double quotation marks, in the Json query column.
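To make the loop query and the mapping in steps 8 and 9 more concrete, here is a minimal Python sketch of the same extraction performed outside Talend. It is not how tExtractJSONFields is implemented. It assumes a hypothetical, heavily trimmed Tweet payload and supports only the simple dotted path used in this scenario (no filters, wildcards, or array indices). The helper names resolve_path and extract_hashtags are illustrative only.

```python
import json

# Hypothetical sample payload: a trimmed-down Tweet containing only the
# fields needed here. Real Tweets carry many more fields.
SAMPLE_PAYLOAD = json.dumps({
    "text": "Streaming with #Kafka and #Spark",
    "entities": {
        "hashtags": [
            {"text": "Kafka", "indices": [15, 21]},
            {"text": "Spark", "indices": [26, 32]},
        ]
    },
})


def resolve_path(node, path):
    """Follow a simple dotted JSONPath such as '$.entities.hashtags'.

    Supports only '$' followed by plain member names.
    Returns None if any member along the path is missing.
    """
    parts = path.split(".")
    if parts[0] != "$":
        raise ValueError("path must start with '$'")
    for name in parts[1:]:
        if not isinstance(node, dict) or name not in node:
            return None
        node = node[name]
    return node


def extract_hashtags(payload, loop_query="$.entities.hashtags", json_query="text"):
    """Emulate the loop query plus mapping: one output row per looped element."""
    tweet = json.loads(payload)
    elements = resolve_path(tweet, loop_query)
    if elements is None:
        return []
    if not isinstance(elements, list):
        elements = [elements]
    rows = []
    for element in elements:
        # The mapping query is evaluated relative to each looped element.
        value = element.get(json_query) if isinstance(element, dict) else None
        rows.append({"hashtag": value})
    return rows


if __name__ == "__main__":
    for row in extract_hashtags(SAMPLE_PAYLOAD):
        print(row)
    # prints {'hashtag': 'Kafka'} then {'hashtag': 'Spark'}
```

Each element matched by the loop query becomes one output row, and the relative "text" query fills the single hashtag column, mirroring the one-column output schema defined in step 4.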
