Configuring a Big Data Streaming Job using the Spark Streaming Framework

Kinesis

Version: Cloud 8.0
Language: English
Product: Talend Data Fabric, Talend Real-Time Big Data Platform
Module: Talend Studio
Last publication date: 2024-02-20
Before running your Job, you need to configure it to use your Amazon EMR cluster.

Procedure

  1. Because your Job runs on Spark, add a tHDFSConfiguration component and configure it to use the HDFS connection metadata from the repository.
  2. In the Run view, click the Spark Configuration tab.
  3. In the Cluster Version panel, configure your Job to use your cluster connection metadata.
  4. Set the Batch size to 2000 ms.
  5. Because you will set some advanced properties, change the Property type to Built-In.
  6. In the Tuning panel, select the Set tuning properties option and configure the fields to match your cluster's resources. The sketch after this procedure illustrates how these settings map to Spark properties.
  7. Run your Job.

    It can take a couple of minutes for data to be displayed in the Console.
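
Talend Studio generates and submits the Spark code for these settings, so nothing in this procedure is written by hand. For reference only, here is a minimal hand-written sketch of roughly what the configuration above corresponds to in a plain Spark Streaming application. The application name, HDFS URI, local master, placeholder socket source, and tuning values are illustrative assumptions, not values taken from this procedure.

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class StreamingConfigSketch {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf()
                    .setAppName("kinesis_streaming_job") // hypothetical name
                    // Local master for illustration; on EMR the master is
                    // supplied by the cluster, not hard-coded.
                    .setMaster("local[2]")
                    // tHDFSConfiguration supplies the HDFS connection metadata;
                    // in plain Spark that is roughly equivalent to pointing the
                    // Hadoop layer at the cluster's NameNode (hypothetical URI):
                    .set("spark.hadoop.fs.defaultFS", "hdfs://namenode:8020")
                    // Tuning panel: example executor sizing only; the right
                    // values depend on your EMR cluster's resources.
                    .set("spark.executor.memory", "2g")
                    .set("spark.executor.cores", "2");

            // Batch size of 2000 ms: incoming records are grouped into
            // micro-batches that are processed every 2 seconds.
            JavaStreamingContext jssc =
                    new JavaStreamingContext(conf, Durations.milliseconds(2000));

            // Placeholder source standing in for the Kinesis input that the
            // Job's components provide (requires something writing to
            // localhost:9999, e.g. `nc -lk 9999`).
            JavaReceiverInputDStream<String> lines =
                    jssc.socketTextStream("localhost", 9999);
            lines.print(); // dump each micro-batch to the console

            jssc.start();            // begin consuming
            jssc.awaitTermination(); // run until the Job is stopped
        }
    }

In the actual Job, the Spark Configuration tab and the tHDFSConfiguration component feed the equivalent values into the code that Talend Studio generates.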