Configuring a Big Data Streaming Job using the Spark Streaming Framework

Kinesis

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Data Fabric
Talend Real-Time Big Data Platform
task
Data Governance > Third-party systems > Messaging components (Integration) > Kinesis components
Data Quality and Preparation > Third-party systems > Messaging components (Integration) > Kinesis components
Design and Development > Third-party systems > Messaging components (Integration) > Kinesis components
EnrichPlatform
Talend Studio
Before running your Job, you need to configure it to use your Amazon EMR cluster.

Procedure

  1. Because your Job will run on Spark, add a tHDFSConfiguration component and configure it to use the HDFS connection metadata from the repository.
  2. In the Run view, click the Spark Configuration tab.
  3. In the Cluster Version panel, configure your Job to use your cluster connection metadata.
  4. Set the Batch size to 2000 ms.
  5. Because you will set advanced properties, change the Property type to Built-In.
  6. In the Tuning panel, select the Set tuning properties option and configure the fields as follows.
  7. Run your Job.

    It takes a couple of minutes for data to be displayed in the Console.
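The steps above can be summarized as a set of Spark settings applied to the Job. The sketch below collects them as key/value pairs for reference; only the 2000 ms batch size comes from the procedure itself, while the tuning property names and values are illustrative assumptions, since the actual fields depend on your cluster.

```python
# Hypothetical summary of the Spark Configuration chosen in the Run view.
# The batch interval matches step 4; the tuning entries are example
# assumptions standing in for the values set in the Tuning panel (step 6).
spark_streaming_settings = {
    "batch.interval.ms": 2000,       # Batch size set in step 4
    "spark.driver.memory": "1g",     # assumed tuning value
    "spark.executor.memory": "2g",   # assumed tuning value
    "spark.executor.cores": "2",     # assumed tuning value
}

# A 2000 ms batch interval means the streaming Job processes the records
# received from Kinesis in micro-batches collected every 2 seconds.
batches_per_minute = 60_000 // spark_streaming_settings["batch.interval.ms"]
```

Smaller batch intervals lower latency but increase scheduling overhead; 2000 ms is a common starting point that you can adjust after observing throughput in the Console.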