Before you begin
This section assumes that you have an Amazon EMR cluster up and running, that you have created the corresponding cluster connection metadata in the repository, and that you have created an Amazon Kinesis stream.
Procedure
- Create a Big Data Streaming Job using the Spark framework.
- In this example, the data to be written to Amazon Kinesis is generated with a tRowGenerator component.
- The data must be serialized as byte arrays before being written to the Amazon Kinesis stream. Add a tWriteDelimitedFields component and connect it to the tRowGenerator component.
- Set the Output type to byte[].
- To write the data to your Kinesis stream, add a tKinesisOutput component and connect the tWriteDelimitedFields component to it.
- Provide your Amazon credentials.
- To access your Kinesis stream, provide the Stream name and the corresponding endpoint URL.
- Provide the number of shards, as specified when you created the Kinesis stream.
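The steps above configure the Job graphically, but the underlying operation can be sketched in plain code: each row is joined into a delimited string, encoded as a byte array (what tWriteDelimitedFields produces with Output type byte[]), and sent to the stream as a PutRecord request. The stream name, partition key, and sample field values below are illustrative placeholders, not values from this procedure.

```python
def serialize_record(fields, delimiter=";"):
    """Join a row's fields with a delimiter and encode as bytes,
    mirroring tWriteDelimitedFields with Output type byte[]."""
    return delimiter.join(str(f) for f in fields).encode("utf-8")

def build_put_record(fields, stream_name, partition_key):
    """Build the parameter set of a Kinesis PutRecord request."""
    return {
        "StreamName": stream_name,
        "Data": serialize_record(fields),       # the serialized byte array
        "PartitionKey": partition_key,          # determines the target shard
    }

# Hypothetical stream name and record, for illustration only.
params = build_put_record(["1", "Alice", "42.5"], "my-stream", "pk-1")
print(params["Data"])  # b'1;Alice;42.5'

# With boto3 installed and valid Amazon credentials, the actual
# write would look like this (not executed here):
#   import boto3
#   client = boto3.client("kinesis")
#   client.put_record(**params)
```

The partition key controls which shard receives each record, which is why the number of shards chosen at stream creation matters for throughput.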