Before running your Job, you need to configure it to use your Amazon EMR
cluster.
Procedure
- Because your Job will run on Spark, add a tHDFSConfiguration component and configure it to use the HDFS connection metadata from the repository.
- In the Run view, click the Spark Configuration tab.
- In the Cluster Version panel, configure your Job to use your cluster connection metadata.
- Set the Batch size to 2000 ms.
- Because you will set some advanced properties, change the Property type to Built-In.
- In the Tuning panel, select the Set tuning properties option and configure the fields as follows.
- Run your Job.
  It can take a couple of minutes for data to appear in the Console.
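
To make the Batch size setting in the procedure above more concrete, the following is a minimal Python sketch of how a 2000 ms batch interval groups incoming events into micro-batches. It is a simplified model for illustration only, not the code generated by the Job or the Spark API itself; the function name `assign_batches` and the sample timestamps are hypothetical.

```python
from collections import defaultdict

# Batch size from the procedure, in milliseconds.
BATCH_MS = 2000


def assign_batches(event_times_ms, batch_ms=BATCH_MS):
    """Group event timestamps (in ms) into consecutive fixed-size batches.

    Simplified model: an event at time t falls into batch number
    t // batch_ms. Illustrative helper, not part of any product API.
    """
    batches = defaultdict(list)
    for t in event_times_ms:
        batches[t // batch_ms].append(t)
    return dict(batches)


if __name__ == "__main__":
    # Hypothetical arrival times, in milliseconds since the stream started.
    events = [0, 500, 1999, 2000, 4100]
    for index, items in sorted(assign_batches(events).items()):
        print(f"batch {index}: {items}")
```

Under this model, a smaller batch size yields more frequent, smaller batches, while a larger one yields fewer, larger batches.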