Writing the aggregated data about street incidents to EMR - 7.3

Amazon EMR distribution

Procedure

  1. Double-click the tFileOutputParquet component to open its Component view.

  2. Select the Define a storage configuration component check box and then select the tS3Configuration component you configured in the previous steps.
  3. Click Sync columns to ensure that tFileOutputParquet retrieves the schema from the output side of tAggregateRow.
  4. In the Folder/File field, enter the name of the folder to be used to store the aggregated data in the S3 bucket specified in tS3Configuration. For example, if you enter /sample_user, the folder named sample_user at the root of the bucket is used at runtime to store the output of your Job.
  5. From the Action drop-down list, select Create if the folder does not yet exist in the bucket; if the folder already exists, select Overwrite.
  6. Click Run to open its view, then click the Spark Configuration tab to display the view for configuring the Spark connection.
  7. Select the Use local mode check box to test your Job locally before submitting it to the remote Spark cluster.

    In local mode, the Studio builds the Spark environment on the fly within itself to run the Job. Each processor of the local machine is used as a Spark worker to perform the computations.

  8. In this mode, your local file system is used; therefore, if you have placed configuration components such as tS3Configuration or tHDFSConfiguration in your Job, deactivate them, because they provide connection information to a remote file system.
  9. In the Component view of tFileOutputParquet, change the path in the Folder/File field to a local directory and select the appropriate action from the Action drop-down list, that is, create a new folder or overwrite the existing one.
  10. In the Run tab, click Basic Run, then click Run in this view to execute your Job locally and test its design logic.
  11. When your Job runs successfully, clear the Use local mode check box in the Spark Configuration view of the Run tab. Then, in the design workspace of your Job, reactivate the configuration components and revert the changes you made in tFileOutputParquet for the local test.
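The difference between a local test run (Use local mode selected) and a run on the remote cluster comes down to the Spark master the generated Job targets. The fragment below is a hedged sketch of typical Spark configuration properties, not the exact properties Talend generates:

```
# Local test run: one executor thread per processor of the local machine
spark.master    local[*]

# Remote run on the EMR cluster (YARN is the usual EMR resource manager)
spark.master    yarn
```

Clearing the Use local mode check box, as in step 11, corresponds to switching from the first setting to the second.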
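The Create and Overwrite behavior of the Action drop-down list (steps 5 and 9) can be sketched in plain Python. This is a minimal illustration of the two semantics on a local folder, not Talend's actual implementation; the function name write_output and the plain-text output file are hypothetical stand-ins for the real Parquet write.

```python
import os
import shutil
import tempfile

def write_output(folder, records, action="create"):
    """Hypothetical sketch of tFileOutputParquet's Action setting.

    action="create"    -> fail if the folder already exists
    action="overwrite" -> remove any existing folder first
    """
    if os.path.exists(folder):
        if action == "create":
            raise FileExistsError(f"{folder} already exists; use 'overwrite'")
        shutil.rmtree(folder)
    os.makedirs(folder)
    # Stand-in for the real Parquet write: one text line per record.
    with open(os.path.join(folder, "part-00000"), "w") as f:
        f.writelines(line + "\n" for line in records)

# Usage: the first run creates the folder; the second must overwrite it.
out = os.path.join(tempfile.gettempdir(), "sample_user")
write_output(out, ["incident_a,3"], action="overwrite")
write_output(out, ["incident_a,3", "incident_b,7"], action="overwrite")
```

Selecting Overwrite for repeated runs mirrors the second call above: the previous output folder is replaced rather than causing the Job to fail.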