Writing server-side KMS encrypted data on EMR - 7.3

Amazon EMR distribution

Version
7.3
Language
English (United States)
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for ESB
Talend Real-Time Big Data Platform
Module
Talend Studio

If the AWS SSE-KMS (at-rest) encryption service is enabled through Default encryption to protect data on the S3A file system of your EMR cluster, select the SSE-KMS option in tS3Configuration when writing data to that file system.
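Outside Talend, the Hadoop S3A connector requests SSE-KMS through configuration properties. The sketch below renders those properties as a core-site.xml style fragment to show what the setting involves. It is an assumption about an equivalent plain-Hadoop setup, not a description of what tS3Configuration does internally. The property names follow Hadoop 3.3.x (older releases use fs.s3a.server-side-encryption-algorithm and fs.s3a.server-side-encryption.key), and the key ARN is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder: replace with the ARN of your customer managed CMK.
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"


def s3a_sse_kms_properties(kms_key_arn: str) -> dict[str, str]:
    """Return Hadoop S3A properties that request SSE-KMS for written objects.

    Names follow Hadoop 3.3.x; older releases use
    fs.s3a.server-side-encryption-algorithm and fs.s3a.server-side-encryption.key.
    """
    return {
        "fs.s3a.encryption.algorithm": "SSE-KMS",
        "fs.s3a.encryption.key": kms_key_arn,
    }


def to_hadoop_xml(properties: dict[str, str]) -> str:
    """Render the properties as a Hadoop <configuration> XML fragment."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hadoop_xml(s3a_sse_kms_properties(KMS_KEY_ARN)))
```

Passing the key explicitly keeps the choice of CMK visible in the configuration instead of relying only on the bucket's default encryption.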

The sample data used in this scenario lists different types of incidents that people reported on Paris streets within one day.
1;226 rue marcadet, 75018 Paris;abandoned object;garbage on the street
2;2 rue marcadet, 75018 Paris;shift and damage;direction sign damaged
3;45 boulevard de la villette, 75010 Paris;abandoned object;suspicious package
4;10 rue emile lepeu, 75011 Paris;graffiti and improper poster;graffiti
5;27 avenue emile zola, 75015 Paris;shift and damage;deformed road
The sample data is used for demonstration purposes only.

The Job counts the occurrences of each incident type.
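The counting logic can be illustrated outside Talend with a minimal Python sketch. It assumes each record has four semicolon-separated fields (an ID, an address, an incident type, and a description) and treats the third field as the incident type; the field layout is inferred from the sample, and this is not the code Talend generates for the Job. Whitespace around each field is stripped so that a stray space after a semicolon does not create a separate category.

```python
from collections import Counter

# The five sample records from this scenario, one incident per line.
SAMPLE = """\
1;226 rue marcadet, 75018 Paris;abandoned object;garbage on the street
2;2 rue marcadet, 75018 Paris;shift and damage;direction sign damaged
3;45 boulevard de la villette, 75010 Paris;abandoned object;suspicious package
4;10 rue emile lepeu, 75011 Paris;graffiti and improper poster;graffiti
5;27 avenue emile zola, 75015 Paris;shift and damage;deformed road
"""


def count_incident_types(text: str) -> Counter:
    """Count how often each incident type (assumed third field) occurs."""
    counts: Counter = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumed layout: id;address;incident type;description
        fields = [field.strip() for field in line.split(";")]
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    for incident_type, n in count_incident_types(SAMPLE).most_common():
        print(f"{incident_type}: {n}")
```

On the sample, "abandoned object" and "shift and damage" each occur twice and "graffiti and improper poster" once.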

The following image shows the Job designed to write the encrypted data to EMR.

For more technologies supported by Talend, see Talend components.

Prerequisites:
  • The S3 system to be used is S3A.
  • The SSE-KMS encryption service on AWS is enabled with the Default encryption feature and a customer managed CMK has been specified for it.
  • The EMR cluster to be used is created with SSE-KMS, and the EMR_EC2_DefaultRole role has been added as a key user of the CMK mentioned above.
  • The administrator of your EMR cluster has granted the appropriate rights and permissions to the AWS account you are using in your Jobs.
  • Your EMR cluster has been properly set up and is running.
  • A Talend Jobserver has been deployed on an instance within the network of your EMR cluster, such as the instance for the master of your cluster.
All the operations above are done on the AWS side. The last prerequisite is done on the Talend side:
  • In your Studio or in Talend Administration Center, define this Jobserver as the execution server of your Jobs.
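Part of the AWS-side setup can be checked from a script. The minimal sketch below inspects responses shaped like those of the S3 GetBucketEncryption and KMS DescribeKey APIs, as returned by boto3's get_bucket_encryption and describe_key. It only checks that default encryption uses aws:kms with an explicit key, and that the key is customer managed; it does not check the key policy or the EMR_EC2_DefaultRole grant. The bucket name in the comments is hypothetical.

```python
def default_encryption_is_sse_kms(response: dict) -> bool:
    """Return True if a GetBucketEncryption response shows SSE-KMS default
    encryption with an explicitly specified KMS key."""
    config = response.get("ServerSideEncryptionConfiguration", {})
    for rule in config.get("Rules", []):
        default = rule.get("ApplyServerSideEncryptionByDefault", {})
        if default.get("SSEAlgorithm") == "aws:kms" and default.get("KMSMasterKeyID"):
            return True
    return False


def is_customer_managed(describe_key_response: dict) -> bool:
    """Return True if a KMS DescribeKey response describes a customer managed key."""
    return describe_key_response.get("KeyMetadata", {}).get("KeyManager") == "CUSTOMER"


# Usage with boto3 (requires AWS credentials; "my-emr-bucket" is hypothetical):
#   import boto3
#   enc = boto3.client("s3").get_bucket_encryption(Bucket="my-emr-bucket")
#   rule = enc["ServerSideEncryptionConfiguration"]["Rules"][0]
#   key_id = rule["ApplyServerSideEncryptionByDefault"]["KMSMasterKeyID"]
#   key = boto3.client("kms").describe_key(KeyId=key_id)
#   print(default_encryption_is_sse_kms(enc), is_customer_managed(key))
```

Keeping the checks as pure functions over response dictionaries makes them testable without AWS access.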

Ensure that the client machine on which the Talend Jobs are executed can recognize the host names of the nodes of the Hadoop cluster to be used. To do so, add the IP address/hostname mapping entries for the services of that Hadoop cluster to the hosts file of the client machine.
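The hosts-file step can be sketched in Python. The helper below formats hostname-to-IP pairs as hosts-file lines and checks whether the client machine can resolve a name; the hostnames and addresses you pass in are assumptions that you would replace with those of your own EMR nodes, and editing the hosts file itself is left to you.

```python
import socket


def hosts_file_lines(mapping: dict[str, str]) -> list[str]:
    """Format {hostname: ip} pairs as 'ip<TAB>hostname' hosts-file lines."""
    return [f"{ip}\t{hostname}" for hostname, ip in mapping.items()]


def resolves(hostname: str) -> bool:
    """Return True if the client machine can resolve the hostname."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


# Usage (hypothetical node name and address):
#   nodes = {"ip-10-0-0-12.ec2.internal": "10.0.0.12"}
#   print("\n".join(hosts_file_lines(nodes)))
#   missing = [h for h in nodes if not resolves(h)]
```

Running resolves on each node name after editing the hosts file gives a quick confirmation that the mapping took effect.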

If this is the first time your EMR cluster is set up to run Talend Jobs, search for Amazon EMR - Getting Started on Talend Help Center (https://help.talend.com) to verify your setup and help your Jobs run more efficiently on EMR.