Protects data by transforming it into unreadable cipher text.
Only users with the user-defined password and the cryptographic file can decrypt this cipher text and read the original data.
In local mode, Apache Spark 1.6 and later versions are supported.
For more technologies supported by Talend, see Talend components.
Why encrypt data?
Encryption is used to protect your assets, your organization, or customers' sensitive data. Encryption can protect data from internal or external leakage.
In Big Data environments, large volumes of data from many sources are collected, manipulated and stored in various formats. Encryption helps reduce the risk of sensitive data exposure.
Encryption is also recommended or required for compliance with data protection laws.
Considerations for data encryption
- Define the type of the data to be protected: data in transit or data at rest.
- Identify the scope of the data to be protected: purpose, ownership, access, etc.
- Provide strong passwords for the cryptographic file.
- Do not reuse passwords for different data encryption operations.
- Store passwords in a secure password management system.
- Make sure that only authorized users have access to the password and the cryptographic file needed to decrypt the data.
- Stronger encryption methods generally require more computing resources.
- Separate the cryptographic file from the encrypted data to keep your data secure.
- It is advised to use different cryptographic files to encrypt different datasets.
- Data encryption is not a complete security approach. Combining different security layers helps address concerns about sensitive data. Such layers include vulnerability assessment and management, and anti-malware solutions.
Data encryption methods
|GCM mode of operation|CBC mode of operation|
|Uses a randomly generated 256-bit key|Uses a randomly generated 256-bit key|
|Integrity check|No integrity check|
|Faster on modern CPUs|Computationally faster|
|Standardized by the National Institute of Standards and Technology (NIST)|-|
|Used by SSL/TLS|-|
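The difference in integrity checking can be illustrated with a minimal sketch based on the standard Java cryptography API. It is not the component's implementation. The class and method names (`ModeComparisonSketch`, `gcmDetectsTampering`, `cbcDecryptTampered`) are hypothetical, and the 12-byte GCM IV, the 128-bit authentication tag, and PKCS5 padding for CBC are assumptions chosen for the example. The sketch encrypts a message, flips one bit of the cipher text, and then attempts to decrypt it with each mode.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.IvParameterSpec;

final class ModeComparisonSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    private static byte[] randomBytes(int length) {
        byte[] bytes = new byte[length];
        RANDOM.nextBytes(bytes);
        return bytes;
    }

    // Generates a random 256-bit AES key, as both modes in the table use.
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts with AES-GCM, flips one cipher text bit, and reports whether decryption rejects it.
    static boolean gcmDetectsTampering(SecretKey key, byte[] plaintext) {
        try {
            GCMParameterSpec params = new GCMParameterSpec(128, randomBytes(12));
            Cipher encryptor = Cipher.getInstance("AES/GCM/NoPadding");
            encryptor.init(Cipher.ENCRYPT_MODE, key, params);
            byte[] ciphertext = encryptor.doFinal(plaintext);
            ciphertext[0] ^= 0x01; // simulate tampering in storage or in transit

            Cipher decryptor = Cipher.getInstance("AES/GCM/NoPadding");
            decryptor.init(Cipher.DECRYPT_MODE, key, params);
            decryptor.doFinal(ciphertext);
            return false; // reached only if the altered data passed the tag check
        } catch (AEADBadTagException e) {
            return true; // the authentication tag did not match
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Same experiment with AES-CBC: returns whatever plain text the decryption accepted.
    // Needs at least 32 bytes of input so that the altered first block does not affect the padding.
    static byte[] cbcDecryptTampered(SecretKey key, byte[] plaintext) {
        if (plaintext.length < 32) {
            throw new IllegalArgumentException("use at least 32 bytes of plaintext");
        }
        try {
            IvParameterSpec params = new IvParameterSpec(randomBytes(16));
            Cipher encryptor = Cipher.getInstance("AES/CBC/PKCS5Padding");
            encryptor.init(Cipher.ENCRYPT_MODE, key, params);
            byte[] ciphertext = encryptor.doFinal(plaintext);
            ciphertext[0] ^= 0x01; // same single-bit change as in the GCM case

            Cipher decryptor = Cipher.getInstance("AES/CBC/PKCS5Padding");
            decryptor.init(Cipher.DECRYPT_MODE, key, params);
            return decryptor.doFinal(ciphertext); // no error, but the first blocks are corrupted
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] message = "a sample record that spans several AES blocks".getBytes(StandardCharsets.UTF_8);
        System.out.println("GCM rejects the altered cipher text: " + gcmDetectsTampering(key, message));
        byte[] accepted = cbcDecryptTampered(key, message);
        System.out.println("CBC returned altered plain text: " + !Arrays.equals(accepted, message));
    }
}
```

With CBC, detecting such a change would require a separate integrity mechanism, which this sketch deliberately leaves out.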
The data encryption process
- Generating the cryptographic file. It contains:
  - A randomly generated salt, used to derive a cryptographic key from the user-defined password with the PBKDF2 key derivation function.
  - A randomly generated 256-bit key, encrypted with AES and the user-defined password.
  - The encryption method, encrypted with AES and the user-defined password.
- Accessing the encrypted content of the cryptographic file by:
  - Using the randomly generated salt to derive the cryptographic key from the user-defined password.
  - Using the cryptographic key and the AES method to decrypt the randomly generated 256-bit key and the encryption method.
  During decryption, if the password is correct, the component can access the encryption method and the randomly generated 256-bit key. Otherwise, access is denied.
- Encrypting the data using:
  - The randomly generated 256-bit key from the cryptographic file
  - The encryption method
  - A random initialization vector (IV) generated for each piece of data
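The three steps above can be sketched with the standard Java cryptography API. This is an illustration under stated assumptions, not the component's actual code or file format. The class `CryptoFileSketch`, its fields and its methods are hypothetical. The iteration count, the salt and IV sizes, the `PBKDF2WithHmacSHA256` variant, and the use of AES-GCM to protect the key and method inside the file are choices made for the example that the description above does not specify.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

final class CryptoFileSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int ITERATIONS = 100_000; // assumed; no iteration count is documented
    static final String METHOD = "AES/GCM/NoPadding"; // the only method this sketch handles

    // Hypothetical in-memory layout of the cryptographic file.
    final byte[] salt;    // random salt for PBKDF2
    final byte[] wrapIv;  // IV used to encrypt the payload (assumption)
    final byte[] wrapped; // 256-bit data key + method name, encrypted under the derived key

    private CryptoFileSketch(byte[] salt, byte[] wrapIv, byte[] wrapped) {
        this.salt = salt;
        this.wrapIv = wrapIv;
        this.wrapped = wrapped;
    }

    private static byte[] randomBytes(int length) {
        byte[] bytes = new byte[length];
        RANDOM.nextBytes(bytes);
        return bytes;
    }

    private static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Derives a 256-bit AES key from the password and the salt with PBKDF2.
    private static SecretKeySpec deriveKey(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, 256);
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
        return new SecretKeySpec(factory.generateSecret(spec).getEncoded(), "AES");
    }

    private static byte[] gcm(int mode, SecretKeySpec key, byte[] iv, byte[] input) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(METHOD);
        cipher.init(mode, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(input);
    }

    // Step 1: generate the cryptographic file from the user-defined password.
    static CryptoFileSketch generate(char[] password) {
        try {
            byte[] salt = randomBytes(16);
            byte[] payload = concat(randomBytes(32), METHOD.getBytes(StandardCharsets.UTF_8));
            byte[] iv = randomBytes(12);
            return new CryptoFileSketch(salt, iv, gcm(Cipher.ENCRYPT_MODE, deriveKey(password, salt), iv, payload));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Step 2: recover the data key and the method; a wrong password fails the GCM tag check.
    byte[] open(char[] password) {
        try {
            return gcm(Cipher.DECRYPT_MODE, deriveKey(password, salt), wrapIv, wrapped);
        } catch (AEADBadTagException e) {
            throw new SecurityException("Access denied: wrong password or altered file", e);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Step 3: encrypt one value with the data key, the method, and a fresh random IV.
    // The IV is stored in front of the cipher text so that the value can be decrypted later.
    static byte[] encryptValue(byte[] opened, byte[] value) {
        String method = new String(opened, 32, opened.length - 32, StandardCharsets.UTF_8);
        if (!METHOD.equals(method)) {
            throw new UnsupportedOperationException(method);
        }
        try {
            byte[] iv = randomBytes(12);
            return concat(iv, gcm(Cipher.ENCRYPT_MODE, new SecretKeySpec(opened, 0, 32, "AES"), iv, value));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, a caller would run `CryptoFileSketch.generate(password)` once, then pass the result of `open(password)` to `encryptValue` for each value. Because a new IV is generated on every call, encrypting the same value twice produces different output.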