Configuring Talend Data Preparation - 6.4

Talend MDM Platform Installation Guide for Windows

This section contains information on the initial configuration actions to be performed after installation, how to secure connections for Talend Data Preparation and how to configure the application logs.

Configuring Talend Data Preparation after installation

After you install Talend Data Preparation, you need to complete several configuration steps before using it.

  1. Open the <Data_Preparation_Path>/config/application.properties file and edit the following Talend Data Preparation properties:

    Note

    All the passwords entered in the properties file are encrypted when you start your Talend Data Preparation instance.

    Field                                  Action
    tac.url                                Enter the URL to your Talend Administration Center, followed by a /.
    public.ip                              Enter the URL you want to use to access Talend Data Preparation.
    server.port                            Enter the port you want to use for the Talend Data Preparation user interface.
    iam.ip                                 Enter the URL to your Talend Identity and Access Management instance.
    tac.user-name                          Enter the username of your Data Preparation user in Talend Administration Center.
    tac.password                           Enter the password of your Data Preparation user in Talend Administration Center.
    security.oauth2.client.clientId        Enter the Talend Identity and Access Management OIDC client identifier.
    security.oauth2.client.clientSecret    Enter the Talend Identity and Access Management OIDC client password.
    iam.scim.url                           Make sure that the Talend Identity and Access Management port is correct.

  2. Update the following fields with your MongoDB settings:

    Field               Description
    mongodb.host        Host name of your MongoDB instance
    mongodb.port        Port number of your MongoDB instance
    mongodb.database    Name of the database to which Talend Data Preparation connects, dataprep by default. The database is created when you first launch Talend Data Preparation.
    mongodb.user        Username used to connect to the database
    mongodb.password    Password used to connect to the database

  3. To enable the interaction between Talend Data Preparation and the Components Catalog service, edit the following line with your Components Catalog server host and port:

    tcomp.server.url=http://<tcomp_host>:<tcomp_port>/tcomp

  4. To configure the access to Talend Dictionary Service, edit the following fields:

    Field                                                 Description
    dataquality.semantic.update.enable                    Set the value of this parameter to true in order to enable the interaction between Talend Data Preparation and Talend Dictionary Service.
    dataquality.semantic.list.enable                      Set the value of this parameter to true in order to display the semantic type management interface in Talend Data Preparation.
    semanticservice.url                                   Enter the URL to your Talend Dictionary Service instance.
    spring.cloud.stream.kafka.binder.brokers              Enter the host corresponding to your Kafka broker.
    spring.cloud.stream.kafka.binder.defaultBrokerPort    Enter the port corresponding to your Kafka broker.
    spring.cloud.stream.kafka.binder.zkNodes              Enter the host corresponding to your Zookeeper node.
    spring.cloud.stream.kafka.binder.defaultZkPort        Enter the port corresponding to your Zookeeper node.

  5. Change the value of the dataquality.indexes.file.location property from ${java.io.tmpdir}/org.talend.dataquality.semantic to <other_location>/org.talend.dataquality.semantic.

    By default, the custom semantic types that you create using the Dictionary Service server are stored in a tmp directory. To avoid losing your changes, change the location where your custom semantic types are saved. You can set a path to the location of your choice, as long as it is not in a tmp folder.

  6. Execute the start.bat file to start your Talend Data Preparation instance.
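
Before starting the instance, it can help to check the edited file for the most common slips described in the steps above. The following is a minimal sketch, not part of the product: the function names, the list of required keys, and the sample values are assumptions made for illustration, and the parser only handles plain key=value lines rather than the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Hypothetical selection of keys taken from the tables above.
REQUIRED_KEYS = [
    "tac.url", "public.ip", "server.port", "iam.ip",
    "mongodb.host", "mongodb.port", "mongodb.database",
]


def check_basic_settings(props):
    """Return a list of human-readable problems found in the settings."""
    problems = []
    for key in REQUIRED_KEYS:
        if not props.get(key):
            problems.append(f"{key} is missing or empty")
    tac_url = props.get("tac.url", "")
    if tac_url and not tac_url.endswith("/"):
        problems.append("tac.url should end with a /")
    for key in ("server.port", "mongodb.port"):
        value = props.get(key, "")
        if value and not value.isdigit():
            problems.append(f"{key} should be a port number")
    location = props.get("dataquality.indexes.file.location", "")
    if "${java.io.tmpdir}" in location:
        problems.append(
            "dataquality.indexes.file.location still points to the tmp directory"
        )
    return problems


# Made-up sample values, for illustration only.
sample = """
tac.url=http://localhost:8080/org.talend.administrator
public.ip=localhost
server.port=9999
iam.ip=localhost
mongodb.host=localhost
mongodb.port=27017
mongodb.database=dataprep
dataquality.indexes.file.location=${java.io.tmpdir}/org.talend.dataquality.semantic
"""

for problem in check_basic_settings(parse_properties(sample)):
    print(problem)
```

With the sample above, the sketch reports the missing trailing slash on tac.url and the semantic index location that still uses the tmp directory. To check a real file, read <Data_Preparation_Path>/config/application.properties and pass its content to parse_properties.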

Configuring an HTTPS connection for Talend Data Preparation

To set up an HTTPS secure connection between the different services, as well as with the MongoDB server, you need to edit the application.properties file.

Note that securing the MongoDB connection is not possible if you selected the embedded MongoDB instance during the installation process.

If you want to secure connections with MongoDB using SSL, MongoDB Enterprise Server has to be manually installed on your machine. For more information, see https://docs.mongodb.com/v3.2/security/.

  1. Open the <Data_Preparation_Path>/config/application.properties file.

  2. To define the path and password of the certificate for the Data Preparation server, edit the following lines:

    # server TLS setup
    tls.key-store=/path/to/key-store.jks
    tls.key-store-password=key-store_password
  3. To define the path and password of the truststore that contains the signing Certificate Authority (CA) that issued the server certificate, edit the following lines:

    tls.trust-store=/path/to/trust-store.jks
    tls.trust-store-password=trust-store_password
  4. To relax hostname verification, so that the common name in the certificate does not have to match the URL of the server, edit the following lines:

    # false to disable hostname verification
    tls.verify-hostname=false
  5. To define the path and password of the truststore that contains the signing Certificate Authority (CA) that issued the MongoDB server certificate, edit the following lines:

    mongodb.ssl=true
    mongodb.ssl.trust-store=/path/to/trust-store.jks
    mongodb.ssl.trust-store-password=trust-store-password
  6. Change the service URLs from http to https:

    dataset.service.url=https://${public.ip}:${server.port}
    transformation.service.url=https://${public.ip}:${server.port}
    preparation.service.url=https://${public.ip}:${server.port}

Talend Data Preparation only supports the Java Key Store (.jks) format to store keys and certificates.
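
The HTTPS settings above can be reviewed with a small script once the properties are loaded into a dictionary. This is a minimal sketch under stated assumptions: check_https_settings is a hypothetical helper, and testing the .jks file extension is only a heuristic for the key store format, since the extension does not prove what the file contains.

```python
# Property names taken from the steps above.
SERVICE_URL_KEYS = (
    "dataset.service.url",
    "transformation.service.url",
    "preparation.service.url",
)
STORE_KEYS = ("tls.key-store", "tls.trust-store", "mongodb.ssl.trust-store")


def check_https_settings(props):
    """Return a list of problems found in the HTTPS-related properties."""
    problems = []
    for key in SERVICE_URL_KEYS:
        if not props.get(key, "").startswith("https://"):
            problems.append(f"{key} does not use https")
    for key in STORE_KEYS:
        path = props.get(key)
        # Stores that are not configured at all are skipped.
        if path is not None and not path.lower().endswith(".jks"):
            problems.append(f"{key} is not a .jks file")
    return problems


# Made-up example with two deliberate mistakes.
example = {
    "dataset.service.url": "https://${public.ip}:${server.port}",
    "transformation.service.url": "http://${public.ip}:${server.port}",
    "preparation.service.url": "https://${public.ip}:${server.port}",
    "tls.key-store": "/path/to/key-store.jks",
    "tls.trust-store": "/path/to/trust-store.p12",
}

for problem in check_https_settings(example):
    print(problem)
```

In this example the sketch flags the transformation service URL that is still in http and the truststore that does not have a .jks extension.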

Configuring Talend Data Preparation when Talend Administration Center is in HTTPS

For Talend Data Preparation to be able to connect to a Talend Administration Center instance running in https, Talend Data Preparation must trust the Talend Administration Center certificate.

  1. Retrieve the Talend Administration Center certificate, or the certificate of its Certificate Authority, and add it to an existing or new .jks file, as in this example:

    keytool -import -trustcacerts -alias <cert-alias> -file <tac_certificate.crt> -keystore <truststore.jks>

  2. In the <Data_Preparation_Path>/config/application.properties file, add the following properties to set the truststore:

    tls.trust-store=/path/to/<truststore.jks>
    tls.trust-store-password=<trust-store_password>
    
    # false to disable hostname verification
    tls.verify-hostname=false
  3. Restart Talend Data Preparation.
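
If the certificate file is not at hand, one possible way to obtain it is to ask the server for the certificate it presents. The sketch below uses the Python standard library for that and builds the keytool argument list shown in step 1. The function names, host, and port are assumptions for illustration. The retrieved certificate is not validated, so its fingerprint should be compared with a trusted source before it is imported.

```python
import ssl


def fetch_server_certificate(host, port=443):
    """Return the certificate presented by host:port as a PEM string.

    No validation is performed here: verify the fingerprint against a
    trusted source before importing the certificate into a truststore.
    """
    return ssl.get_server_certificate((host, port))


def keytool_import_command(alias, cert_file, truststore):
    """Build the keytool argument list used in step 1 above."""
    return [
        "keytool", "-import", "-trustcacerts",
        "-alias", alias,
        "-file", cert_file,
        "-keystore", truststore,
    ]
```

A typical use would be to write the PEM string returned by fetch_server_certificate to a .crt file, then pass the list returned by keytool_import_command to subprocess.run. keytool prompts for the truststore password unless it is supplied by other means.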

Configuring an HTTPS connection with Talend Dictionary Service

Securing the connection between Talend Data Preparation and Talend Dictionary Service requires editing their corresponding configuration files.

You will first have to configure Talend Dictionary Service as a service in HTTPS. Then, you will enable SSL communication between Talend Data Preparation and Talend Dictionary Service running in HTTPS.

Prerequisites:

  • Talend Data Preparation has been configured as a service in HTTPS. For more information, see Configuring an HTTPS connection for Talend Data Preparation.

  • Talend Dictionary Service has been configured as a service in HTTPS. For more information, see Securing connections for Talend Dictionary Service.

  • You have generated a certificate for Talend Data Preparation and Talend Dictionary Service, and added it to your Web browser truststore.

  1. To enable SSL communication between Talend Data Preparation and Talend Dictionary Service running in HTTPS, retrieve the Talend Dictionary Service certificate, or its Certificate Authority, and add it to the Talend Data Preparation truststore using the following command:

    keytool -import -trustcacerts -alias <cert-alias> -file <dictionary-service_certificate.crt> -keystore <truststore.jks>

  2. In the <Data_Preparation_Path>/config/application.properties file, add the following properties to set the truststore:

    tls.trust-store=/path/to/<truststore.jks>
    tls.trust-store-password=<trust-store_password>
    
    # false to disable hostname verification
    tls.verify-hostname=false
  3. Restart the services.

Your Talend Data Preparation instance running in HTTPS can now communicate with Talend Dictionary Service, also running with a secured HTTPS connection.

Using the tDataprepRun component with an HTTPS connection

In order to make the tDataprepRun component work when running Talend Data Preparation with an https connection, complete the following configuration:

  1. Retrieve the Talend Data Preparation certificate, or the certificate of its Certificate Authority, and add it to an existing or new .jks file, as in this example:

    keytool -import -trustcacerts -alias <cert-alias> -file <dp_certificate.crt> -keystore <truststore.jks>

  2. To make the Studio trust the Talend Data Preparation certificate, edit the .ini file used to start the Studio:

    -Djavax.net.ssl.trustStore=/path/to/<trust-store.jks>
    -Djavax.net.ssl.trustStorePassword=<trust-store password>
  3. When designing your Job in the Studio, connect a tSetKeystore component to the data input component with an OnSubjobOk link in order for the Job to trust the Talend Data Preparation certificate. For more information on how to configure the tSetKeystore, see Talend Components Reference Guide.

For more information on how to use the tDataprepRun component and how to operationalize a recipe in a Talend Job, see Talend Help Center (https://help.talend.com).
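
The .ini edit from step 2 can also be scripted. The sketch below is an illustration only: add_truststore_args is a hypothetical helper, and it assumes that the Studio .ini file follows the Eclipse launcher convention in which JVM options are listed one per line after a -vmargs line.

```python
def add_truststore_args(ini_text, truststore_path, truststore_password):
    """Return ini_text with the two truststore JVM options set.

    Assumes the Eclipse launcher layout: every line after -vmargs is a
    JVM option, so appending at the end of the file is sufficient.
    """
    lines = [line.rstrip() for line in ini_text.splitlines()]
    wanted = [
        f"-Djavax.net.ssl.trustStore={truststore_path}",
        f"-Djavax.net.ssl.trustStorePassword={truststore_password}",
    ]
    if "-vmargs" not in lines:
        lines.append("-vmargs")
    for arg in wanted:
        prefix = arg.split("=", 1)[0] + "="
        # Drop any previous value of the same system property.
        lines = [line for line in lines if not line.startswith(prefix)]
        lines.append(arg)
    return "\n".join(lines) + "\n"


# Made-up .ini content, for illustration only.
ini = "-vmargs\n-Xms512m\n-Xmx1536m\n"
print(add_truststore_args(ini, "/path/to/trust-store.jks", "changeit"))
```

Running the helper twice with the same arguments leaves the file content unchanged, which makes it safe to rerun after a Studio update replaces the .ini file.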

Creating a live dataset with an HTTPS connection

To create a working live dataset when running Talend Data Preparation with an https connection, complete the following configuration:

  1. Retrieve the Talend Data Preparation certificate, or the certificate of its Certificate Authority, and add it to an existing or new .jks file, as in this example:

    keytool -import -trustcacerts -alias <cert-alias> -file <dp_certificate.crt> -keystore <truststore.jks>

  2. When designing your Job in the Studio, connect a tSetKeystore component to the data input component with an OnSubjobOk link in order for the Job to trust the Talend Data Preparation certificate. For more information on how to configure the tSetKeystore, see Talend Components Reference Guide.

For more information on how to create a live dataset, see Talend Help Center (https://help.talend.com).

Configuring an HTTPS connection between Talend Data Preparation, Streams Runner and Spark Job Server

Securing the connections between Talend Data Preparation, Streams Runner and Spark Job Server requires editing their corresponding configuration files.

Any security configuration in the Streams Runner configuration file should be done at the end of the file, in the Append section, after the Include section, to avoid being overwritten.

The first step will be to configure Spark Job Server as a service in HTTPS. Then, you will need to enable SSL communication between Streams Runner and Spark Job Server running in HTTPS. After that, you will configure Streams Runner as a service in HTTPS, and finally, enable SSL communication between Talend Data Preparation and Streams Runner running in HTTPS.

Prerequisites:

  • Talend Data Preparation has been configured as a service in HTTPS. For more information, see Configuring an HTTPS connection for Talend Data Preparation.

  • You have generated a certificate for Talend Data Preparation and added it to your Web browser truststore.

  • Spark Job Server and Streams Runner are installed and running.

  1. To secure the Spark Job Server service in HTTPS, open the <Spark_Job_Server_installation_path>/settings.sh configuration file.

  2. Set the value of the security_ssl_enabled parameter to on.

  3. Edit the values of the security_path_to_keystore and security_keystore_password parameters to set the path and password of your keystore file containing the certificate for Spark Job Server.

    security_ssl_enabled=on
    # DO NOT CHANGE
    # SECURITY PATH TO KEYSTORE
    # Required : No
    # Env variable : SECURITY_PATH_TO_KEYSTORE
    security_path_to_keystore=<path_to_keystore>
    # DO NOT CHANGE
    # SECURITY KEYSTORE PASSWORD
    # Required : No
    # Env variable : SECURITY_KEYSTORE_PASSWORD
    security_keystore_password=<password>

    After restarting the service, Spark Job Server will be running in HTTPS.

  4. To enable SSL communication between Streams Runner and Spark Job Server running in HTTPS, you can either:

    • Use the JOBSERVER_TRANSPORT_PROTOCOL environment variable with the following command: export JOBSERVER_TRANSPORT_PROTOCOL=https.

    • Edit the <Streams_Runner_installation_path>/conf/application.conf configuration file and set the value of the app.svc.jobserver.protocol parameter to https.

    Using the environment variable will override the application.conf configuration.

  5. To add the Spark Job Server certificate, or its Certificate Authority, to the Streams Runner truststore, add the following lines to the <Streams_Runner_installation_path>/conf/application.conf file, according to the file format used for your truststore:

    • For .pem files:

        play.ws.ssl {
          trustManager = {
            stores = [
              { type = "PEM", path = "/path/to/pem_file" }
            ]
          }
        }
    • For .jks files:

        play.ws.ssl {
          trustManager = {
            stores = [
              { type="JKS", path="/path/to/truststore", password="<password>"}
            ]
          }
        }

    For more information, see the Play documentation.

  6. To secure the Streams Runner service in HTTPS, define the path and password of its certificate by editing the following lines of the <Streams_Runner_installation_path>/conf/application.conf file:

    play.server.https.keyStore.path = <path_to_keystore>
    play.server.https.keyStore.password = <password>
  7. Edit the following two lines to set the HTTPS port and disable the HTTP port:

    https.port=9443
    http.port=disabled

    9443 is the default port value for the HTTPS connection.

    Warning

    Any play.server.http.port=<port> configuration will conflict with the http.port=disabled configuration and the port will not be disabled.

    After restarting the service, Streams Runner will be running in HTTPS.

  8. To enable SSL communication between Talend Data Preparation and Streams Runner running in HTTPS, retrieve the Streams Runner certificate, or its Certificate Authority, and add it to the Talend Data Preparation truststore using the following command:

    keytool -import -trustcacerts -alias <cert-alias> -file <streams_runner_certificate.crt> -keystore <truststore.jks>

  9. In the <Data_Preparation_Path>/config/application.properties file, add the following properties to set the truststore:

    tls.trust-store=/path/to/<truststore.jks>
    tls.trust-store-password=<trust-store_password>
    
    # false to disable hostname verification
    tls.verify-hostname=false
  10. Restart Talend Data Preparation.

Your Talend Data Preparation instance running in HTTPS can now communicate with Streams Runner and Spark Job Server, also running with a secured HTTPS connection.
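
Two points from the procedure above are easy to get wrong: the JOBSERVER_TRANSPORT_PROTOCOL environment variable overrides the application.conf value, and any play.server.http.port setting prevents http.port=disabled from taking effect. The following minimal sketch illustrates both. The function names and the sample configuration are assumptions for illustration, and the scan only recognizes flat dotted keys, not settings nested inside braces.

```python
import os


def resolve_jobserver_protocol(conf_protocol, environ=None):
    """Return the protocol in effect: the environment variable wins."""
    environ = os.environ if environ is None else environ
    return environ.get("JOBSERVER_TRANSPORT_PROTOCOL") or conf_protocol


def find_http_port_conflicts(conf_text):
    """List play.server.http.port settings that would keep HTTP enabled.

    Returns (line_number, line) tuples. Only flat dotted keys are detected.
    """
    conflicts = []
    for number, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith(("#", "//")):
            continue  # comment line in the configuration file
        if line.startswith("play.server.http.port"):
            conflicts.append((number, line))
    return conflicts


# Made-up application.conf excerpt with a conflicting setting on line 3.
conf = """\
https.port=9443
http.port=disabled
play.server.http.port=9000
"""

print(resolve_jobserver_protocol("http", {"JOBSERVER_TRANSPORT_PROTOCOL": "https"}))
print(find_http_port_conflicts(conf))
```

In this example the resolved protocol is https even though the configuration value is http, and the scan reports the conflicting line 3.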

Configuring an HTTPS connection with Talend Identity and Access Management

Securing the connection between Talend Data Preparation and Talend Identity and Access Management requires editing their corresponding configuration files.

You will first have to configure Talend Identity and Access Management as a service in HTTPS. Then, you will enable SSL communication between Talend Data Preparation and Talend Identity and Access Management running in HTTPS.

Prerequisites:

  • Talend Data Preparation has been configured as a service in HTTPS. For more information, see Configuring an HTTPS connection for Talend Data Preparation.

  • Talend Identity and Access Management has been configured as a service in HTTPS. For more information, see Securing connections for Talend Identity and Access Management.

  • You have generated a certificate for Talend Data Preparation and Talend Identity and Access Management, and added it to your Web browser truststore.

  1. To enable SSL communication between Talend Data Preparation and Talend Identity and Access Management running in HTTPS, retrieve the Talend Identity and Access Management certificate, or its Certificate Authority, and add it to the Talend Data Preparation truststore using the following command:

    keytool -import -trustcacerts -alias <cert-alias> -file <IAM_certificate.crt> -keystore <truststore.jks>

  2. In the <Data_Preparation_Path>/config/application.properties file, add the following properties to set the truststore:

    tls.trust-store=/path/to/<truststore.jks>
    tls.trust-store-password=<trust-store_password>
    
    # false to disable hostname verification
    tls.verify-hostname=false
  3. Restart the services.

Your Talend Data Preparation instance running in HTTPS can now communicate with Talend Identity and Access Management, also running with a secured HTTPS connection.

Configuring logs for Talend Data Preparation

Talend Data Preparation logs allow you to analyze and debug the activity of Talend Data Preparation.

Talend Data Preparation logs are located in <Data_Preparation_Path>/data/logs/app.log.

To configure the settings of your log files, edit the <Data_Preparation_Path>/config/log4j2.xml file: