Configuring Talend Data Preparation

Talend Real-time Big Data Platform Installation Guide for Linux

Version: 6.3
Product: Talend Real-Time Big Data Platform
Task: Installation and Upgrade

This section describes the initial configuration to perform after installation, how to secure connections for Talend Data Preparation, and how to configure the application logs.

Configuring Talend Data Preparation after installation

After you install Talend Data Preparation, you need to complete the following configuration steps before using it.

  1. Open the <Data_Preparation_Path>/config/application.properties file and edit the following Talend Data Preparation properties:

    Note: All passwords entered in the properties file are encrypted when you start your Talend Data Preparation instance.

    Field          Action
    tac.url        Enter the URL of your Talend Administration Center, followed by a /.
    public.ip      Enter the host name or IP address you want to use to access Talend Data Preparation.
    server.port    Enter the port you want to use for the Talend Data Preparation user interface.
    tac.user       Enter the username of your Data Preparation user in Talend Administration Center.
    tac.password   Enter the password of your Data Preparation user in Talend Administration Center.

  2. Update the following fields with your MongoDB settings:

    Field              Description
    mongodb.host       Host name of your MongoDB instance.
    mongodb.port       Port number of your MongoDB instance.
    mongodb.database   Name of the database to which Talend Data Preparation connects, dataprep by default. The database is created when you first launch Talend Data Preparation.
    mongodb.user       Username used to connect to the database.
    mongodb.password   Password used to connect to the database.

  3. To enable the interaction between Talend Data Preparation and the Components Catalog service, edit the following line with your Components Catalog server host and port:

    tcomp.server.url=http://<tcomp_host>:<tcomp_port>/tcomp

  4. To configure the access to Talend Dictionary Service, edit the following fields:

    Field                                                Description
    spring.cloud.stream.kafka.binder.brokers             Enter the host of your Kafka broker.
    spring.cloud.stream.kafka.binder.defaultBrokerPort   Enter the port of your Kafka broker.
    spring.cloud.stream.kafka.binder.zkNodes             Enter the host of your ZooKeeper node.
    spring.cloud.stream.kafka.binder.defaultZkPort       Enter the port of your ZooKeeper node.

  5. To enable the interaction between Talend Data Preparation and Talend Dictionary Service, set the dataquality.semantic.update.enable property to true.

  6. To enable the use of the Flow Runner with Talend Data Preparation, set the streams.enable property to true.

  7. To configure the access to the Flow Runner, edit the following fields:

    Field                          Description
    streams.flow.runner.url        Enter the URL of your Flow Runner. The URL is made up of your local machine IP address and your Big Data Preparation port.
    streams.kerberos.principal     Enter your Kerberos principal.
    streams.kerberos.keytab_path   Enter the path to your Kerberos keytab file.
    streams.hdfs.server.url        Optional. Set a default URL to be displayed in the input and output Path fields when working with HDFS datasets in Talend Data Preparation.

    The <Data_Preparation_Path>/config/tuning.properties file contains additional parameters for more advanced tuning. Make sure the parameters in this file match the sizing of your cluster.

  8. Execute the start.sh file to start your Talend Data Preparation instance.
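Before running start.sh, it can help to confirm that the properties edited in the steps above actually have a value. The following is a minimal shell sketch under stated assumptions, not a tool shipped with Talend Data Preparation: the function name check_required_props is hypothetical, and it assumes that application.properties uses plain key=value lines with # for comments.

```shell
# Hypothetical helper: prints every listed key that is missing or has an
# empty value in a key=value properties file. Returns 1 if any key is printed.
check_required_props() {
  local file=$1
  shift
  local key missing=0
  for key in "$@"; do
    if ! awk -v k="$key" '
      {
        i = index($0, "=")
        if (i == 0) next                      # not a key=value line
        name = substr($0, 1, i - 1)
        value = substr($0, i + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim spaces around the key
        gsub(/^[ \t]+|[ \t]+$/, "", value)    # trim spaces around the value
        if (name == k && value != "") found = 1
      }
      END { exit !found }
    ' "$file"; then
      echo "$key"
      missing=1
    fi
  done
  return "$missing"
}

# Example call (replace the path placeholder with your installation path):
# check_required_props "<Data_Preparation_Path>/config/application.properties" \
#   tac.url public.ip server.port tac.user tac.password \
#   mongodb.host mongodb.port mongodb.database
```

The check only confirms that a value is present. It does not validate the value, and it does not contact Talend Administration Center or MongoDB.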

Configuring an HTTPS connection for Talend Data Preparation

To set up an HTTPS secure connection between the different services, as well as with the MongoDB server, you need to edit the application.properties file.

Note that securing the MongoDB connection is not possible if you selected the embedded MongoDB instance during the installation process.

If you want to secure connections with MongoDB using SSL, MongoDB Enterprise Server has to be manually installed on your machine. For more information, see https://docs.mongodb.com/v3.2/security/.

  1. Open the <Data_Preparation_Path>/config/application.properties file.

  2. To define the path and password of the key store that holds the certificate for the Data Preparation server, edit the following lines:

    # server TLS setup
    tls.key-store=/path/to/key-store.jks
    tls.key-store-password=key-store_password

  3. To define the path and password of the trust store that contains the signing Certificate Authority (CA) that issued the server certificate, edit the following lines:

    tls.trust-store=/path/to/trust-store.jks
    tls.trust-store-password=trust-store_password

  4. To relax the check that the certificate common name matches the server URL, edit the following lines:

    # false to disable hostname verification
    tls.verify-hostname=false

  5. To enable SSL for MongoDB and define the path and password of the trust store that contains the signing Certificate Authority (CA) that issued the MongoDB server certificate, edit the following lines:

    mongodb.ssl=true
    mongodb.ssl.trust-store=/path/to/trust-store.jks
    mongodb.ssl.trust-store-password=trust-store-password

  6. Change the service URLs from http to https:

    dataset.service.url=https://${public.ip}:${server.port}
    transformation.service.url=https://${public.ip}:${server.port}
    preparation.service.url=https://${public.ip}:${server.port}

Talend Data Preparation only supports the Java Key Store (.jks) format to store keys and certificates.
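The URL change in step 6 above can be scripted rather than edited by hand. The sketch below is one possible way to do it with sed, under the assumption that the three service URL properties each sit on their own line and start with http://. The function name use_https_service_urls is hypothetical.

```shell
# Hypothetical helper: prints the properties file with the three service
# URLs switched from http to https. All other lines are printed unchanged.
# Nothing is modified in place.
use_https_service_urls() {
  sed -E 's#^((dataset|transformation|preparation)\.service\.url=)http://#\1https://#' "$1"
}

# Example call (review the result before replacing the original file):
# use_https_service_urls application.properties > application.properties.new
```

Writing to a separate file keeps the original available for comparison, for example with diff, before you restart the instance.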

Configuring Talend Data Preparation when Talend Administration Center is in HTTPS

To connect to a Talend Administration Center instance that runs over HTTPS, Talend Data Preparation must trust the Talend Administration Center certificate.

  1. Retrieve the Talend Administration Center certificate, or the certificate of its Certificate Authority, and add it to an existing or new .jks file, as in this example:

    keytool -import -trustcacerts -alias <cert-alias> -file <tac_certificate.crt> -keystore <truststore.jks>

  2. In the <Data_Preparation_Path>/config/application.properties file, add the following properties to set the truststore:

    tls.trust-store=/path/to/<truststore.jks>
    tls.trust-store-password=<trust-store_password>
    
    # false to disable hostname verification
    tls.verify-hostname=false

  3. Restart Talend Data Preparation.
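Step 1 above does not say how to retrieve the certificate. One common approach, shown here as a sketch and not as the documented Talend procedure, is to download the certificate that the server presents and inspect it before importing it with keytool. This assumes that the openssl command-line tool is installed. The function names fetch_server_cert and show_cert are hypothetical, and <tac_host> and <tac_port> stand for your own values.

```shell
# Hypothetical helper: saves the certificate presented by host:port as a
# PEM file. Requires network access to the server.
fetch_server_cert() {
  local host=$1 port=$2 out=$3
  openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -outform PEM > "$out"
}

# Hypothetical helper: prints subject, issuer and expiry date of a PEM
# certificate so that you can check it before trusting it.
show_cert() {
  openssl x509 -in "$1" -noout -subject -issuer -enddate
}

# Example calls (placeholders must be replaced with your own values):
# fetch_server_cert "<tac_host>" "<tac_port>" tac_certificate.crt
# show_cert tac_certificate.crt
```

A certificate downloaded this way is only as trustworthy as the network path used to fetch it, so compare the subject and issuer with what your administrator expects before importing the file into the trust store.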

Using the tDataprepRun component with an HTTPS connection

To make the tDataprepRun component work when Talend Data Preparation runs with an HTTPS connection, complete the following configuration:

  1. Retrieve the Talend Data Preparation certificate, or the certificate of its Certificate Authority, and add it to an existing or new .jks file, as in this example:

    keytool -import -trustcacerts -alias <cert-alias> -file <dp_certificate.crt> -keystore <truststore.jks>

  2. To make the Studio trust the Talend Data Preparation certificate, add the following lines to the .ini file used to start the Studio:

    -Djavax.net.ssl.trustStore=/path/to/<trust-store.jks>
    -Djavax.net.ssl.trustStorePassword=<trust-store password>

  3. When designing your Job in the Studio, connect a tSetKeystore component to the data input component with an OnSubjobOk link so that the Job trusts the Talend Data Preparation certificate. For more information on how to configure tSetKeystore, see the Talend Components Reference Guide.

For more information on how to use the tDataprepRun component and how to operationalize a recipe in a Talend Job, see Talend Help Center (https://help.talend.com).
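The two trust store options from step 2 above are JVM arguments. In Eclipse-based launchers, JVM arguments are read only when they appear after the -vmargs line of the .ini file. The sketch below checks that placement. It assumes that the Studio .ini file follows this Eclipse convention with one option per line, and the function name ini_has_truststore is hypothetical.

```shell
# Hypothetical helper: succeeds only if both trust store options appear
# after the -vmargs line of the given .ini file.
ini_has_truststore() {
  awk '
    $0 == "-vmargs" { vm = 1; next }
    vm && index($0, "-Djavax.net.ssl.trustStore=") == 1 { store = 1 }
    vm && index($0, "-Djavax.net.ssl.trustStorePassword=") == 1 { pass = 1 }
    END { exit !(store && pass) }
  ' "$1"
}

# Example call (the file name is a placeholder for your Studio .ini file):
# ini_has_truststore "<studio>.ini" && echo "trust store options found"
```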

Creating a live dataset with an HTTPS connection

To create a working live dataset when Talend Data Preparation runs with an HTTPS connection, complete the following configuration:

  1. Retrieve the Talend Data Preparation certificate, or the certificate of its Certificate Authority, and add it to an existing or new .jks file, as in this example:

    keytool -import -trustcacerts -alias <cert-alias> -file <dp_certificate.crt> -keystore <truststore.jks>

  2. When designing your Job in the Studio, connect a tSetKeystore component to the data input component with an OnSubjobOk link so that the Job trusts the Talend Data Preparation certificate. For more information on how to configure tSetKeystore, see the Talend Components Reference Guide.

For more information on how to create a live dataset, see Talend Help Center (https://help.talend.com).

Configuring logs for Talend Data Preparation

Talend Data Preparation logs allow you to analyze and debug the activity of Talend Data Preparation.

Talend Data Preparation logs are located in <Data_Preparation_Path>/data/logs/app.log.
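When debugging, a quick first step is to pull the most recent warnings and errors out of app.log. The sketch below assumes that each log line contains its level as a separate word (ERROR or WARN), which depends on the pattern configured for the log file and may not match your setup. The function name recent_log_issues is hypothetical.

```shell
# Hypothetical helper: prints the last N lines (default 20) of a log file
# that contain the word ERROR or WARN.
recent_log_issues() {
  local file=$1 count=${2:-20}
  grep -E -w 'ERROR|WARN' "$file" | tail -n "$count"
}

# Example call (replace the path placeholder with your installation path):
# recent_log_issues "<Data_Preparation_Path>/data/logs/app.log" 50
```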

To configure the settings of your log files, edit the <Data_Preparation_Path>/config/log4j2.xml file: