Operationalizing a recipe in a Talend Job

Talend Data Preparation User Guide

author: Talend Documentation Team
EnrichVersion: 6.4, 2.1
EnrichProdName: Talend Real-Time Big Data Platform, Talend Big Data Platform, Talend MDM Platform, Talend Data Fabric, Talend ESB, Talend Big Data, Talend Data Services Platform, Talend Data Integration, Talend Data Management Platform
task: Data Quality and Preparation > Cleansing data
EnrichPlatform: Talend Data Preparation

It is possible to use a preparation as part of a data integration flow in Talend Studio.

The tDataprepRun component allows you to reuse an existing preparation made in Talend Data Preparation, directly in a data integration Job. In other words, you can operationalize the process of applying a preparation to input files that have the same model.

This example shows a Job design that applies a preparation to Salesforce input data and writes the result to a Redshift database. It assumes that a preparation has been created beforehand, on a dataset with the same schema as the input of your Job. In this case, the existing preparation is called datapreprun_preparation.

The tDataprepRun component is an intermediary step and requires an input and an output flow. You can use any type of input and output flow. The basic working Job design used in this example chains a tSalesforceInput, a tDataprepRun, and a tRedshiftOutput component.

Before you begin

For the tDataprepRun component to work when Talend Data Preparation runs over an HTTPS connection, complete the following configuration:

  • Retrieve the Talend Data Preparation certificate, or that of its Certificate Authority, and add it to an existing or new .jks file, following this example:
    keytool -import -trustcacerts -alias <cert-alias> -file <dp_certificate.crt> -keystore <truststore.jks>
  • To make the Studio trust the Talend Data Preparation certificate, edit the .ini file used to start the Studio:
    -Djavax.net.ssl.trustStore=/path/to/<trust-store.jks>
    -Djavax.net.ssl.trustStorePassword=<trust-store password>
  • Connect a tSetKeystore component to tSalesforceInput with an OnSubjobOk link in order for the Job to trust the Talend Data Preparation certificate.

    For more information on how to configure the tSetKeystore, see the tSetKeystore documentation.
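
Before editing the Studio .ini file, it can help to confirm that the keytool import actually landed in the trust store you intend to reference. The following is a minimal sketch using only the standard java.security.KeyStore API; the class name TrustStoreCheck and its helper methods are hypothetical and are not part of Talend Studio or Talend Data Preparation. It assumes the store is in JKS format, as in the keytool example above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

// Hypothetical helper, not a Talend class: checks a trust store for a certificate alias.
final class TrustStoreCheck {

    // Loads a JKS trust store from disk, using the path and password given to keytool.
    public static KeyStore loadTrustStore(Path path, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore store = KeyStore.getInstance("JKS");
        try (InputStream in = Files.newInputStream(path)) {
            store.load(in, password);
        }
        return store;
    }

    // Returns true if the alias exists and holds a trusted certificate entry.
    public static boolean trustsAlias(KeyStore store, String alias)
            throws GeneralSecurityException {
        return store.isCertificateEntry(alias);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: TrustStoreCheck <truststore.jks> <password> <cert-alias>");
            return;
        }
        KeyStore store = loadTrustStore(Paths.get(args[0]), args[1].toCharArray());
        System.out.println(trustsAlias(store, args[2])
                ? "Alias found as a trusted certificate entry"
                : "Alias not found; the import may have targeted another store");
    }
}
```

Run it with the same trust store path, password, and alias that were passed to keytool. If the alias is reported as missing, check that the -keystore argument and the -Djavax.net.ssl.trustStore path point to the same file.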

Procedure

  1. In the design workspace of Talend Studio, add a tSalesforceInput, a tDataprepRun, and a tRedshiftOutput component, then link them together using two Row > Main links.
  2. Select the tSalesforceInput component and click the Component tab to define its basic settings.

    Make sure that the schema of the tSalesforceInput component matches the schema expected by the tDataprepRun component. In other words, the input schema must be the same as the dataset upon which the preparation was made in the first place.

  3. Select the tDataprepRun component and click the Component tab to define its basic settings.
  4. Enter your Talend Data Preparation connection information.
  5. Click Choose an existing preparation to display a list of the preparations available in Talend Data Preparation.
  6. Select the checkbox in front of the preparation you want to apply and click OK.
  7. Click Fetch Schema to retrieve the schema of the preparation, datapreprun_preparation in this case.

    The output schema of the tDataprepRun component now reflects the changes made by each preparation step. For example, it takes into account columns that were added or removed.

  8. Select the tRedshiftOutput component and click the Component tab to define its basic settings.
  9. Click Sync columns to retrieve the new output schema, inherited from the tDataprepRun component.
  10. Save your Job and press F6 to run it.
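
The schema requirements in the steps above can be illustrated with a short sketch: the Job input must match the dataset the preparation was built on, and the schema fetched from the preparation may differ from the input by added or removed columns. The code below is only an illustration in plain Java; the class SchemaCheck, its methods, and the column names are hypothetical. It assumes that "same schema" means the same column names in the same order, and it does not compare column types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Talend class: compares schemas as ordered lists of column names.
final class SchemaCheck {

    // True when both schemas list the same column names in the same order.
    public static boolean sameSchema(List<String> datasetColumns, List<String> inputColumns) {
        return datasetColumns.equals(inputColumns);
    }

    // Columns present in 'expected' but absent from 'actual', in their original order.
    public static List<String> missingColumns(List<String> expected, List<String> actual) {
        List<String> missing = new ArrayList<>(expected);
        missing.removeAll(actual);
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical column names, for illustration only.
        List<String> dataset = Arrays.asList("Id", "Name", "Email");
        List<String> jobInput = Arrays.asList("Id", "Name", "Email");
        List<String> prepared = Arrays.asList("Id", "Name", "Email_domain");

        System.out.println("Input matches dataset: " + sameSchema(dataset, jobInput));
        System.out.println("Removed by preparation: " + missingColumns(jobInput, prepared));
        System.out.println("Added by preparation: " + missingColumns(prepared, jobInput));
    }
}
```

In the Job itself, Fetch Schema and Sync columns perform the corresponding propagation; the sketch only makes explicit what is expected to match and what is expected to change.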

Results

All the preparation steps of datapreprun_preparation have been applied to your data, directly in the flow of your data integration Job.