Promoting a Job leveraging a preparation across environments - Cloud - 8.0

Data Preparation

Version: Cloud 8.0
Language: English
Products: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Modules: Talend Data Preparation, Talend Studio
Last publication date: 2024-02-20

This scenario applies only to subscription-based Talend products.

For more technologies supported by Talend, see Talend components.

The tDataprepRun component allows you to reuse an existing preparation made in Talend Data Preparation directly in a data integration, Spark Batch, or Spark Streaming Job. In other words, you can operationalize the process of applying a preparation to input data that shares the same model.

A good practice when using Talend Data Preparation is to set up at least two environments, for example a development environment and a production environment. When a preparation is ready on the development environment, you can use the Import/Export Preparation feature to promote it to the production environment, which has a different URL. For more information, see the section about promoting a preparation across environments.

Following this logic, you will likely end up with a preparation that has the same name on different environments. However, preparations are not identified by their name, but by a technical id, such as prepid=faf4fe3e-3cec-4550-ae0b-f1ce108f83d5. As a consequence, what you really have is two distinct preparations, each with its own id.
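The distinction between name and technical id can be sketched in a few lines of Java. This is only an illustration, not Talend code: the Preparation class is a hypothetical model, the first id is the example id quoted above (assumed here to belong to the development environment), and the second id is generated at random to stand in for whatever id the production environment would assign.

```java
import java.util.UUID;

final class PreparationIdentitySketch {

    /** Hypothetical model of a preparation: a display name plus a technical id. */
    static final class Preparation {
        final String name;
        final UUID id;

        Preparation(String name, UUID id) {
            this.name = name;
            this.id = id;
        }

        // Identity follows the technical id only; the name is ignored.
        @Override
        public boolean equals(Object other) {
            return other instanceof Preparation && id.equals(((Preparation) other).id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    public static void main(String[] args) {
        // Example id from the text, assumed to be the development copy.
        Preparation dev = new Preparation("Customers_leads",
                UUID.fromString("faf4fe3e-3cec-4550-ae0b-f1ce108f83d5"));
        // Placeholder id standing in for the copy on the production environment.
        Preparation prod = new Preparation("Customers_leads", UUID.randomUUID());

        System.out.println("Same name: " + dev.name.equals(prod.name));
        System.out.println("Same preparation: " + dev.equals(prod));
    }
}
```

Anything that stores the id, such as a Job configured with the regular preparation selection, is therefore tied to one environment only.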

If you wanted to operationalize this recipe in a Talend Job using the regular preparation selection properties, you would need two Jobs: one for the preparation on the development environment, with its own URL and id, and a second one for the production environment, with different parameters.

By using the Dynamic preparation selection checkbox and some context variables, you can use a single Job to run your preparation, regardless of the environment. This works because the dynamic preparation selection relies on the preparation path in Talend Data Preparation, and not on the preparation id.

You can then deploy this single Job definition to either your development or your production environment.
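As a rough sketch of the idea, the Java snippet below models two context groups that differ only in the environment URL while sharing the same preparation path. Everything in it is an assumption made for illustration: the variable names dataprep_url and prep_path, the example.com URLs, and the /Customers_leads path are hypothetical and do not come from the Talend API or from this scenario's actual settings.

```java
import java.util.HashMap;
import java.util.Map;

final class DynamicSelectionSketch {

    /** Hypothetical context group: variable name to value for one environment. */
    static Map<String, String> context(String environment) {
        Map<String, String> ctx = new HashMap<>();
        // The preparation path is identical on every environment (assumed path).
        ctx.put("prep_path", "/Customers_leads");
        // Only the URL changes between environments (placeholder URLs).
        if ("production".equals(environment)) {
            ctx.put("dataprep_url", "https://dataprep.prod.example.com");
        } else {
            ctx.put("dataprep_url", "https://dataprep.dev.example.com");
        }
        return ctx;
    }

    /** Describes what a single Job would target once a context is loaded. */
    static String target(Map<String, String> ctx) {
        return ctx.get("dataprep_url") + " -> " + ctx.get("prep_path");
    }

    public static void main(String[] args) {
        // The same Job logic runs for both environments; only the context differs.
        for (String env : new String[] {"development", "production"}) {
            System.out.println(env + ": " + target(context(env)));
        }
    }
}
```

Because no preparation id appears anywhere in the contexts, switching environments comes down to choosing which context to load when the Job runs.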

The following scenario creates a simple Job that:

  • Receives data from a local CSV file containing customer data
  • Dynamically retrieves an existing preparation based on its path and environment
  • Applies the preparation to the input data
  • Outputs the prepared data into a MySQL database

In this example, the Customers_leads preparation has been created beforehand in Talend Data Preparation. This simple preparation was created on a dataset that has the same schema as the CSV file used as input for this Job, and its purpose is to remove invalid values from your customer data.