For more technologies supported by Talend, see Talend components.
The tDataprepRun component allows you to reuse an existing preparation made in Talend Data Preparation, directly in a data integration, Spark Batch or Spark Streaming Job. In other words, you can operationalize the process of applying a preparation to input data with the same model.
A good practice when using Talend Data Preparation is to set up at least two environments to work with, for example a development environment and a production environment. When a preparation is ready on the development environment, you can use the Import/Export Preparation feature to promote it to the production environment, which has a different URL. For more information, see the section about promoting a preparation across environments.
Following this logic, you will likely find yourself with a preparation that has the same name on different environments. However, preparations are not identified by their name, but by a technical id, such as prepid=faf4fe3e-3cec-4550-ae0b-f1ce108f83d5. As a consequence, what you really have is two distinct preparations, each with its own id.
If you wanted to operationalize this recipe in a Talend Job using the regular preparation selection properties, you would need two Jobs: one for the preparation on the development environment, with its specific URL and id, and a second one for the production environment, with different parameters.
By using the Dynamic preparation selection checkbox and some context variables, you can run your preparation from a single Job, regardless of the environment. This works because the dynamic preparation selection relies on the preparation path in Talend Data Preparation, and not on the preparation id.
This gives you a single Job definition that you can later deploy on either your development or your production environment.
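The difference between id-based and path-based selection can be illustrated with a minimal Java sketch. It is not Talend code and does not call any Talend API: the `Environment` class, the example URLs, the `/customers_leads` path format and the production id are all hypothetical, chosen only to show why the same path can resolve to a different id on each environment. In an actual Job, the environment URL would typically come from a context variable.

```java
import java.util.HashMap;
import java.util.Map;

public class DynamicPreparationSketch {

    /** Hypothetical stand-in for one Talend Data Preparation environment. */
    static final class Environment {
        final String url;
        // Each environment assigns its own technical id to a given path.
        final Map<String, String> idsByPath = new HashMap<>();

        Environment(String url) {
            this.url = url;
        }

        /** Path-based lookup: the same path can be used on every environment. */
        String resolveIdByPath(String path) {
            String id = idsByPath.get(path);
            if (id == null) {
                throw new IllegalArgumentException(
                        "No preparation at " + path + " on " + url);
            }
            return id;
        }
    }

    static Environment development() {
        // Hypothetical URL; the id is the example id quoted in the text.
        Environment env = new Environment("https://dev.example.com");
        env.idsByPath.put("/customers_leads",
                "faf4fe3e-3cec-4550-ae0b-f1ce108f83d5");
        return env;
    }

    static Environment production() {
        // Hypothetical URL and made-up id for the promoted preparation.
        Environment env = new Environment("https://prod.example.com");
        env.idsByPath.put("/customers_leads",
                "00000000-0000-0000-0000-000000000000");
        return env;
    }

    public static void main(String[] args) {
        // One path, reused unchanged on both environments.
        String path = "/customers_leads";
        for (Environment env : new Environment[] {development(), production()}) {
            System.out.println(env.url + " -> " + env.resolveIdByPath(path));
        }
    }
}
```

In this sketch, only the environment changes between runs. The path stays the same, which is the property that lets a single Job definition serve both environments.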
The following scenario creates a simple Job that:
- Receives data from a local CSV file containing customer data
- Dynamically retrieves an existing preparation based on its path and environment
- Applies the preparation to the input data
- Outputs the prepared data into a MySQL database
In this example, the customers_leads preparation has been created beforehand in Talend Data Preparation. This simple preparation was created on a dataset that has the same schema as the CSV file used as input for this Job, and its purpose is to remove invalid values from your customer data.