tDataprepRun workflow in a Talend Job - 7.0

Data Preparation

author
Talend Documentation Team
EnrichVersion
7.0
EnrichProdName
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Data Preparation
Talend Studio

In Talend Studio, when running a Job using the tDataprepRun component, several elements come into play so that the data prepared in Talend Data Preparation can be retrieved and used in the flow of the Job.

The following diagrams describe the sequence of events that happens at runtime when the tDataprepRun component is used to retrieve a preparation, either in a Talend data integration Job or in a Big Data Job. In both cases, the first step is for the user to create a Job that includes the tDataprepRun component.

It is recommended to use the tDataprepRun component with preparation versions so that your Jobs stay valid over time and produce a predictable result, because the same preparation steps are always applied. This prevents situations where the schema of your preparation has evolved but the schemas of the other components have not, which would break the Job.
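The effect of using a preparation version can be illustrated with a small toy model. This is only a conceptual sketch in Python, not Talend code: the Preparation class, the job_is_valid helper, and the column names are all hypothetical and exist purely to show why a fixed version keeps the Job's schema stable while the current state of the preparation may drift.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Preparation:
    """Hypothetical stand-in for a preparation: version number -> output schema."""
    versions: dict[int, list[str]] = field(default_factory=dict)

    def schema(self, version: int | None = None) -> list[str]:
        # version=None models "use the current state": resolve to the latest version.
        key = max(self.versions) if version is None else version
        return self.versions[key]


def job_is_valid(prep: Preparation, expected: list[str], version: int | None = None) -> bool:
    """The Job is valid only if the preparation output matches what downstream components expect."""
    return prep.schema(version) == expected


prep = Preparation({1: ["id", "name"]})
downstream_schema = ["id", "name"]  # schema configured on the other components

# Later, the preparation is edited and gains a column.
prep.versions[2] = ["id", "name", "country"]

pinned_ok = job_is_valid(prep, downstream_schema, version=1)  # Job uses version 1
unpinned_ok = job_is_valid(prep, downstream_schema)           # Job follows the current state
print(pinned_ok, unpinned_ok)  # True False
```

In this model, the Job that references version 1 is unaffected by the later edit, whereas the Job that follows the current state no longer matches the schema of its neighbours.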

tDataprepRun in a data integration Job

When running a preparation in the flow of a data integration Job, the preparation is processed directly on the Talend Data Preparation server.

tDataprepRun in a Big Data Spark Batch or Spark Streaming Job

When running a preparation in the flow of a Big Data Job, the preparation definition is retrieved from the Talend Data Preparation server and then processed on a Big Data cluster at execution time.
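The difference between the two cases, namely where the preparation steps are applied, can be sketched with a toy model. The following Python snippet is a conceptual illustration only and does not use any Talend API: PrepServer, run_di_job, run_big_data_job, and the two sample steps are hypothetical names, and the list of partitions merely stands in for data distributed across a cluster.

```python
# Hypothetical preparation steps: each one transforms a single row.
def trim_name(row):
    return {**row, "name": row["name"].strip()}


def upper_country(row):
    return {**row, "country": row["country"].upper()}


def apply_steps(steps, row):
    """Apply the preparation steps to one row, in order."""
    for step in steps:
        row = step(row)
    return row


class PrepServer:
    """Toy stand-in for the server that stores the preparation."""

    def __init__(self, steps):
        self._steps = steps

    def run(self, rows):
        # Data integration case: the server applies the steps itself.
        return [apply_steps(self._steps, r) for r in rows]

    def get_definition(self):
        # Big Data case: only the definition leaves the server.
        return list(self._steps)


def run_di_job(server, rows):
    return server.run(rows)


def run_big_data_job(server, partitions):
    steps = server.get_definition()  # retrieved at execution time
    # Each partition is processed independently, as cluster workers would do.
    return [[apply_steps(steps, r) for r in part] for part in partitions]


server = PrepServer([trim_name, upper_country])
rows = [{"name": " Ada ", "country": "uk"}, {"name": "Linus", "country": "fi"}]

di_result = run_di_job(server, rows)
bd_result = run_big_data_job(server, [rows[:1], rows[1:]])
bd_flat = [r for part in bd_result for r in part]
print(di_result == bd_flat)  # True
```

Both paths apply the same steps and yield the same rows; what changes is whether the server or the cluster side does the processing.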