tDataprepRun - Cloud - 8.0

Data Preparation

Version: Cloud 8.0
Language: English (United States)
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Preparation, Talend Studio

Applies a preparation made using Talend Data Preparation in a standard Data Integration Job.

tDataprepRun fetches a preparation made using Talend Data Preparation and applies it to a set of data.

Note: This component is not shipped with Talend Studio by default. To use it, first install the Talend Data Preparation components from the Data Integration > Components section of the Feature Manager. For more information, see Installing features using the Feature Manager.

For more technologies supported by Talend, see Talend components.

Depending on the Talend product you are using, this component can be used in one, some or all of the following Job frameworks:

  • Standard: see tDataprepRun Standard properties.

    Note: For reference, tDataprepRun can process datasets of up to 10 million rows and 100 columns (7 GB) at around 200 rows per second (150 kB/s) for a 60-step preparation. These figures are indicative and may vary. For better performance, or for datasets beyond 10 million rows, consider using Spark Jobs.

    The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real-Time Big Data Platform, Talend Data Services Platform, and Talend Data Fabric.

  • Spark Batch: see tDataprepRun properties for Apache Spark Batch.

    The component in this framework is available in all subscription-based Talend products with Big Data and Talend Data Fabric.

  • Spark Streaming: see tDataprepRun properties for Apache Spark Streaming.

    The component in this framework is available in Talend Real-Time Big Data Platform and Talend Data Fabric.
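
As a rough illustration of the indicative Standard-framework figures above (about 200 rows per second for a 60-step preparation), the following Python sketch estimates how long a run might take. The function names and the default throughput are assumptions made for this example and are not part of any Talend API. Actual throughput depends on the preparation and the environment.

```python
# Back-of-the-envelope runtime estimate for a Standard Job.
# ASSUMPTION: 200 rows/s is the indicative figure quoted in the note above,
# not a guaranteed rate. These helper names are hypothetical.

def estimate_runtime_seconds(rows: int, rows_per_second: float = 200.0) -> float:
    """Return the estimated processing time in seconds."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return rows / rows_per_second


def format_duration(seconds: float) -> str:
    """Format a duration in seconds as hours, minutes, and seconds."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    seconds = estimate_runtime_seconds(10_000_000)
    print(format_duration(seconds))
```

Under these assumptions, a 10-million-row dataset takes 50,000 seconds, or roughly 14 hours, which suggests why Spark Jobs are recommended for larger datasets or when better performance is needed.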