Working on large datasets - 8.0

Talend Data Preparation User Guide

Version: 8.0
Language: English
Products: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Preparation
Content: Data Quality and Preparation > Cleansing data
Last publication date: 2024-03-26
By default, a dataset that exceeds 10,000 rows in Talend Data Preparation is considered a large dataset.

Although there is no limit on the size of the dataset that you can import, the export settings and the display of large datasets differ from the usual behavior. You work on a sample that shows the first 10,000 rows, but your preparation can also be applied to the rest of the dataset. The following scenario uses the example of a dataset containing 50,000 rows.

The 10,000-row limit is a default value. You can raise it by editing the dataset.records.limit parameter in the application.properties file, located in the installation folder. A higher maximum value might decrease application performance. The maximum value that you can set depends on your Web browser, your network quality, and the power of your machine. Do not set the sample size above 100,000 rows.
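
As an illustration only, the edited entry in application.properties could look like the following. The 30,000 value is an example, not a recommendation, and the exact location of the file depends on your installation; only the dataset.records.limit parameter name comes from this guide.

    # application.properties, in the Talend Data Preparation installation folder
    # Maximum number of rows loaded in the sample (default: 10000, keep below 100000)
    dataset.records.limit=30000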

If you change the default number of rows to display, the new value applies only to datasets imported from that point onwards, not to existing datasets.