Working on large datasets

Talend Data Preparation User Guide

By default, a dataset is considered large when it exceeds 10,000 rows in Talend Data Preparation, or 30,000 rows in Talend Data Preparation Free Desktop.

Although there is no limit on the size of the dataset that you can import, large datasets are displayed and exported differently than smaller ones. Let's take the example of a dataset containing 50,000 rows:

  • In Talend Data Preparation Free Desktop, the import is cut off at 30,000 rows: you can only prepare and export the first 30,000 rows of your dataset. This is a default value that you can lower by editing the dataset.records.limit parameter in the application.properties file, located in the installation folder.

  • In Talend Data Preparation, you work on a sample displaying the first 10,000 rows of your dataset. This is a default value that you can raise by editing the dataset.records.limit parameter in the application.properties file, located in the installation folder (see the example after this list). Keep in mind that a higher value might decrease the application performance: the maximum value you can reasonably set depends on your Web browser, your network quality, and the power of your machine. Do not set the sample size above 100,000 rows.
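To give you an idea of the change involved, the relevant entry in the application.properties file might look like the following. Only the dataset.records.limit parameter name comes from this guide; the comments and the exact value shown here are illustrative, so compare with your own file before editing.

    # Maximum number of rows imported (Free Desktop) or shown as a sample (Talend Data Preparation).
    # The value below is an example; adjust it to your needs, keeping the 100,000-row ceiling in mind.
    dataset.records.limit=10000

Depending on your version, you may need to restart Talend Data Preparation for the new value to be taken into account.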

If you change the default number of rows to be displayed, the new value only applies to datasets imported from that point onwards, not to existing datasets.