Defining the dataset sample size - 8.0

Talend Data Preparation User Guide

Version
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Data Preparation
Last publication date
2023-09-14

To ensure optimal performance, Talend Data Preparation limits the number of rows displayed in the grid to 10,000 by default.

This means that even if you import a 50,000-row dataset, for example, only a sample made of the first 10,000 rows is displayed in the application. This limit applies to all dataset types. However, the value is not hard-coded: you can change it by editing the Talend Data Preparation configuration file.

Procedure

  1. To change the maximum number of rows that can be displayed for your datasets, open the <Data_Preparation_Path>/config/application.properties file.
  2. Set the dataset.records.limit parameter to the desired value.
    The default value is 10000. For example, to raise the limit to 30,000 rows, set dataset.records.limit=30000.
  3. Save the file and restart your Talend Data Preparation instance.
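
As an illustration of step 2, the edited line in application.properties could look like the sketch below. The value 30000 is only the example used in this procedure, and the comment lines are added here for readability; they are not assumed to be present in the shipped file.

```properties
# <Data_Preparation_Path>/config/application.properties
# Maximum number of rows sampled and displayed in the grid (default: 10000)
dataset.records.limit=30000
```

The rest of the file is left unchanged; only this one property needs to be edited before the restart.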

Results

From now on, when you open a dataset in Talend Data Preparation, a sample of at most 30,000 rows is displayed in the grid.

Datasets that were cached before the configuration file update keep their previous limit. For this reason, it is recommended that you empty your cache after this operation.