Fetching more data from a large dataset - 7.3

Talend Data Preparation User Guide

Version: 7.3
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Data Preparation
Content: Data Quality and Preparation > Cleansing data
Last publication date: 2023-11-28

When you work on a large dataset in Talend Data Preparation, 50,000 rows for example, only a sample made of the first 10,000 rows is displayed. The sample size is indicated in the dataset parameters.

You can start preparing your data and applying functions as you would for any other dataset.

One difference appears when you apply a filter of any type to your data. Because you are working on a sample, only the matching rows among the first 10,000 are retrieved. However, you can fetch more matching rows from the remaining 40,000 and refine your preparation based on this new sample.
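To make the effect of sampling concrete, the following Python sketch compares a filter applied to the first 10,000 rows with the same filter applied to all 50,000 rows. It is only an illustration and not Talend code: the dataset, the `email` column, and the `load_sample` and `is_invalid_or_empty` helpers are hypothetical names introduced for this example.

```python
from itertools import islice

SAMPLE_SIZE = 10_000  # sample size mentioned in this guide


def load_sample(dataset, size=SAMPLE_SIZE):
    """Return the first `size` rows of the dataset (hypothetical helper)."""
    return list(islice(dataset, size))


def is_invalid_or_empty(row):
    """Hypothetical filter: a row matches when its 'email' value is empty."""
    return row["email"] == ""


# Hypothetical 50,000-row dataset in which every 100th row has an empty email.
dataset = [
    {"id": i, "email": "" if i % 100 == 0 else f"user{i}@example.com"}
    for i in range(50_000)
]

sample = load_sample(dataset)
matches_in_sample = [row for row in sample if is_invalid_or_empty(row)]
matches_in_dataset = [row for row in dataset if is_invalid_or_empty(row)]

print(len(matches_in_sample))   # 100 matching rows visible in the sample
print(len(matches_in_dataset))  # 500 matching rows in the whole dataset
```

In this made-up dataset, the filter shows 100 rows in the grid while 400 other matching rows sit outside the sample, which is the situation the Fetch more option addresses.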

Procedure

  1. Click the menu icon on the top left of the grid and select Display rows with invalid or empty values.

    The filter bar shows that the filter has been applied, and only the matching rows are displayed in the grid. You can choose any other filter. For each individual column, you can also filter on a category of data, even if the sample contains no matching value. Click the menu icon in the header of a column to display the available options.

    The Fetch more button also appears in the filter bar, indicating that you are currently working on a sample and that more rows may match your filter.

  2. Click Fetch more to retrieve more rows matching your current filters.

    The Fetch additional rows dialog box opens, where you can see the status of the data retrieval.

    Talend Data Preparation automatically stops when it reaches 10,000 results or the end of the dataset. You can also stop the process manually and show the rows already found. You are then taken back to the grid, where the fetched rows now form the sample you are working on. Any filter or function applied from now on only applies to this sample.

    If the filter you initially chose does not match any row, you can either clear all your filters or search the whole dataset for matching rows.

  3. To go back to your initial sample, clear all your filters.
    Click the cross on each individual filter, or click the garbage bin icon to clear the filters.

Results

The grid now displays the first 10,000 rows of your dataset again and you can continue preparing your data.
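The stopping rules described in step 2 of the procedure can be pictured with a short Python sketch. This is a simplified model under stated assumptions, not the way Talend Data Preparation is implemented: the `fetch_more` function, its parameters, and the `should_stop` callback (standing in for the user stopping the process) are hypothetical, and the sketch assumes the search scans the dataset from its first row.

```python
def fetch_more(dataset, predicate, limit=10_000, should_stop=lambda found: False):
    """Collect rows matching `predicate` until one of three things happens:
    `limit` results are found, the dataset ends, or `should_stop` returns True.

    Hypothetical model of the Fetch more behavior, for illustration only.
    """
    found = []
    for row in dataset:
        if predicate(row):
            found.append(row)
            if len(found) >= limit:
                break  # reached the maximum number of results
        if should_stop(found):
            break  # the user chose to stop and keep the rows already found
    return found  # these rows become the new working sample


# Hypothetical dataset: the integers 0 to 49,999 stand in for 50,000 rows.
dataset = range(50_000)

# Few matches: the scan runs to the end of the dataset.
rare = fetch_more(dataset, lambda n: n % 100 == 0)
print(len(rare))  # 500

# Many matches: the scan stops at 10,000 results.
common = fetch_more(dataset, lambda n: n % 2 == 0)
print(len(common))  # 10000

# Manual stop: the user interrupts after three rows have been found.
stopped = fetch_more(dataset, lambda n: n % 100 == 0,
                     should_stop=lambda found: len(found) >= 3)
print(stopped)  # [0, 100, 200]
```

Whichever condition ends the scan, the returned rows play the role of the new sample, so later filters and functions in this model would only see those rows.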