When you work on a large dataset in Talend Data Preparation, for example one with
50,000 rows, only a sample made of the first 10,000 rows is displayed. The sample size
is shown in the dataset parameters.
You can start preparing your data and apply functions as you would for any
other dataset.
One difference occurs when you apply a filter of any type on your data. Because you are
working on a sample, only the matching rows among the first 10,000 are retrieved.
However, you can fetch more matching rows from the remaining 40,000 and
refine your preparation based on this new sample.
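As a rough illustration of why a filter applied to the sample can miss rows, the following Python sketch models the behavior described above. It is a conceptual model only, not Talend Data Preparation code: the dataset, the `name` column, and the helper names `take_sample` and `apply_filter` are all hypothetical.

```python
from itertools import islice

SAMPLE_SIZE = 10_000  # size of the displayed sample, as described above


def take_sample(rows, size=SAMPLE_SIZE):
    """Return the first `size` rows, mimicking the sample shown in the grid."""
    return list(islice(rows, size))


def apply_filter(rows, predicate):
    """Keep only the rows that satisfy the filter predicate."""
    return [row for row in rows if predicate(row)]


def has_empty_name(row):
    """Hypothetical filter: rows whose 'name' value is empty."""
    return row["name"] == ""


# Hypothetical 50,000-row dataset in which every fifth row has an empty name.
dataset = [
    {"id": i, "name": "" if i % 5 == 0 else f"name-{i}"}
    for i in range(50_000)
]

sample = take_sample(dataset)
matches_in_sample = apply_filter(sample, has_empty_name)
matches_in_dataset = apply_filter(dataset, has_empty_name)

# The filter on the sample only sees matches among the first 10,000 rows.
print(len(matches_in_sample), "matching rows in the sample")
print(len(matches_in_dataset), "matching rows in the whole dataset")
```

In this made-up dataset, most of the matching rows sit outside the sample, which is the situation the Fetch more button is meant to address.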
Procedure
-
Click the menu icon on the top left of the grid and select Display
rows with invalid or empty values.
The filter bar shows that the filter has been applied, and
only the matching rows are displayed in the grid. You can choose any other
filter. In addition, each individual column offers the option to filter on a
category of data, even if the sample contains no matching value.
Click the menu icon in the header of a column to display
the available options.
The Fetch more button in the
filter bar indicates that you are currently working on a sample, and that
more rows in the dataset potentially match your filter.
-
Click Fetch more to retrieve more rows matching your
current filters.
The Fetch additional rows dialog box opens, where you
can see the status of the data retrieval.
Talend Data Preparation
automatically stops fetching when it reaches 10,000 results or the end of the
dataset. You can also stop the process manually and show the rows
already found. You are then taken back to the grid, where the fetched rows
now form the sample you are working on. Any filter or function applied
from now on only applies to this sample.
If the filter you initially chose does not match any row in the sample, you can
either clear all your filters or search the whole dataset for
matching rows.
-
To go back to your initial sample, clear all your filters.
Click the cross on each individual filter to remove them one by one, or click
the garbage bin icon to clear the filters.
Results
The grid now displays the first 10,000 rows
of your dataset again and you can continue preparing your data.
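The stopping rules of the Fetch more step in the procedure above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the product's implementation: the function `fetch_more`, its `should_stop` callback (standing in for stopping the process from the dialog box), and the sample data are hypothetical, and the sketch assumes the scan covers the whole dataset from the start.

```python
MAX_RESULTS = 10_000  # retrieval stops at 10,000 results, as described above


def fetch_more(rows, predicate, limit=MAX_RESULTS, should_stop=lambda: False):
    """Scan the dataset for rows matching the current filter.

    The scan ends when `limit` matches are found, when the dataset is
    exhausted, or when `should_stop()` returns True (a stand-in for the
    user stopping the process and keeping the rows already found).
    """
    found = []
    for row in rows:
        if should_stop():
            break
        if predicate(row):
            found.append(row)
            if len(found) >= limit:
                break
    return found


# Hypothetical 50,000-row dataset in which every fourth row has no value.
dataset = [{"id": i, "value": None if i % 4 == 0 else i} for i in range(50_000)]


def is_empty(row):
    """Hypothetical filter: rows whose 'value' is missing."""
    return row["value"] is None


# The fetched rows become the new sample to work on.
new_sample = fetch_more(dataset, is_empty)
print(len(new_sample), "rows in the new sample")
```

With a filter that matches fewer than 10,000 rows, the same function simply runs to the end of the dataset and returns every match it found.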