When you work on a large dataset in Talend Cloud Data Preparation, one with 50,000 rows for example, only a sample made of the
first 10,000 rows is displayed.
You can start preparing your data and apply functions as you would
for any other dataset. One difference appears, however, when you apply a
filter of any type to your data. Because you are working on a sample, only the matching
rows among the first 10,000 are retrieved. You can then fetch
more matching rows from the remaining 40,000 and refine your preparation based on this
new sample.
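The effect of filtering a sample rather than the whole dataset can be illustrated with a minimal Python sketch. This is not how Talend Cloud Data Preparation is implemented: the dataset, the `take_sample` and `filter_sample` helpers, and the "every seventh row is empty" rule are all hypothetical and only serve to show why a filter on the sample can miss matching rows further down.

```python
from itertools import islice

SAMPLE_SIZE = 10_000  # sample size stated in the text above


def take_sample(rows, size=SAMPLE_SIZE):
    """Return the first `size` rows of the dataset (hypothetical helper)."""
    return list(islice(rows, size))


def filter_sample(sample, predicate):
    """Keep only the rows of the sample that match the filter."""
    return [row for row in sample if predicate(row)]


# Hypothetical 50,000-row dataset: every 7th row has an empty value.
dataset = [{"id": i, "value": "" if i % 7 == 0 else "ok"} for i in range(50_000)]

sample = take_sample(dataset)
matches_in_sample = filter_sample(sample, lambda row: row["value"] == "")
matches_in_dataset = filter_sample(dataset, lambda row: row["value"] == "")

print(len(matches_in_sample), "matching rows in the sample")
print(len(matches_in_dataset), "matching rows in the whole dataset")
```

In this made-up dataset, the filter finds 1,429 rows in the sample, while 7,143 rows match in the whole dataset. The rows it misses are the ones that fetching more matching rows would bring in.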
Procedure
-
Click the menu icon on the top left of the grid and select Display
rows with invalid or empty values.
The filter bar shows that the filter has been applied, and
only the matching rows are displayed in the grid. You can choose any other
filter instead. Each individual column also offers the option to filter on a
category of data, even if there is no matching value in the sample.
Click the menu icon in the header of a column to display
the available options.
The Fetch more button also appears in the
filter bar, indicating that you are currently working on a sample and that
more rows potentially match your filter.
-
Click Fetch more to retrieve more rows matching your
current filters.
The Fetch additional rows dialog box opens, where you
can see the status of the data retrieval.
Talend Cloud Data Preparation automatically
stops the retrieval when it reaches 10,000 results or the end of the dataset. You can
also stop the process yourself and show the rows already found.
You are then taken back to the grid, where the fetched rows now form the
sample you will be working on. Any filter or function you apply from now on
applies only to this sample.
If the filter you initially chose to apply does not match any row, you can
either clear all your filters or search the whole dataset for
matching rows.
-
To go back to your initial sample, clear all your filters.
Click the cross on each individual filter, or click the garbage
bin icon, to clear the filters.
Results
The grid now displays the first 10,000 rows
of your dataset again and you can continue preparing your data.
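The Fetch more behavior described in the procedure can be sketched in Python: scan for matching rows, stop at 10,000 results or at the end of the dataset, and allow stopping early while keeping the rows already found. This is an illustration under stated assumptions, not the product's implementation. The `fetch_more` function, its `stop_requested` callback, and the sample dataset are hypothetical, and the sketch assumes the scan starts from the beginning of the dataset.

```python
RESULT_LIMIT = 10_000  # result cap stated in the procedure


def fetch_more(rows, matches, limit=RESULT_LIMIT, stop_requested=lambda: False):
    """Collect matching rows until the limit, the end of the data, or a stop request.

    Hypothetical helper: `stop_requested` stands in for the user stopping
    the retrieval from the dialog box.
    """
    found = []
    for row in rows:
        if stop_requested():
            break  # user stopped the process: keep the rows already found
        if matches(row):
            found.append(row)
            if len(found) >= limit:
                break  # result cap reached
    return found


# Hypothetical 50,000-row dataset: every 7th row has an empty value.
dataset = [{"id": i, "value": "" if i % 7 == 0 else "ok"} for i in range(50_000)]

# Fewer than 10,000 matches: the scan runs to the end of the dataset.
new_sample = fetch_more(dataset, lambda row: row["value"] == "")

# More than 10,000 matches: the scan stops as soon as the cap is reached.
capped_sample = fetch_more(dataset, lambda row: row["id"] % 2 == 0)

print(len(new_sample), len(capped_sample))
```

The returned list plays the role of the new sample: later filters and functions would operate on it only, and discarding it corresponds to going back to the first 10,000 rows.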