When you enable data profiling during metadata import, Talend Data Catalog incrementally profiles all tables or files in the import scope and collects sample rows.
Because the data profiling and metadata import processes share the data store connectivity and scope details, you do not need to configure the data profiling connectivity explicitly.
Data sampling and data profiling can be defined and performed independently.
Either can run as part of model harvesting or on demand.
Both data sampling and data profiling are required to perform auto-tagging for data classification.
Before you begin
- Make sure that the bridge of the data source supports data profiling.
- You have been assigned an object role with the Data Management capability.
Procedure
- Open the Import Options tab to enable the data profiling and/or sampling options.
- Select the Data Profiling check box and define the number of rows to profile.
- Select the Data Sampling check box and define the number of rows for preview.
- Select the Profile only objects that are not profiled yet check box to enable data profiling only on imported objects that have not been profiled.
  If the check box is cleared, Talend Data Catalog re-profiles imported objects based on their last modification time.
- Select the Data Classification check box to automatically run data classification on the newly profiled objects.
- Select the Hide data using Sensitivity Label check box and select a sensitivity label from the list to apply it to the newly imported objects in the scope.
- Save your changes.
- To run or refresh the data profiling and/or sampling, do one of the following:
  - Re-import the model and go to the object page.
  - Generate the data profiling and sampling from any level of an imported object, including Tables/Files/Views (Classifier), Schema/Package, Model, or File System folder:
    - Go to the object page.
    - In the Data Request SQL area, specify your SQL query on the object as needed. The Data Request SQL is used after re-harvesting.
    - In the More actions menu, click Generate Data Sampling and Profiling.
    - Configure the options as needed.
    - Click OK to run the operation.
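
The effect of the Profile only objects that are not profiled yet check box can be pictured with a small sketch. The Python below is an illustration only, not Talend Data Catalog code: the `ImportedObject` class, its fields, and the `needs_profiling` function are hypothetical names, and the rule that a cleared check box re-profiles an object when it was modified after its last profiling run is an assumed reading of "based on their last modification time".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ImportedObject:
    """Hypothetical stand-in for an imported table, view, or file."""
    name: str
    last_modified: datetime
    last_profiled: Optional[datetime] = None  # None means never profiled


def needs_profiling(obj: ImportedObject, only_unprofiled: bool) -> bool:
    """Decide whether an object is profiled on this run (assumed logic)."""
    if obj.last_profiled is None:
        # Never profiled: selected whether the check box is set or not.
        return True
    if only_unprofiled:
        # Check box selected: objects that already have a profile are skipped.
        return False
    # Check box cleared (assumption): re-profile only when the object
    # changed after its last profiling run.
    return obj.last_modified > obj.last_profiled


# Sample objects with made-up names and dates.
objects = [
    ImportedObject("CUSTOMERS", datetime(2024, 3, 1)),
    ImportedObject("ORDERS", datetime(2024, 3, 5), datetime(2024, 3, 2)),
    ImportedObject("REGIONS", datetime(2024, 1, 10), datetime(2024, 2, 1)),
]

selected_when_checked = [o.name for o in objects if needs_profiling(o, True)]
selected_when_cleared = [o.name for o in objects if needs_profiling(o, False)]
```

With these sample objects, selecting the check box picks only CUSTOMERS, which has never been profiled. Clearing it also picks ORDERS, which changed after its last profiling run, and still skips REGIONS, which did not.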