Available in:
Big Data Platform
Cloud API Services Platform
Cloud Big Data Platform
Cloud Data Fabric
Cloud Data Management Platform
Data Fabric
Data Management Platform
Data Services Platform
MDM Platform
Real-Time Big Data Platform
After creating a connection to an ADLS Databricks cluster via Hive, you can create a
profiling analysis on a specific file.
Procedure
-
In the DQ Repository tree
view, expand the JDBC connection.
-
In the Columns folder, select the columns
you want to analyze and right-click them.
Tip: To create an analysis on all columns, right-click the table
name.
-
Hover over Column Analysis and select
the analysis type you need.
The Create New
Analysis wizard is displayed.
-
Enter a name and click Finish. The other fields are
optional.
A new analysis on the selected ADLS file is
automatically created and opened in the analysis editor. Depending on the
analysis type you selected, indicators are automatically assigned to the
columns.
The analysis applies to the Hive table, but
computes statistics on the data stored in ADLS through the External table mechanism. External tables keep the data in the original
file, outside of Hive. If the ADLS file you selected to analyze is deleted,
the analysis can no longer run.
-
If needed:
- Modify the columns to be analyzed: In the Data Preview
tab, click Select Columns.
- Add more indicators or new patterns to the columns: In the Analyzed
Columns tab, click Select
Indicators.
-
Run the analysis to display the results in the Analysis
Results view in the editor.
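The External table mechanism mentioned in the procedure can be illustrated with a short sketch. The Python helper below only builds the kind of Hive DDL statement that declares an external table over delimited files; it does not connect to a cluster. The helper name build_external_table_ddl, the table name, the column list, and the abfss:// location are hypothetical values chosen for illustration, and the statement that exists in your environment may differ.

```python
def build_external_table_ddl(table, columns, location, delimiter=","):
    """Return a Hive DDL statement for an external table over delimited files.

    table     -- hypothetical Hive table name
    columns   -- list of (column_name, hive_type) pairs
    location  -- storage directory holding the data files (assumed ADLS path)
    delimiter -- field separator used in the files
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_defs = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    # EXTERNAL means Hive records only the schema and the location;
    # the data stays in the files under LOCATION, outside of Hive.
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n"
        f"  {column_defs}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Hypothetical example values, not taken from any real environment.
ddl = build_external_table_ddl(
    table="customers_ext",
    columns=[("id", "INT"), ("name", "STRING"), ("email", "STRING")],
    location="abfss://container@account.dfs.core.windows.net/data/customers",
)
print(ddl)
```

Because the table stores only a schema and a pointer to the location, deleting the underlying data leaves the table definition with nothing to read, which is why the analysis stops working in that case.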
What to do next
You can create a report on this analysis. See
Creating a report on specific
analyses.