Profiling Hive - 6.3

Talend Data Fabric Studio User Guide

Once you have created the Hive connection through the connection to your Hadoop distribution, as outlined in Creating a connection to Hive, you can analyze the data in all Hive tables.

In the DQ Repository tree view, browse to the Hive connection under the Metadata node, then do one of the following:

  • Right-click the Hive connection and select Overview Analysis.

    This analysis profiles the database content to give an overview of the number of tables and the number of rows per table. For further information, see Profiling database content.

  • Right-click a Hive table and select any of the analyses listed in the menu.

    A wizard guides you through the steps to create the selected analysis. You can then assign indicators to the analyzed columns according to your needs.

    For further information, see Column analyses, Table analyses and Analyzing duplicates.
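The overview analysis is run from the Studio GUI, and this guide does not describe how it is computed. As a rough, hypothetical illustration of the two figures it reports (the number of tables and the number of rows per table), the sketch below counts rows through any Python DB-API 2.0 cursor. The `overview` function is an illustrative name, not a Talend API. An in-memory SQLite database stands in for Hive so the example runs anywhere; with a real Hive connection (for example through a third-party driver such as PyHive, which is an assumption here), the table list would typically come from `SHOW TABLES`.

```python
import sqlite3
from typing import Any, Dict, Iterable


def overview(cursor: Any, tables: Iterable[str]) -> Dict[str, Any]:
    """Hypothetical overview: table count plus a row count for each table.

    `cursor` is any DB-API 2.0 cursor (SQLite here, a Hive driver in practice).
    """
    rows_per_table: Dict[str, int] = {}
    for table in tables:
        # Backtick quoting is accepted by both HiveQL and SQLite; reject names
        # that contain a backtick rather than attempting to escape them.
        if "`" in table:
            raise ValueError(f"unsupported table name: {table!r}")
        cursor.execute(f"SELECT COUNT(*) FROM `{table}`")
        (count,) = cursor.fetchone()
        rows_per_table[table] = int(count)
    return {"table_count": len(rows_per_table), "rows_per_table": rows_per_table}


if __name__ == "__main__":
    # Stand-in for a Hive cursor: an in-memory SQLite database with two tables.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    cur.execute("CREATE TABLE orders (id INTEGER)")
    cur.execute("INSERT INTO orders VALUES (10)")
    # With Hive, these names would typically come from "SHOW TABLES" instead.
    names = [row[0] for row in cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    print(overview(cur, names))
    # prints {'table_count': 2, 'rows_per_table': {'customers': 3, 'orders': 1}}
```

Issuing one `COUNT(*)` query per table keeps the sketch simple, but on large Hive tables each query can take a while; this is a choice made for the illustration, not a statement about how Talend computes the overview.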