Profiling Hive - 6.1

Talend Real-time Big Data Platform Studio User Guide

Once you have created the Hive connection through the connection to the Hadoop distribution, as outlined in Creating a connection to Hive, you can analyze the data in all of its Hive tables.

Under the Metadata node in the DQ Repository tree view, browse to the Hive connection and do one of the following:

  • Right-click the Hive connection and select Overview Analysis.

    This analysis profiles the database content and gives an overview of the number of tables and the number of rows per table. For further information, see Profiling database content.

  • Right-click a Hive table and select any of the analyses listed in the menu.

    A wizard guides you through the steps to create the selected analysis. For further information, see Column analyses, Table analyses, or Analyzing duplicates, depending on the analysis you select.
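
To make the overview analysis more concrete, the following Python sketch builds the kind of HiveQL statements that would yield the same information by hand: a list of tables and a row count per table. It is an illustration only and not the queries the Studio generates, which this guide does not describe. The function name overview_queries and the sample names sales_db, customers and orders are hypothetical, and the sketch only produces query text without connecting to Hive.

```python
import re
from typing import List

# Assumption: database and table names use only letters, digits and underscores.
_VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def _check_name(name: str) -> str:
    """Return the name unchanged, or raise ValueError if it looks unsafe."""
    if not _VALID_NAME.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def overview_queries(database: str, tables: List[str]) -> List[str]:
    """Build HiveQL text that lists the tables and counts the rows of each one.

    Hypothetical helper: it approximates what an overview analysis reports
    (number of tables, rows per table) and does not execute anything.
    """
    db = _check_name(database)
    queries = [f"SHOW TABLES IN {db}"]
    for table in tables:
        queries.append(f"SELECT COUNT(*) FROM {db}.{_check_name(table)}")
    return queries


if __name__ == "__main__":
    # Hypothetical database and table names, for illustration only.
    for query in overview_queries("sales_db", ["customers", "orders"]):
        print(query)
```

Counting rows on large Hive tables can launch a full job per table, so running statements like these manually may take noticeably longer than on a relational database.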