About the Talend Trust Score™ with Snowflake - Cloud

Talend Cloud Data Inventory with Snowflake Getting Started Guide


The following diagram shows how the Talend Trust Score™ is computed.

  • Talend Cloud Data Inventory is compatible with Snowflake on AWS, GCP and Microsoft Azure.
  • You need specific privileges to use Snowflake. See the Snowflake documentation.
  • Using Snowflake with Talend Cloud Data Inventory affects your Snowflake compute fees.
  1. When you add a dataset from a Snowflake connection, a copy of the DQ Java libraries and semantic dictionary is sent to Snowflake.

    If you are already using a Snowflake connection, you must update the JDBC URL. For more information, see Editing a connection.

  2. The DQ Java libraries are defined in Snowflake as Java user-defined functions (UDFs).
  3. To calculate the Talend Trust Score™ in Snowflake, the following steps occur:
    1. Semantic discovery: the semantic type of each column is determined from a dataset sample of up to 10,000 rows. By default, the sample contains the first rows of the dataset (the Head sample). The rows can also be chosen at random (the Random sample).
    2. Data validity: the records are checked against the semantic types to determine whether each field is valid or invalid. Fields that do not match a semantic type are checked against the native types. Like semantic discovery, the data validity check is performed on the sample.
    3. Talend Trust Score™: the validity and completeness are calculated for the whole dataset in Snowflake.
  4. The sample is sent to Talend Cloud Data Inventory and the Talend Trust Score™ is calculated for the whole dataset:
    • The validity and completeness come from Snowflake.
    • The popularity, discoverability and usage are calculated in Talend Cloud Data Inventory. For more information on each axis, see Checking the Talend Trust Score™.
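
The sampling and validity steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the DQ Java libraries or the UDFs that run in Snowflake. The function names, the `EMAIL` regex standing in for a semantic dictionary entry, the integer native-type check, and the way empty fields are counted are all assumptions made for this example.

```python
import random
import re
from typing import Callable, Optional, Sequence

SAMPLE_LIMIT = 10_000  # the guide states that the sample holds up to 10,000 rows


def take_sample(rows: Sequence, mode: str = "head", limit: int = SAMPLE_LIMIT,
                seed: Optional[int] = None) -> list:
    """Return a Head sample (first rows) or a Random sample of at most `limit` rows."""
    if mode == "head":
        return list(rows[:limit])
    if mode == "random":
        rng = random.Random(seed)
        return rng.sample(list(rows), min(limit, len(rows)))
    raise ValueError(f"unknown sample mode: {mode!r}")


# Hypothetical stand-in for one semantic type: a simple email pattern.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_int(value: str) -> bool:
    """Assumed native-type check, used when the column has no semantic type."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def classify(value, semantic: Optional[re.Pattern],
             native: Callable[[str], bool]) -> str:
    """Label one field as 'empty', 'valid' or 'invalid'."""
    if value is None or str(value).strip() == "":
        return "empty"
    text = str(value).strip()
    if semantic is not None:
        # The column has a semantic type: validate against it.
        return "valid" if semantic.fullmatch(text) else "invalid"
    # No semantic type matched the column: fall back to the native type.
    return "valid" if native(text) else "invalid"


def column_quality(values, semantic: Optional[re.Pattern] = None,
                   native: Callable[[str], bool] = is_int) -> dict:
    """Share of valid fields and of non-empty (complete) fields in one column."""
    labels = [classify(v, semantic, native) for v in values]
    if not labels:
        return {"validity": 0.0, "completeness": 0.0}
    total = len(labels)
    return {
        "validity": labels.count("valid") / total,
        "completeness": (total - labels.count("empty")) / total,
    }
```

For instance, `column_quality(["a@b.com", "bad", "", None], semantic=EMAIL)` returns a validity of 0.25 and a completeness of 0.5 under these assumptions, because one of the four fields is valid and two are empty. The actual product may weight or count empty fields differently.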

You now have the Talend Trust Score™ with five axes for your dataset.