Analyzing the heat map

Data matching with Talend tools

Version: Cloud, 8.0
Language: English
Product: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Last publication date: 2024-02-06

The heat map helps you quickly see the importance of each feature and matching key in a model.

The following examples show how to analyze the heat map and how to decide, depending on the minimum model quality you want to obtain, whether a feature is necessary to the model.

These examples use a database of childcare centers that contains the following input data:
  • The site name
  • The address
  • The source of the preceding data (site name and address)

The database and the settings remain the same across the examples; only the matching keys change.

First example: the site name is the matching key

In this example, a single piece of input data, the site name, is set as the matching key.

The model quality is 0.802. This is high, but not high enough for a reliable model.

In the following examples, more matching keys are set to see their impact on the model quality.

Second example: the address and site name are the matching keys

In this example, the address is added as a matching key to the previous example.

The model quality is 0.917, which is significantly higher than in the previous example.

With the address added as a matching key, you can see that no features from the Site name input data are important.

Third example: the address, site name and source are the matching keys

In this example, the source is added as a matching key to the previous example.

The model quality is 0.925.

You can see that no features from the Source input data are important. Compared with the previous example, the model quality is only slightly higher (0.925 versus 0.917), which is not enough to make this matching key essential to the model.

Summary

The following table summarizes the matching keys and the model quality of the preceding examples.
Example   Matching keys                   Model quality
1         Site name                       0.802
2         Address and Site name           0.917
3         Address, Site name and Source   0.925
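
To make the comparison concrete, the gain in model quality brought by each added matching key can be computed from the figures in the table. The following Python sketch is only an illustration: the names results and quality_gains are hypothetical and are not part of Talend Studio.

```python
# Model quality reported for each example (values from the summary table).
results = [
    ("Site name", 0.802),
    ("Address and Site name", 0.917),
    ("Address, Site name and Source", 0.925),
]


def quality_gains(results):
    """Return (matching keys, quality, gain over the previous example) tuples.

    The gain is None for the first example, which has nothing to compare to.
    """
    gains = []
    previous = None
    for keys, quality in results:
        gain = None if previous is None else round(quality - previous, 3)
        gains.append((keys, quality, gain))
        previous = quality
    return gains


for keys, quality, gain in quality_gains(results):
    label = "n/a" if gain is None else f"{gain:+.3f}"
    print(f"{keys}: quality {quality:.3f}, gain {label}")
```

Adding the address raises the model quality by 0.115, whereas adding the source raises it by only 0.008.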

After setting different matching keys and running several Jobs, you can see that some features are not important to the model.

Even if the model quality is satisfactory, you can add or remove matching keys and compare the results.

Depending on your database, a less important feature can act as noise in the model.

Depending on the minimum model quality you want to obtain, you can decide if a matching key is necessary to the model.
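
One possible way to apply this rule is to keep the smallest set of matching keys whose model quality reaches your minimum. The Python sketch below assumes a minimum model quality of 0.9, a hypothetical threshold chosen only for illustration. The function name smallest_sufficient_keys is also hypothetical and is not part of Talend Studio.

```python
# Model quality per set of matching keys, taken from the summary table.
qualities = {
    ("Site name",): 0.802,
    ("Address", "Site name"): 0.917,
    ("Address", "Site name", "Source"): 0.925,
}


def smallest_sufficient_keys(qualities, minimum_quality):
    """Return the smallest key set whose quality reaches the minimum, or None."""
    candidates = [
        keys for keys, quality in qualities.items() if quality >= minimum_quality
    ]
    return min(candidates, key=len) if candidates else None


# Hypothetical threshold: the documentation does not prescribe a value.
print(smallest_sufficient_keys(qualities, 0.9))
```

With a minimum of 0.9, the address and site name are sufficient and the source is not needed. With a stricter minimum of 0.92, only the third example qualifies, so the source becomes necessary.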