Changing the semantic type of the popularity column - Cloud

Talend Cloud Data Inventory Getting Started Guide

The semantic type corresponds to the category of the data, such as names, emails, or phone numbers. If the semantic type detected for a column is not the one you want, you can manually change it to one of the predefined types, based on your knowledge of the data.

In the case of the movies_gsg dataset, the sample shows that most columns have been assigned a type that corresponds to the actual data, such as String for titles, Date for release dates, or Language code iso2 for the original language. However, the popularity column is marked as geographical coordinates, which is not correct in this specific context. The way the data is formatted does match how coordinates can be written, but you will update the type so that it is more in line with the actual content of the column.

Procedure

  1. Click the menu icon in the header of the popularity column.
    The menu that opens lists the top matching types, geographical coordinates in this case, as well as the more standard types such as Text, Integer, Decimal, or Boolean. The geographical coordinate type has been automatically assigned because it is compatible with 99% of the values. The missing 1% corresponds to the only invalid value in the column.
  2. From the list of available types, select Decimal.
  3. Repeat the previous steps to change the type of the runtime column to the better suited Decimal type as well.
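
To make the idea of a compatibility percentage more concrete, the following minimal Python sketch computes the share of values in a column that match a given type. It is only an illustration and not how Talend Cloud Data Inventory detects semantic types: the two patterns, the helper names (looks_like_coordinate, looks_like_decimal, match_rate), and the sample values are all hypothetical and are not taken from the movies_gsg dataset.

```python
import re

# Hypothetical patterns, used for illustration only. They are NOT the
# definitions that Talend Cloud Data Inventory uses for its semantic types.
COORDINATE_RE = re.compile(r"-?\d{1,3}\.\d+")
DECIMAL_RE = re.compile(r"-?\d+(\.\d+)?")


def looks_like_coordinate(value: str) -> bool:
    """True if the value is a decimal number within the -180..180 range."""
    return bool(COORDINATE_RE.fullmatch(value)) and abs(float(value)) <= 180


def looks_like_decimal(value: str) -> bool:
    """True if the value is a plain integer or decimal number."""
    return bool(DECIMAL_RE.fullmatch(value))


def match_rate(values, predicate) -> float:
    """Percentage of non-empty values accepted by the predicate."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return 0.0
    matches = sum(1 for v in non_empty if predicate(v))
    return 100.0 * matches / len(non_empty)


# Made-up popularity-like values: the last one is out of range for a
# coordinate but is still a perfectly valid decimal number.
sample = ["21.946943", "3.859495", "8.387519", "547.488298"]

coordinate_rate = match_rate(sample, looks_like_coordinate)
decimal_rate = match_rate(sample, looks_like_decimal)
print(f"coordinates: {coordinate_rate}%, decimal: {decimal_rate}%")
```

In this sketch, a value that is a valid number but falls outside the coordinate range lowers the match rate for the coordinate type while leaving the Decimal match rate untouched, which mirrors why switching to Decimal can remove the invalid value from the column.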

Results

You have changed the semantic type of the popularity and runtime columns. Because the Decimal type matches 100% of the data in the popularity column, the quality bar in the column header no longer shows orange.