The semantic type corresponds to the category of the data (names, emails, phone numbers, etc.). If the semantic type detected for a column is not the desired one, you can manually change it to one of the predefined types, based on your knowledge of the data.
In the case of the movies_gsg dataset, you can see by looking at the sample that most columns have been assigned a type that corresponds to the actual data, for example String for titles, Date for release dates, or Language code iso2 for the original language. However, you will notice that the popularity column is marked as geographical coordinates, which is not correct in this specific context. The way the data is formatted does match how coordinates can be written, but you will update the type so that it is more in line with the actual content of the column.
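To illustrate why such a mismatch can happen, here is a minimal Python sketch. It does not reproduce the detection logic used by the product. It only assumes a naive rule where any number in the valid latitude range is considered a plausible coordinate. The function names and the sample values are hypothetical.

```python
def looks_like_decimal(value: str) -> bool:
    """Return True if the text can be parsed as a decimal number."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def looks_like_latitude(value: str) -> bool:
    """Naive rule (an assumption): a number between -90 and 90 could be a latitude."""
    try:
        number = float(value)
    except ValueError:
        return False
    return -90.0 <= number <= 90.0


# Hypothetical popularity scores, not taken from the movies_gsg dataset.
sample_popularity = ["8.32", "27.5", "63.871"]

for value in sample_popularity:
    # Each value satisfies both rules, so format alone cannot tell them apart.
    print(value, looks_like_decimal(value), looks_like_latitude(value))
```

Because every sample value passes both checks, only knowledge of what the column contains can settle which type is the right one.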
Procedure
Results
The Decimal type matches 100% of the data in the popularity column, so the quality bar in the column header no longer shows orange.
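The idea of a type matching a given percentage of a column can be sketched as follows. This is an illustrative Python example, not the product's implementation: the `match_ratio` and `is_decimal` helpers are hypothetical, and the sample values are made up.

```python
def is_decimal(value: str) -> bool:
    """Return True if the text can be parsed as a decimal number."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def match_ratio(values, predicate) -> float:
    """Share of values (between 0.0 and 1.0) that satisfy the predicate."""
    if not values:
        return 0.0
    matching = sum(1 for value in values if predicate(value))
    return matching / len(values)


# Hypothetical column content, not taken from the movies_gsg dataset.
popularity = ["8.32", "27.5", "63.871"]

# A ratio of 1.0 means every value in the column is valid for the type.
print(match_ratio(popularity, is_decimal))
```

Any value that fails the check lowers the ratio, which is the kind of situation an invalid-data indicator in a quality bar is meant to surface.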