The semantic type corresponds to the category of the data (names, emails, phone numbers, etc.). If the semantic type detected for a column is not the right one, you can manually change it to one of the predefined types, based on your knowledge of the data.
In the case of the movies_gsg dataset, you can see by looking at the sample that most columns have been assigned a type that corresponds to the actual data, for example String for titles, Date for release dates, or Language code iso2 for the original language. However, you will notice that the popularity column is marked as geographical coordinates, which is not correct in this specific context. The way the data is formatted does match how coordinates can be written, but you will update the type so that it is more in line with the actual content of the column.
Click the menu icon in the header of the popularity column.
The menu that opens lists the top matching types (geographical coordinates in this case) as well as more standard types such as Boolean. The geographical coordinates type was assigned automatically because 99% of the column's values are compatible with it; the remaining 1% corresponds to the only invalid value in the column.
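To make the idea of a compatibility percentage concrete, here is a minimal sketch that computes, for each candidate type, the share of non-empty values it accepts. The validators are hypothetical simplifications (a coordinate is assumed to be a decimal number between -180 and 180); they are not the application's actual detection rules, and the sketch does not reproduce how the application decides which type to assign.

```python
import re

# Hypothetical, simplified validators; real semantic-type rules are more elaborate.
DECIMAL_RE = re.compile(r"^[+-]?\d+(\.\d+)?$")


def is_decimal(value: str) -> bool:
    """True if the value looks like a plain decimal number."""
    return bool(DECIMAL_RE.match(value.strip()))


def is_coordinate(value: str) -> bool:
    """Assumed rule: a decimal number in the range [-180, 180]."""
    return is_decimal(value) and -180.0 <= float(value) <= 180.0


VALIDATORS = {
    "Decimal": is_decimal,
    "Geographical coordinates": is_coordinate,
}


def match_ratios(values):
    """Return, for each candidate type, the share of non-empty values it accepts."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return {name: 0.0 for name in VALIDATORS}
    return {
        name: sum(1 for v in non_empty if check(v)) / len(non_empty)
        for name, check in VALIDATORS.items()
    }
```

With 99 in-range values and one value outside [-180, 180], the coordinate validator accepts 99% of the column while the decimal validator accepts all of it, which mirrors the 99% versus 100% situation described in this section.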
From the list of available types, select Decimal. Because the Decimal type matches 100% of the data in the popularity column, the quality bar in the column header no longer shows orange.
Repeat these last steps to change the type of the runtime column to a more suitable type.
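The quality bar can be thought of as a summary of how many values are valid, invalid, or empty for the column's current type. The sketch below assumes that model: the `summarize` helper, the `QualitySummary` class, and both validators are hypothetical illustrations, not the application's implementation. It shows why switching a column from a type that rejects one value to a type that accepts every value removes the invalid (orange) part.

```python
import re
from dataclasses import dataclass

# Hypothetical validators, simplified for illustration only.
DECIMAL_RE = re.compile(r"^[+-]?\d+(\.\d+)?$")


def is_decimal(value: str) -> bool:
    return bool(DECIMAL_RE.match(value.strip()))


def is_coordinate(value: str) -> bool:
    # Assumed rule: a decimal number in the range [-180, 180].
    return is_decimal(value) and -180.0 <= float(value) <= 180.0


@dataclass
class QualitySummary:
    valid: int
    invalid: int
    empty: int

    @property
    def has_invalid(self) -> bool:
        """True when at least one value does not fit the type (the orange part)."""
        return self.invalid > 0


def summarize(values, is_valid):
    """Count valid, invalid, and empty values for a given type validator."""
    valid = invalid = empty = 0
    for v in values:
        if not v.strip():
            empty += 1
        elif is_valid(v):
            valid += 1
        else:
            invalid += 1
    return QualitySummary(valid, invalid, empty)
```

For a sample such as `["7.3", "", "250.5"]`, checking against `is_coordinate` reports one invalid value, while checking against `is_decimal` reports none.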