List of the indexes and regex categories used in the Semantic-aware analysis - Cloud - 7.3

Talend Studio User Guide

Version: Cloud 7.3
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-02-13
Available in:
  • Big Data Platform
  • Cloud API Services Platform
  • Cloud Big Data Platform
  • Cloud Data Fabric
  • Cloud Data Management Platform
  • Data Fabric
  • Data Management Platform
  • Data Services Platform
  • MDM Platform
  • Real-Time Big Data Platform

The Semantic-aware approach analyzes column content using three methods: regular expressions (regex), data dictionaries, and keyword dictionaries.

The dictionary indexes and regex categories are embedded in Talend Studio and used in the Semantic-aware analysis to:
  • Help you explore the semantic categories of data; and
  • Decide which category the data falls into.

If you are not using the latest version of Talend Studio, some of the regex categories and data dictionary indexes listed below might not be available.
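As a rough illustration of how per-value matches could be turned into a column-level category decision, the sketch below tallies the share of values accepted by each category. The function name `profile_column`, the `demo_matchers` table and the share-based ranking are assumptions made for this example; they are not the algorithm or API used by Talend Studio.

```python
from collections import Counter
from typing import Callable, Dict, Iterable

# Hypothetical matchers: each category name maps to a yes/no test on one value.
# The real categories rely on the embedded regexes and dictionary indexes.
Matcher = Callable[[str], bool]


def profile_column(values: Iterable[str], matchers: Dict[str, Matcher]) -> Dict[str, float]:
    """Return the share of values accepted by each category, highest first."""
    values = list(values)
    if not values:
        return {}
    counts = Counter()
    for value in values:
        for category, accepts in matchers.items():
            if accepts(value.strip()):
                counts[category] += 1
    return {category: n / len(values) for category, n in counts.most_common()}


# Simplified stand-ins for two categories listed in this topic.
demo_matchers = {
    "Answer": lambda v: v.lower() in {"true", "false"},
    "Color Hex Code": lambda v: v.startswith("#") and len(v) == 7,
}

shares = profile_column(["True", "false", "#FF8800", "n/a"], demo_matchers)
```

In this sketch a value may count toward several categories, and values that match nothing lower every share, which gives a simple signal for how well a category describes the column.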

Regex categories

Regex category | Description | Origin of data
Amex Card | American Express card | Talend
AT VAT Number | Austrian VAT number | Talend
Bank Routing Transit Number | Bank routing transit number | Talend
BE Postal Code | Belgian postal code | Talend
BG VAT Number | Bulgarian VAT number | Talend
Color Hex Code | Color hexadecimal code | Talend
Data URL | URL starting with the word data | Talend
DE Phone | German phone number | Talend
DE Postal Code | German postal code | Talend
EN Month | Month in English | Talend
EN Month Abbrev | English month abbreviation | Talend
EN Weekday | English weekday or its abbreviation | Talend
Email | Email address | Talend
File URL | File URL | Talend
FR Insee Code | French Insee code of cities, including Corsica and colonies | Insee
FR Phone | French phone number | Talend
FR Postal Code | French postal code | Talend
FR Social Security Number | French social security number | Talend
FR VAT Number | French VAT number | Talend
Geographic Coordinate | Longitude and latitude coordinates with at least meter precision | Talend
Geographic Coordinates | Geographic coordinates in Google Maps style GPS decimal format | Talend
Geographic Coordinates (degree) | Latitude and longitude coordinates in degrees, separated by a comma, in the form N 0:59:59.99,E 0:59:59.99 | Talend
HDFS URL | HDFS URL | Talend
IBAN | International Bank Account Number | Talend
IPv4 Address | IPv4 address | Talend
IPv6 Address | IPv6 address | Talend
ISBN-10 | International Standard Book Number, 10 digits | Talend
ISBN-13 | International Standard Book Number, 13 digits | Talend
MAC Address | MAC address | Talend
MailTo URL | MailTo URL | Talend
MasterCard | Mastercard credit card | Talend
Money Amount (EN) | Amount of money in English format | Talend
Money Amount (FR) | Amount of money in French format | Talend
Passport | Passport number | Talend
SE Social Security Number | Swedish person number | Talend
SEDOL | Stock Exchange Daily Official List | Talend
UK Phone | UK phone number | Talend
UK Postal Code | UK postal code | Talend
UK Social Security Number | National identification number, national identity number, or national insurance number, generally called NI number | Talend
URL | Web site URL | Talend
US Phone | US phone number | Talend
US Postal Code | US postal code | Talend
US Social Security Number | US social security number | Talend
US State | US states | Talend
US State Code | US state code | Talend
Visa Card | Visa credit card | Talend
Web Domain | Web site domain | Talend
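To make the idea of a regex category concrete, the following sketch tests a value against a few hand-written patterns. The patterns and the names `ILLUSTRATIVE_PATTERNS` and `regex_categories` are assumptions written for this example. They only approximate the descriptions in the table above (for instance the form N 0:59:59.99,E 0:59:59.99) and are not the regexes embedded in Talend Studio.

```python
import re

# Illustrative patterns only; the regexes shipped with Talend Studio may differ.
ILLUSTRATIVE_PATTERNS = {
    # "#" followed by exactly six hexadecimal digits.
    "Color Hex Code": re.compile(r"#[0-9A-Fa-f]{6}"),
    # Any URL that starts with the word "data".
    "Data URL": re.compile(r"data:\S+"),
    # Latitude and longitude as degrees:minutes:seconds, separated by a comma.
    "Geographic Coordinates (degree)": re.compile(
        r"[NS] \d{1,2}:\d{1,2}:\d{1,2}(\.\d+)?,[EW] \d{1,3}:\d{1,2}:\d{1,2}(\.\d+)?"
    ),
}


def regex_categories(value: str) -> list:
    """Return the names of all illustrative patterns that match the whole value."""
    text = value.strip()
    return [name for name, pattern in ILLUSTRATIVE_PATTERNS.items() if pattern.fullmatch(text)]
```

The sketch uses `fullmatch` rather than `search` so that a value is accepted only when the entire string has the expected shape, which avoids classifying free text that merely contains a matching fragment.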

Data dictionary indexes

Data dictionary index | Description | Origin of data
Airport | Airport | Talend
Airport Code | Airport code | Talend
Animal | Animal | Talend
Answer | Answers with the value True or False | Talend
Beverage | Type of beverage | YAGO
CA Province Territory | Canadian province or territory | Statoids
CA Province Territory Code | Canadian province or territory code | Statoids
City | City name | Talend
Civility | Civility | Talend
Company | Company name | YAGO
Continent | Continent name | Talend
Continent Code | Continent code | Talend
Country | Country name | Open Knowledge (Public Domain Dedication and License)
Country Code ISO2 | 2-letter country code | Open Knowledge (Public Domain Dedication and License)
Country Code ISO3 | 3-letter country code | Open Knowledge (Public Domain Dedication and License)
Currency Code | Currency code | Open Knowledge (Public Domain Dedication and License)
Currency Name | Currency name | Open Knowledge (Public Domain Dedication and License)
FR Commune | French municipality | Insee
FR Departement | French department | Insee
FR Region | French region | Insee
FR Region Legacy | Former French regions, prior to the 2016 territorial reform | Insee
Gender | Gender | Talend
HR Department | HR department | Talend
Industry | Industry name | Talend
Industry Group | Industry group | Talend
Job Title | Job title | Talend
Language | Language | Wikipedia
Language Code ISO2 | 2-letter language code | Wikipedia
Language Code ISO3 | 3-letter language code | Wikipedia
Last Name | Last name | United States Census Bureau
Measure Unit | Measure unit | Talend
Month | Month | Talend
Museum | Museum name | YAGO
MX Estado | Mexican state | Statoids
MX Estado Code | Mexican state code | Statoids
Organization | Organization | YAGO
Sector | Sector | Talend
Street Type | Street type | Talend
US County | US county name | Wikipedia
US State | US states | Talend
US State Code | US state code | Talend
Weekday | Day of the week | Talend
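A data dictionary check can be pictured as a lookup of the whole value in an index. The sketch below assumes that values are normalized (case, accents, whitespace) before the lookup. That normalization, the tiny `CONTINENT_INDEX` set and the helper names are assumptions for illustration and do not reproduce the embedded indexes or the exact matching rules of Talend Studio.

```python
import unicodedata

# Tiny hypothetical index; the embedded Continent index is not reproduced here.
CONTINENT_INDEX = {
    "africa", "antarctica", "asia", "europe",
    "north america", "oceania", "south america",
}


def normalize(value: str) -> str:
    """Lowercase the value, strip accents and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def in_dictionary(value: str, index: set) -> bool:
    """Accept the value only if its normalized form is an entry of the index."""
    return normalize(value) in index
```

For example, `in_dictionary("  North   America ", CONTINENT_INDEX)` is accepted after normalization, while a value that only contains an entry, such as "Asia Pacific", is rejected because the lookup compares the whole value.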

Keyword dictionary indexes

Keyword dictionary index | Description | Origin of data
Address Line | Street number and name | Talend
Full Name | Full name | Talend
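In contrast to a whole-value lookup, a keyword dictionary can be thought of as accepting a value when at least one of its words appears in the index. The sketch below illustrates that idea for Address Line. The token-based matching rule, the `ADDRESS_KEYWORDS` set and the function names are assumptions for this example, not the embedded index or the documented behavior of Talend Studio.

```python
import re

# Hypothetical keywords; the embedded Address Line index is not reproduced here.
ADDRESS_KEYWORDS = {"street", "st", "avenue", "ave", "road", "rd", "boulevard", "blvd"}


def tokens(value: str) -> list:
    """Split a value into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", value.lower())


def contains_keyword(value: str, keywords: set) -> bool:
    """Accept the value if any whole token is one of the keywords."""
    return any(token in keywords for token in tokens(value))
```

Matching on whole tokens rather than substrings keeps "12 Main St." in the category while leaving out a value such as "Stockholm", which merely starts with the letters of a keyword.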