Supported character types in column analyses and data masking operations - 7.1

author
Talend Documentation Team
EnrichVersion
7.1
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform
Talend Studio

When you mask data with Talend Data Preparation or the tDataMasking component, each character in the input data is replaced with a character of the same character type, within the supported Unicode ranges.
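As a rough illustration of type-preserving masking, the following Python sketch replaces each character with a random character drawn from the same character type. It is not the tDataMasking implementation. The names `CHAR_TYPES`, `char_type`, `mask_char`, and `mask_text` are hypothetical, only a subset of the supported ranges is included, and passing unsupported characters through unchanged is an assumption of this sketch.

```python
import random

# Illustrative subset of the supported ranges (inclusive code points).
# The type names and this grouping are assumptions made for the sketch.
CHAR_TYPES = {
    "latin_digit": [(0x0030, 0x0039)],
    "latin_lower": [(0x0061, 0x007A), (0x00DF, 0x00F6), (0x00F8, 0x00FF)],
    "latin_upper": [(0x0041, 0x005A), (0x00C0, 0x00D6), (0x00D8, 0x00DE)],
    "fullwidth_digit": [(0xFF10, 0xFF19)],
    "hiragana": [(0x3041, 0x3096)],
    "halfwidth_katakana": [(0xFF66, 0xFF9D)],
    "hangul": [(0xAC00, 0xD7AF)],
}


def char_type(ch):
    """Return the name of the character type that contains ch, or None."""
    cp = ord(ch)
    for name, ranges in CHAR_TYPES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return None


def mask_char(ch, rng):
    """Replace ch with a random character of the same character type."""
    name = char_type(ch)
    if name is None:
        # Assumption: characters outside the listed ranges are left as-is.
        return ch
    lo, hi = rng.choice(CHAR_TYPES[name])
    return chr(rng.randint(lo, hi))


def mask_text(text, seed=None):
    """Mask every character of text, preserving each character's type."""
    rng = random.Random(seed)
    return "".join(mask_char(ch, rng) for ch in text)


if __name__ == "__main__":
    print(mask_text("Ab1 ひら-한", seed=1))
```

The masked string has the same length as the input, and each position keeps its character type: an upper-cased Latin letter stays an upper-cased Latin letter, a Hiragana character stays Hiragana, and so on.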

When creating column analyses in Talend Studio, you can use the East Asia Pattern Frequency and East Asia Pattern Low Frequency indicators to define the content, structure, and quality of data that contains Asian characters.

The following table describes the supported character types and the related Unicode ranges (version 11.0).

For more information, see the documentation for the Unicode Standard (http://unicode.org/standard/standard.html) and the character code charts (http://www.unicode.org/charts/).

| Character type | Unicode range (version 11.0) | Corresponding characters |
|---|---|---|
| Latin numbers | [0030-0039] | [0-9] |
| Latin lower-cased letters | [0061-007A] [00DF-00F6] [00F8-00FF] | [a-z] [ß-ö] [ø-ÿ] |
| Latin upper-cased letters | [0041-005A] [00C0-00D6] [00D8-00DE] | [A-Z] [À-Ö] [Ø-Þ] |
| Full-width Latin numbers | [FF10-FF19] | [０-９] |
| Full-width Latin lower-cased letters | [FF41-FF5A] | [ａ-ｚ] |
| Full-width Latin upper-cased letters | [FF21-FF3A] | [Ａ-Ｚ] |
| Hiragana | [3041-3096] 30FC 309D 309E | [ぁ-ゖ] ー ゝ ゞ |
| Half-width Katakana | [FF66-FF9D] | [ｦ-ﾝ] |
| Full-width Katakana | [30A1-30FA] 30FC 30FD 30FE | [ァ-ヺ] ー ヽ ヾ |
| Full-width Katakana, phonetic extension | [31F0-31FF] | [ㇰ-ㇿ] |
| Kanji | [4E00-9FEF] | [一-U+9FEF] |
| Kanji, CJK Extension A | [3400-4DB5] | [㐀-䶵] |
| Kanji, CJK Extension B | [20000-2A6D6] | [𠀀-𪛖] |
| Kanji, CJK Extension C | [2A700-2B734] | [𪜀-𫜴] |
| Kanji, CJK Extension D | [2B740-2B81D] | [𫝀-𫠝] |
| Kanji, CJK Extension E | [2B820-2CEA1] | [U+2B820-U+2CEA1] |
| Kanji, CJK Extension F | [2CEB0-2EBE0] | [U+2CEB0-U+2EBE0] |
| Kanji, CJK Compatibility Ideographs | [F900-FA6D] [FA70-FAD9] | [豈-舘] [U+FA70-U+FAD9] |
| Kanji, CJK Compatibility Ideographs Supplement | [2F800-2FA1D] | [U+2F800-U+2FA1D] |
| Kanji, KangXi Radicals | [2F00-2FD5] | [⼀-⿕] |
| Kanji, CJK Radicals Supplement | [2E80-2E99] [2E9B-2EF3] | [⺀-⺙] [⺛-⻳] |
| Kanji, CJK Symbols and Punctuation | [3005-3005] [3007-3007] [3021-3029] [3038-303B] | [々-々] [〇-〇] [〡-〩] [〸-〻] |
| Hangul | [AC00-D7AF] | [가-힯] |
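To show how ranges like those in the table can drive a pattern-frequency analysis, here is a minimal Python sketch that maps each character to a one-letter symbol for its character type and then counts the resulting patterns. It is only an illustration, not how the East Asia Pattern Frequency indicators are implemented. The symbols (`9`, `a`, `A`, `H`, `K`, `k`, `C`, `G`) and the names `RANGES`, `to_pattern`, and `pattern_frequencies` are assumptions of this sketch, and only one range per type is included.

```python
from collections import Counter

# (symbol, first code point, last code point), inclusive.
# The symbols are hypothetical and chosen only for this sketch.
RANGES = [
    ("9", 0x0030, 0x0039),  # Latin numbers
    ("a", 0x0061, 0x007A),  # Latin lower-cased letters (basic range only)
    ("A", 0x0041, 0x005A),  # Latin upper-cased letters (basic range only)
    ("H", 0x3041, 0x3096),  # Hiragana
    ("K", 0x30A1, 0x30FA),  # Full-width Katakana
    ("k", 0xFF66, 0xFF9D),  # Half-width Katakana
    ("C", 0x4E00, 0x9FEF),  # Kanji (main block only)
    ("G", 0xAC00, 0xD7AF),  # Hangul
]


def to_pattern(value):
    """Replace each character with the symbol of its character type.

    Characters outside the listed ranges are kept unchanged.
    """
    symbols = []
    for ch in value:
        cp = ord(ch)
        symbol = next((s for s, lo, hi in RANGES if lo <= cp <= hi), ch)
        symbols.append(symbol)
    return "".join(symbols)


def pattern_frequencies(values):
    """Count how often each pattern occurs in a column of values."""
    return Counter(to_pattern(v) for v in values)


if __name__ == "__main__":
    column = ["東京都", "大阪府", "ソウル", "서울", "Tokyo1"]
    for pattern, count in pattern_frequencies(column).most_common():
        print(pattern, count)
```

Sorting the counts in descending order gives the most frequent patterns, and sorting them in ascending order surfaces the rare patterns, which is the kind of view a low-frequency indicator is meant to give.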