Skip to main content
Close announcements banner

Supported character types in column analyses and data masking operations

When masking data using Talend Data Preparation or the tDataMasking component, each of the characters in the input data is masked to a character from the same character type, within the supported Unicode ranges.

When creating column analyses in Talend Studio, you can use the East Asia Pattern Frequency or East Asia Pattern Low Frequency indicators for Asian characters, to define the content, structure and quality of your data.

The following table describes the supported character types and the related Unicode ranges (version 11.0).

For more information, see the documentation for the Unicode Standard and the character code charts.

Character Type Unicode Range (version 11.0) Corresponding characters
Latin numbers [0030-0039] [0-9]
Latin lower-cased letters [0061-007A] [00DF-00F6] [00F8-00FF] [a-z] [ß-ö] [ø-ÿ]
Latin upper-cased letters [0041-005A] [00C0-00D6] [00D8-00DE] [A-Z] [À-Ö] [Ø-Þ]
Full-width Latin numbers [FF10-FF19] [0-9]
Full-width Latin lower-cased letters [FF41-FF5A] [a-z]
Full-width Latin upper-cased letters [FF21-FF3A] [A-Z]
Hiragana [3041-3096] 30FC 309D 309E [ぁ-ゖ] ー ゝ ゞ
Half-width Katakana [FF66-FF9D] [ヲ-ン]
Full-width Katakana [30A1-30FA] 30FC 30FD 30FE [ァ-ヺ] ー ヽ ヾ
Phonetic extension: [31F0-31FF] [ㇰ-ㇿ]
Kanji CJK Extension A: [4E00-9FEF] [3400-4DB5] [一-] [㐀-䶵]
CJK Extension B: [20000-2A6D6] [𠀀-𪛖]
CJK Extension C: [2A700-2B734] [𪜀-𫜴]
CJK Extension D: [2B740-2B81D] [𫝀-𫠝]
CJK Extension E: [2B820-2CEA1] [-]
CJK Extension F: [2CEB0-2EBE0] [-]
CJK Compatibility Ideographs: [F900-FA6D] [FA70-FAD9] [豈-舘] [-]
CJK Compatibility Ideographs Supplement: [2F800-2FA1D] [-]
KangXi Radicals: [2F00-2FD5] [⼀-⿕]
CJK Radicals Supplement: [2E80-2E99] [2E9B-2EF3] [⺀-⺙] [⺛-⻳]
CJK Symbols and Punctuation: [3005-3005] [3007-3007] [3021-3029] [3038-303B] [々-々] [〇-〇] [〡-〩] [〸-〻]
Hangul [AC00-D7AF] [가-힯]

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – let us know how we can improve!