Supported character types in column analyses and data masking operations - 7.3

Talend Open Studio User Guide

Version: 7.3
Language: English
Product: Talend Open Studio for Big Data, Talend Open Studio for Data Integration, Talend Open Studio for Data Quality, Talend Open Studio for ESB
Module: Talend Studio
Content: Design and Development
Last publication date: 2023-10-11
Available in: Open Studio for Data Quality

When masking data using Talend Data Preparation or the tDataMasking component, each character in the input data is replaced with a character of the same character type, within the supported Unicode ranges.
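The idea of masking within a character type can be illustrated with a minimal Python sketch. It is not the tDataMasking implementation: the type labels, the `char_type` and `mask` helpers, and the choice to pass unsupported characters through unchanged are all assumptions made for this example, and only a few of the supported ranges are included.

```python
import random

# A few of the supported ranges from this topic (inclusive code points).
# The labels are hypothetical names chosen for this sketch.
RANGES = {
    "latin_digit": [(0x0030, 0x0039)],
    "latin_lower": [(0x0061, 0x007A)],
    "latin_upper": [(0x0041, 0x005A)],
    "hiragana": [(0x3041, 0x3096)],
    "hangul": [(0xAC00, 0xD7AF)],
}


def char_type(ch):
    """Return the label of the range containing ch, or None if unsupported."""
    cp = ord(ch)
    for name, spans in RANGES.items():
        if any(lo <= cp <= hi for lo, hi in spans):
            return name
    return None


def mask(text, rng=None):
    """Replace each supported character with a random one of the same type."""
    rng = rng or random.Random()
    out = []
    for ch in text:
        name = char_type(ch)
        if name is None:
            # Assumption: characters outside the listed ranges are left as-is.
            out.append(ch)
        else:
            lo, hi = rng.choice(RANGES[name])
            out.append(chr(rng.randint(lo, hi)))
    return "".join(out)


# A seeded generator makes the result repeatable for a given input.
print(mask("Ab3 ひら-한", random.Random(0)))
```

Whatever the random draw, the masked string keeps the same length and the same sequence of character types as the input, which is the property described above.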

When creating column analyses in Talend Studio, you can use the East Asia Pattern Frequency or East Asia Pattern Low Frequency indicators for Asian characters, to define the content, structure and quality of your data.
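As a rough illustration of what a pattern frequency indicator computes, the following Python sketch maps each character to a placeholder symbol for its character type and counts how many values share each resulting pattern. The symbols `H`, `K`, `C` and `G`, the function names, and the sample values are assumptions made for this example; the patterns produced by Talend Studio may differ.

```python
from collections import Counter

# Hypothetical placeholder symbols for this sketch; Talend Studio may use others.
SYMBOLS = [
    ((0x3041, 0x3096), "H"),  # Hiragana
    ((0x30A1, 0x30FA), "K"),  # full-width Katakana
    ((0x4E00, 0x9FEF), "C"),  # Kanji (first range only)
    ((0xAC00, 0xD7AF), "G"),  # Hangul
]


def to_pattern(value):
    """Replace each character in a known range with its placeholder symbol."""
    out = []
    for ch in value:
        cp = ord(ch)
        for (lo, hi), sym in SYMBOLS:
            if lo <= cp <= hi:
                out.append(sym)
                break
        else:
            # Characters outside the listed ranges are kept unchanged.
            out.append(ch)
    return "".join(out)


def pattern_frequency(values):
    """Count how many values share each pattern."""
    return Counter(to_pattern(v) for v in values)


freq = pattern_frequency(["東京", "大阪", "ひらがな", "カタカナ", "서울"])
print(freq.most_common())
```

Sorting the counts in ascending order instead would surface the rarest patterns, which is the idea behind a low frequency variant.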

The following table describes the supported character types and the related Unicode ranges (version 11.0).

For more information, see the documentation for the Unicode Standard (http://unicode.org/standard/standard.html) and the character code charts (http://www.unicode.org/charts/).

| Character Type | Unicode Range (version 11.0) | Corresponding characters |
|---|---|---|
| Latin numbers | [0030-0039] | [0-9] |
| Latin lower-cased letters | [0061-007A] [00DF-00F6] [00F8-00FF] | [a-z] [ß-ö] [ø-ÿ] |
| Latin upper-cased letters | [0041-005A] [00C0-00D6] [00D8-00DE] | [A-Z] [À-Ö] [Ø-Þ] |
| Full-width Latin numbers | [FF10-FF19] | [０-９] |
| Full-width Latin lower-cased letters | [FF41-FF5A] | [ａ-ｚ] |
| Full-width Latin upper-cased letters | [FF21-FF3A] | [Ａ-Ｚ] |
| Hiragana | [3041-3096] 30FC 309D 309E | [ぁ-ゖ] ー ゝ ゞ |
| Half-width Katakana | [FF66-FF9D] | [ｦ-ﾝ] |
| Full-width Katakana | [30A1-30FA] 30FC 30FD 30FE | [ァ-ヺ] ー ヽ ヾ |
| | Phonetic extension: [31F0-31FF] | [ㇰ-ㇿ] |
| Kanji | CJK Unified Ideographs and CJK Extension A: [4E00-9FEF] [3400-4DB5] | [一-U+9FEF] [㐀-䶵] |
| | CJK Extension B: [20000-2A6D6] | [𠀀-𪛖] |
| | CJK Extension C: [2A700-2B734] | [𪜀-𫜴] |
| | CJK Extension D: [2B740-2B81D] | [𫝀-𫠝] |
| | CJK Extension E: [2B820-2CEA1] | [U+2B820-U+2CEA1] |
| | CJK Extension F: [2CEB0-2EBE0] | [U+2CEB0-U+2EBE0] |
| | CJK Compatibility Ideographs: [F900-FA6D] [FA70-FAD9] | [豈-舘] [U+FA70-U+FAD9] |
| | CJK Compatibility Ideographs Supplement: [2F800-2FA1D] | [U+2F800-U+2FA1D] |
| | KangXi Radicals: [2F00-2FD5] | [⼀-⿕] |
| | CJK Radicals Supplement: [2E80-2E99] [2E9B-2EF3] | [⺀-⺙] [⺛-⻳] |
| | CJK Symbols and Punctuation: [3005-3005] [3007-3007] [3021-3029] [3038-303B] | [々-々] [〇-〇] [〡-〩] [〸-〻] |
| Hangul | [AC00-D7AF] | [가-힯] |
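
To check whether a given character falls inside one of the ranges above, the range notation can be parsed into code point pairs. The following Python sketch assumes the notation used in the table (bracketed hexadecimal ranges plus bare hexadecimal code points); `parse_ranges` and `in_ranges` are hypothetical helper names, not part of any Talend API.

```python
import re

# Matches either "[LOW-HIGH]" or a bare 4 to 6 digit hexadecimal code point.
RANGE_TOKEN = re.compile(
    r"\[([0-9A-Fa-f]+)-([0-9A-Fa-f]+)\]|\b([0-9A-Fa-f]{4,6})\b"
)


def parse_ranges(spec):
    """Turn '[3041-3096] 30FC' into [(0x3041, 0x3096), (0x30FC, 0x30FC)]."""
    spans = []
    for low, high, single in RANGE_TOKEN.findall(spec):
        if single:
            # A bare code point is treated as a one-character range.
            spans.append((int(single, 16), int(single, 16)))
        else:
            spans.append((int(low, 16), int(high, 16)))
    return spans


def in_ranges(ch, spans):
    """Return True if the code point of ch lies in any inclusive span."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in spans)


hiragana = parse_ranges("[3041-3096] 30FC 309D 309E")
extension_b = parse_ranges("CJK Extension B: [20000-2A6D6]")

print(in_ranges("ぁ", hiragana))             # U+3041, start of the first range
print(in_ranges(chr(0x20000), extension_b))  # supplementary-plane code point
```

Python strings are indexed by code point, so ranges above FFFF such as CJK Extension B need no special handling here; in languages whose strings use UTF-16 code units, such as Java, the same check has to work on code points rather than on `char` values.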