Supported character types in column analyses and data masking operations - Cloud - 8.0

Talend Studio User Guide

Version: Cloud 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Cloud, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Content: Design and Development
Last publication date: 2024-02-29
Available in: Big Data Platform, Cloud API Services Platform, Cloud Big Data Platform, Cloud Data Fabric, Cloud Data Management Platform, Data Fabric, Data Management Platform, Data Services Platform, MDM Platform, Real-Time Big Data Platform

When you mask data using Talend Data Preparation or the tDataMasking component, each character in the input data is replaced with a character of the same character type, within the supported Unicode ranges.
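
The type-preserving behavior described above can be illustrated with a minimal Python sketch. This is not the tDataMasking implementation: the names `RANGES_BY_TYPE`, `char_type`, `mask_char`, and `mask_text` are hypothetical, only a few of the supported character types are included, and the random selection strategy is an assumption made for the example.

```python
import random

# Illustrative subset of the supported ranges (inclusive code points).
# The dictionary keys are hypothetical labels, not Talend identifiers.
RANGES_BY_TYPE = {
    "latin_digit": [(0x0030, 0x0039)],
    "latin_lower": [(0x0061, 0x007A), (0x00DF, 0x00F6), (0x00F8, 0x00FF)],
    "latin_upper": [(0x0041, 0x005A), (0x00C0, 0x00D6), (0x00D8, 0x00DE)],
    "hiragana": [(0x3041, 0x3096)],
}


def char_type(ch):
    """Return the label of the character type that contains ch, or None."""
    cp = ord(ch)
    for name, ranges in RANGES_BY_TYPE.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return None


def mask_char(ch, rng):
    """Replace ch with a random character of the same type.

    Characters outside the listed ranges are returned unchanged.
    A range is picked first, then a code point inside it, so the
    result is not uniform across all characters of the type.
    """
    name = char_type(ch)
    if name is None:
        return ch
    lo, hi = rng.choice(RANGES_BY_TYPE[name])
    return chr(rng.randint(lo, hi))


def mask_text(text, seed=None):
    """Mask every character of text, preserving its character type."""
    rng = random.Random(seed)
    return "".join(mask_char(ch, rng) for ch in text)
```

For example, `mask_text("Ab1-ßあ", seed=1)` returns a six-character string in which an upper-cased Latin letter, two lower-cased Latin letters, a digit, and a Hiragana character occupy the same positions as in the input, while the hyphen is left as is.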

When creating column analyses in Talend Studio, you can use the East Asia Pattern Frequency or East Asia Pattern Low Frequency indicators on columns that contain East Asian characters, to profile the content, structure, and quality of your data.
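
As a rough illustration of what a pattern frequency computation involves, the following Python sketch maps each character to a symbol for its character type and counts the resulting patterns. The symbols (`9`, `a`, `A`, `H`, `k`, `K`, `C`, `G`) and the function names are assumptions chosen for this example and may not match the patterns that Talend Studio displays; only some of the supported ranges are included.

```python
from collections import Counter

# (first code point, last code point, pattern symbol).
# The symbols are hypothetical and chosen only for this sketch.
SYMBOL_RANGES = [
    (0x0030, 0x0039, "9"),  # Latin numbers
    (0x0061, 0x007A, "a"),  # Latin lower-cased letters
    (0x0041, 0x005A, "A"),  # Latin upper-cased letters
    (0x3041, 0x3096, "H"),  # Hiragana
    (0xFF66, 0xFF9D, "k"),  # Half-width Katakana
    (0x30A1, 0x30FA, "K"),  # Full-width Katakana
    (0x4E00, 0x9FEF, "C"),  # Kanji
    (0xAC00, 0xD7AF, "G"),  # Hangul
]


def to_pattern(value):
    """Replace each character with the symbol of its character type."""
    out = []
    for ch in value:
        cp = ord(ch)
        for lo, hi, symbol in SYMBOL_RANGES:
            if lo <= cp <= hi:
                out.append(symbol)
                break
        else:
            # Characters outside the listed ranges are kept as they are.
            out.append(ch)
    return "".join(out)


def pattern_frequency(values):
    """Count how often each pattern occurs in a column of values."""
    return Counter(to_pattern(v) for v in values)
```

With these assumed symbols, `pattern_frequency(["東京都", "大阪府", "トヨタ123"])` counts the pattern `CCC` twice and `KKK999` once. A low-frequency view would simply read the same counts from the least common end.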

The following table describes the supported character types and the related Unicode ranges (version 11.0).

For more information, see the documentation for the Unicode Standard and the character code charts.

| Character type | Unicode range (version 11.0) | Corresponding characters |
| --- | --- | --- |
| Latin numbers | [0030-0039] | [0-9] |
| Latin lower-cased letters | [0061-007A] [00DF-00F6] [00F8-00FF] | [a-z] [ß-ö] [ø-ÿ] |
| Latin upper-cased letters | [0041-005A] [00C0-00D6] [00D8-00DE] | [A-Z] [À-Ö] [Ø-Þ] |
| Full-width Latin numbers | [FF10-FF19] | [０-９] |
| Full-width Latin lower-cased letters | [FF41-FF5A] | [ａ-ｚ] |
| Full-width Latin upper-cased letters | [FF21-FF3A] | [Ａ-Ｚ] |
| Hiragana | [3041-3096] 30FC 309D 309E | [ぁ-ゖ] ー ゝ ゞ |
| Half-width Katakana | [FF66-FF9D] | [ｦ-ﾝ] |
| Full-width Katakana | [30A1-30FA] 30FC 30FD 30FE | [ァ-ヺ] ー ヽ ヾ |
| Full-width Katakana | Phonetic extension: [31F0-31FF] | [ㇰ-ㇿ] |
| Kanji | [4E00-9FEF] | [一-鿯] |
| Kanji | CJK Extension A: [3400-4DB5] | [㐀-䶵] |
| Kanji | CJK Extension B: [20000-2A6D6] | [𠀀-𪛖] |
| Kanji | CJK Extension C: [2A700-2B734] | [𪜀-𫜴] |
| Kanji | CJK Extension D: [2B740-2B81D] | [𫝀-𫠝] |
| Kanji | CJK Extension E: [2B820-2CEA1] | [𫠠-𬺡] |
| Kanji | CJK Extension F: [2CEB0-2EBE0] | [𬺰-𮯠] |
| Kanji | CJK Compatibility Ideographs: [F900-FA6D] [FA70-FAD9] | [豈-舘] [並-龎] |
| Kanji | CJK Compatibility Ideographs Supplement: [2F800-2FA1D] | [丽-𪘀] |
| Kanji | Kangxi Radicals: [2F00-2FD5] | [⼀-⿕] |
| Kanji | CJK Radicals Supplement: [2E80-2E99] [2E9B-2EF3] | [⺀-⺙] [⺛-⻳] |
| Kanji | CJK Symbols and Punctuation: [3005-3005] [3007-3007] [3021-3029] [3038-303B] | [々-々] [〇-〇] [〡-〩] [〸-〻] |
| Hangul | [AC00-D7AF] | [가-힯] |
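
To check whether a given character falls within one of the listed ranges, the bracket notation used in the table can be parsed into code point intervals. The following Python sketch is one possible way to do this; `parse_ranges` and `contains` are hypothetical helper names, not part of any Talend API.

```python
import re

# Matches either a bracketed range such as [3041-3096]
# or a standalone code point such as 30FC (4 to 6 hexadecimal digits).
TOKEN_RE = re.compile(r"\[([0-9A-F]{4,6})-([0-9A-F]{4,6})\]|\b([0-9A-F]{4,6})\b")


def parse_ranges(spec):
    """Turn a range specification into a list of inclusive (low, high) pairs."""
    ranges = []
    for lo, hi, single in TOKEN_RE.findall(spec):
        if single:
            cp = int(single, 16)
            ranges.append((cp, cp))  # a single code point is a one-element range
        else:
            ranges.append((int(lo, 16), int(hi, 16)))
    return ranges


def contains(ranges, ch):
    """Return True if the code point of ch lies in any of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


hiragana = parse_ranges("[3041-3096] 30FC 309D 309E")
ext_b = parse_ranges("[20000-2A6D6]")
print(contains(hiragana, "ー"))          # U+30FC is listed for Hiragana
print(contains(ext_b, "\U00020000"))    # first code point of CJK Extension B
```

Python strings are sequences of code points, so characters above FFFF (CJK Extensions B to F and the Compatibility Ideographs Supplement) count as one character each. In languages that use UTF-16 strings, such as Java, these characters are stored as surrogate pairs and need to be read as code points rather than as individual `char` values.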