Character-based patterns - Cloud

Talend Cloud Data Preparation User Guide

author: Talend Documentation Team
EnrichVersion: Cloud
EnrichProdName: Talend Cloud
task: Data Quality and Preparation > Cleansing data
EnrichPlatform: Talend Data Preparation
Talend Data Preparation allows you to analyze the distribution of character-based patterns in your data.

Latin characters and Asian characters (Hiragana, Katakana, Kanji, and Hangul) are represented by the following patterns:

Character                  Pattern
Latin numbers              9 replaces all ASCII digits
Latin lowercase letters    a replaces all lowercase ASCII Latin characters
Latin uppercase letters    A replaces all uppercase Latin characters
Hiragana                   H replaces all Hiragana characters
Katakana                   K replaces all Katakana characters
Kanji                      C replaces Chinese characters
Hangul                     G replaces Hangul characters
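
To make the mapping concrete, here is a minimal Python sketch of how a value could be turned into its character-based pattern and how the distribution of patterns in a column could be counted. It is an illustration only, not Talend's implementation: the function names char_pattern and pattern_distribution are hypothetical, and the Unicode ranges used for Hiragana, Katakana, Kanji, and Hangul are assumptions, since the exact ranges the product uses are not stated here. Characters that match no row of the table are assumed to be left unchanged.

```python
from collections import Counter


def char_pattern(value: str) -> str:
    """Return the character-based pattern of a string (illustrative sketch)."""
    out = []
    for ch in value:
        cp = ord(ch)
        if "0" <= ch <= "9":            # ASCII digits
            out.append("9")
        elif "a" <= ch <= "z":          # lowercase ASCII Latin letters
            out.append("a")
        elif "A" <= ch <= "Z":          # uppercase ASCII Latin letters
            out.append("A")
        elif 0x3041 <= cp <= 0x309F:    # Hiragana (assumed range)
            out.append("H")
        elif 0x30A0 <= cp <= 0x30FF:    # Katakana (assumed range)
            out.append("K")
        elif 0x4E00 <= cp <= 0x9FFF:    # CJK unified ideographs (assumed range)
            out.append("C")
        elif 0xAC00 <= cp <= 0xD7A3:    # Hangul syllables (assumed range)
            out.append("G")
        else:                           # anything else is kept as-is (assumption)
            out.append(ch)
    return "".join(out)


def pattern_distribution(values):
    """Count how often each pattern occurs in a list of values."""
    return Counter(char_pattern(v) for v in values)


print(char_pattern("Paris 75001"))  # Aaaaa 99999
print(pattern_distribution(["AB12", "CD34", "ab1"]))
```

With this sketch, the values "AB12" and "CD34" share the pattern AA99, while "ab1" yields aa9, so a column that should contain only codes of the first shape would show the second pattern as an outlier.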