Character-based patterns - 8.0

Talend Data Preparation User Guide

Talend Data Preparation allows you to analyze the distribution of character-based patterns in your data.

Latin characters and Asian characters (Hiragana, Katakana, Kanji, and Hangul) are represented by the following patterns:

Character                  Pattern
Latin numbers              9 replaces all ASCII digits
Latin lowercase letters    a replaces all lowercase Latin characters
Latin uppercase letters    A replaces all uppercase Latin characters
Hiragana                   H replaces all Hiragana characters
Katakana                   K replaces all Katakana characters
Kanji                      C replaces Chinese characters
Hangul                     G replaces Hangul characters
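
The mapping in the table above can be illustrated with a minimal Python sketch. This is not Talend's implementation: the function names `char_pattern`, `to_pattern`, and `pattern_distribution` are hypothetical, and the Unicode ranges used for Hiragana, Katakana, Kanji, and Hangul are simplifying assumptions (one main Unicode block each). Characters that match none of the categories are assumed to be left unchanged.

```python
from collections import Counter


def char_pattern(ch: str) -> str:
    """Map one character to its pattern symbol (illustrative only)."""
    code = ord(ch)
    if "0" <= ch <= "9":            # ASCII digits
        return "9"
    if "a" <= ch <= "z":            # ASCII lowercase Latin letters
        return "a"
    if "A" <= ch <= "Z":            # ASCII uppercase Latin letters
        return "A"
    if 0x3040 <= code <= 0x309F:    # Hiragana block (assumed range)
        return "H"
    if 0x30A0 <= code <= 0x30FF:    # Katakana block (assumed range)
        return "K"
    if 0x4E00 <= code <= 0x9FFF:    # CJK Unified Ideographs (assumed range)
        return "C"
    if 0xAC00 <= code <= 0xD7A3:    # Hangul Syllables (assumed range)
        return "G"
    return ch                       # assumption: other characters stay as-is


def to_pattern(value: str) -> str:
    """Replace every character of a value with its pattern symbol."""
    return "".join(char_pattern(ch) for ch in value)


def pattern_distribution(values: list[str]) -> Counter:
    """Count how many values share each character-based pattern."""
    return Counter(to_pattern(v) for v in values)
```

Under these assumptions, `to_pattern("AB-12")` yields `AA-99`, and calling `pattern_distribution` on a column of values gives the number of rows that match each pattern, which is the kind of distribution the pattern analysis reports.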