Character-based patterns - Cloud

Talend Cloud Data Preparation User Guide

Version
Cloud
Language
English
Product
Talend Cloud
Module
Talend Data Preparation
Content
Administration and Monitoring > Managing connections
Data Quality and Preparation > Cleansing data
Data Quality and Preparation > Managing datasets
Last publication date
2023-09-13
Talend Data Preparation allows you to analyze the character-based patterns repartition in your data.

Latin characters, as well as Asian characters, split between Hiragana, Katakana, Kanji and Hangul, are represented by the following patterns:

Character Pattern
Latin numbers 9 replaces all ASCII digits
Latin lowercase letters a replaces all ASCII Latin characters
Latin uppercase letters A replaces all uppercase Latin characters
Hiragana H replaces all Hiragana characters
Katakana K replaces all Katakana characters
Kanji C replaces Chinese characters
Hangul G replaces Hangul characters
Katakana K replaces all Katakana characters