Word-based patterns - Cloud

Talend Cloud Data Stewardship User Guide

Talend Cloud Data Stewardship performs word-based pattern profiling on the data you load into any of your campaigns and computes the word patterns of that data. You can then use these patterns to filter tasks by the content and structure of their data before you assign or resolve them.

Word patterns are case-sensitive and are computed only for non-numeric fields, such as text, boolean and semantic-type fields. The following table lists the word patterns and their descriptions.

Pattern          Description
[Word]           Word made of one uppercase character followed by lowercase characters
[WORD]           Word made of two or more uppercase characters
[word]           Word made of two or more lowercase characters
[Char]           Single uppercase character
[char]           Single lowercase character
[Ideogram]       Single CJK Unified Ideograph
[IdeogramSeq]    Sequence of two or more ideograms
[hiraSeq]        Sequence of Japanese Hiragana characters
[kataSeq]        Sequence of Japanese Katakana characters
[hangulSeq]      Sequence of Korean Hangul characters
[digit]          Single Arabic numeral: 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9
[number]         Sequence of two or more digits

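The following examples show the word patterns that Talend Cloud Data Stewardship computes for sample strings. Characters that do not belong to any of the classes above, such as spaces and the @ and . symbols, are kept as-is in the pattern.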

String                        Pattern
A character is NOT a Word     [Char] [word] [word] [WORD] [char] [Word]
someWordsINwORDS              [word][Word][WORD][char][WORD]
Example123@domain.com         [Word][number]@[word].[word]
anotherExample8@domain.com    [word][Word][digit]@[word].[word]
袁 花木蘭88                   [Ideogram] [IdeogramSeq][number]
Latin2中文                    [Word][digit][IdeogramSeq]
Latin3フランス                [Word][digit][kataSeq]
Latin4とうきょう              [Word][digit][hiraSeq]
Latin5나는 한국 사람입니다    [Word][digit][hangulSeq] [hangulSeq] [hangulSeq]
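The rules in the two tables can be approximated with a short Python sketch. It is not Talend's implementation. The tokenization rules are assumptions inferred from the examples in this guide: a single uppercase letter absorbs the lowercase run that follows it, while a run of two or more uppercase letters does not. The Unicode block ranges and the helper names `char_class`, `run_end`, `word_pattern` and `filter_by_pattern` are also assumptions. `filter_by_pattern` is a hypothetical stand-in for filtering tasks by pattern, not a Talend API.

```python
# Minimal sketch of word-based pattern profiling (not Talend's implementation).
# Assumption: the rules below are inferred from the pattern and example tables.

DIGITS = "0123456789"  # only the Arabic numerals count as [digit]


def char_class(c: str) -> str:
    """Classify a single character; 'other' characters are copied verbatim."""
    cp = ord(c)
    if c in DIGITS:
        return "digit"
    if 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs (basic block only)
        return "ideogram"
    if 0x3040 <= cp <= 0x309F:  # Hiragana block
        return "hira"
    if 0x30A0 <= cp <= 0x30FF:  # Katakana block
        return "kata"
    if 0xAC00 <= cp <= 0xD7A3:  # precomposed Hangul syllables
        return "hangul"
    if c.isupper():
        return "upper"
    if c.islower():
        return "lower"
    return "other"


def run_end(text: str, start: int, cls: str) -> int:
    """Return the index just past the run of `cls` characters starting at `start`."""
    end = start
    while end < len(text) and char_class(text[end]) == cls:
        end += 1
    return end


def word_pattern(text: str) -> str:
    """Compute the case-sensitive word pattern of `text`."""
    tokens = []
    i = 0
    while i < len(text):
        cls = char_class(text[i])
        end = run_end(text, i, cls)
        long_run = end - i >= 2
        if cls == "upper":
            if long_run:
                tokens.append("[WORD]")
            else:
                # Assumption: one capital plus the lowercase run after it is a [Word].
                lower_end = run_end(text, end, "lower")
                tokens.append("[Word]" if lower_end > end else "[Char]")
                end = lower_end
        elif cls == "lower":
            tokens.append("[word]" if long_run else "[char]")
        elif cls == "digit":
            tokens.append("[number]" if long_run else "[digit]")
        elif cls == "ideogram":
            tokens.append("[IdeogramSeq]" if long_run else "[Ideogram]")
        elif cls in ("hira", "kata", "hangul"):
            tokens.append(f"[{cls}Seq]")  # assumption: runs of any length
        else:
            tokens.append(text[i:end])  # spaces, '@', '.', ... kept as-is
        i = end
    return "".join(tokens)


def filter_by_pattern(values: list[str], pattern: str) -> list[str]:
    """Hypothetical helper: keep only the values whose word pattern is `pattern`."""
    return [v for v in values if word_pattern(v) == pattern]


if __name__ == "__main__":
    for sample in ["A character is NOT a Word", "someWordsINwORDS", "Latin4とうきょう"]:
        print(f"{sample!r} -> {word_pattern(sample)}")
    emails = ["Example123@domain.com", "anotherExample8@domain.com", "John.Smith@domain.com"]
    print(filter_by_pattern(emails, "[Word][number]@[word].[word]"))
```

The last line prints `['Example123@domain.com']`, the only address whose pattern is `[Word][number]@[word].[word]`. Grouping values by pattern in this way makes values whose structure differs from the majority easy to spot.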