Text standardization components

tJapaneseNumberNormalize	Normalizes Japanese numerals (kansūji) to Arabic numerals.
tJapaneseTokenize	Splits Japanese text into tokens.
tJapaneseTransliterate	Converts textual data in Japanese to kana and Latin scripts.
tStem	Standardizes data in columns before this data is matched.
tTransliterate	Converts strings from many languages of the world to a standard set of characters (Universal Coded Character Set, UCS).
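
To illustrate what normalizing kansūji to Arabic numerals involves, here is a minimal Python sketch. It is not the implementation of tJapaneseNumberNormalize; the function name `kansuji_to_int` and the supported character set are assumptions made for this example, and it handles only plain non-negative integers.

```python
# Illustrative sketch only; not how tJapaneseNumberNormalize is implemented.
# kansuji_to_int is a hypothetical helper name chosen for this example.

DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}


def kansuji_to_int(text: str) -> int:
    """Convert a kansūji string such as 三百二十一 to an int (321)."""
    total = 0    # sum of completed 万/億/兆 groups
    section = 0  # value of the current group below 10,000
    digit = 0    # digit(s) read but not yet attached to a unit

    for ch in text:
        if ch in DIGITS:
            # Accumulate positionally so that 二〇二四 reads as 2024.
            digit = digit * 10 + DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit (十, 百, 千) implies a multiplier of one.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            total += ((section + digit) or 1) * LARGE_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")

    return total + section + digit


if __name__ == "__main__":
    for sample in ["三百二十一", "一万二千", "二〇二四"]:
        print(sample, kansuji_to_int(sample))
```

The sketch keeps three running values (pending digit, current sub-10,000 group, and completed large groups) so that both unit-based forms (一万二千) and positional forms (二〇二四) are handled in a single pass.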
