Text standardization components

tJapaneseNumberNormalize: Normalizes Japanese numbers (kansūji) to Arabic numerals.
tJapaneseTokenize: Splits Japanese text into tokens.
tJapaneseTransliterate: Converts Japanese text to kana and Latin scripts.
tStem: Standardizes data in columns before the data is matched.
tTransliterate: Converts strings from many languages of the world to a standard set of characters (Universal Coded Character Set, UCS).
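
To make the idea of number normalization concrete, here is a minimal Python sketch of converting kansūji to an integer. It is an illustration only: the function name `kansuji_to_int` and the lookup tables are hypothetical, and it does not reflect how tJapaneseNumberNormalize is implemented or which inputs the component accepts.

```python
# Hypothetical illustration of kansuji normalization.
# This is NOT the implementation used by tJapaneseNumberNormalize.

DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}


def kansuji_to_int(text: str) -> int:
    """Convert a string of kansuji characters to an integer."""
    total = 0    # sum of completed 万/億/兆 groups
    section = 0  # value of the current group (below 10,000)
    digit = 0    # digit(s) read so far, waiting for a unit
    for ch in text:
        if ch in DIGITS:
            # Accumulate positionally so that 二〇二四 reads as 2024.
            digit = digit * 10 + DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as 十 means 1 x 10.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            # Close the current group and scale it by the large unit.
            total += ((section + digit) or 1) * LARGE_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + section + digit


print(kansuji_to_int("三百二十一"))
print(kansuji_to_int("一万二千"))
```

The sketch handles only the plain numerals and units listed in its tables; formal variants, decimals, and mixed Arabic/kanji input are out of scope here.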
