tJapaneseTransliterate - Cloud - 8.0


Version: Cloud, 8.0
Language: English
Product:
  • Talend Big Data Platform
  • Talend Data Fabric
  • Talend Data Management Platform
  • Talend Data Services Platform
  • Talend MDM Platform
  • Talend Real-Time Big Data Platform
Module: Talend Studio
Content:
  • Data Governance > Third-party systems > Data Quality components > Standardization components > Text standardization components
  • Data Quality and Preparation > Third-party systems > Data Quality components > Standardization components > Text standardization components
  • Design and Development > Third-party systems > Data Quality components > Standardization components > Text standardization components
Last publication date: 2024-02-20

Converts textual data in Japanese to kana and Latin scripts.

Transliteration is a phonetic operation: the tJapaneseTransliterate component attempts to reproduce the original text in kana or in Roman characters (rōmaji), based on the sounds that the string represents.

The modern Japanese writing system uses a combination of kanji (Chinese characters) and syllabic kana (hiragana and katakana). For the benefit of non-Japanese speakers who cannot read kanji or kana, romanization systems have been developed to write the Japanese language in Latin script.
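Hiragana and katakana are two parallel syllabaries for the same set of sounds, so converting between them is mechanical, while reading kanji is not. The following minimal Python sketch illustrates this. It is not Talend's implementation, and the function names are hypothetical. In Unicode, the hiragana letters U+3041–U+3096 and the katakana letters U+30A1–U+30F6 sit exactly 0x60 code points apart, so one script can be mapped onto the other character by character. Kanji pass through unchanged. Producing a reading for them requires a dictionary-based morphological analyzer, which this sketch does not attempt.

```python
# Minimal sketch (not Talend's implementation): convert between hiragana and
# katakana by shifting Unicode code points. Function names are hypothetical.

HIRAGANA_FIRST = 0x3041  # ぁ
HIRAGANA_LAST = 0x3096   # ゖ
OFFSET = 0x30A1 - HIRAGANA_FIRST  # katakana ァ (U+30A1) is 0x60 code points further on


def hiragana_to_katakana(text: str) -> str:
    """Shift every hiragana letter to its katakana twin; leave other characters alone."""
    return "".join(
        chr(ord(ch) + OFFSET) if HIRAGANA_FIRST <= ord(ch) <= HIRAGANA_LAST else ch
        for ch in text
    )


def katakana_to_hiragana(text: str) -> str:
    """Inverse mapping for the katakana letters U+30A1..U+30F6."""
    return "".join(
        chr(ord(ch) - OFFSET)
        if HIRAGANA_FIRST + OFFSET <= ord(ch) <= HIRAGANA_LAST + OFFSET
        else ch
        for ch in text
    )


if __name__ == "__main__":
    print(hiragana_to_katakana("とうきょう"))  # → トウキョウ
    print(katakana_to_hiragana("東京タワー"))  # → 東京たわー (kanji and ー unchanged)
```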

The tJapaneseTransliterate component converts Japanese into kana or rōmaji (Roman characters):
  • Kana characters
    • Hiragana
    • Katakana reading
    • Katakana pronunciation
  • Rōmaji
    • Revised Hepburn: This is the most widely used romanization system.
    • Kunrei-shiki: This romanization system has been standardized by the Japanese government and by the International Organization for Standardization as ISO 3602. It is a modified version of the Nihon-shiki system, adapted to modern standard Japanese.
    • Nihon-shiki: This is the most regular romanization system because it maintains a one-to-one correspondence between kana and rōmaji.
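The following small Python sketch makes the differences between the three romanization systems concrete. It is not Talend's implementation. It assumes the input is already in hiragana and covers only the kana where the systems diverge, plus the few needed for the examples. The table, the `romanize` function, and the system names are hypothetical. A real romanizer must also handle long vowels, the small っ, syllabic ん before vowels, and contracted sounds such as きゃ, none of which this sketch covers.

```python
# Hypothetical sketch: romanize hiragana one character at a time under three
# systems. Only a few kana are covered; long vowels, small っ, ん before vowels
# and contracted sounds (きゃ, etc.) are deliberately not handled.

SYSTEMS = ("hepburn", "kunrei", "nihon")

# kana: (Revised Hepburn, Kunrei-shiki, Nihon-shiki)
ROMAJI = {
    "か": ("ka", "ka", "ka"),
    "き": ("ki", "ki", "ki"),
    "さ": ("sa", "sa", "sa"),
    "し": ("shi", "si", "si"),
    "じ": ("ji", "zi", "zi"),
    "ち": ("chi", "ti", "ti"),
    "ぢ": ("ji", "zi", "di"),
    "つ": ("tsu", "tu", "tu"),
    "づ": ("zu", "zu", "du"),
    "ふ": ("fu", "hu", "hu"),
    "む": ("mu", "mu", "mu"),
    "を": ("o", "o", "wo"),
    "ん": ("n", "n", "n"),
}


def romanize(kana: str, system: str) -> str:
    """Romanize a hiragana string with the given system name."""
    index = SYSTEMS.index(system)  # raises ValueError for an unknown system
    try:
        return "".join(ROMAJI[ch][index] for ch in kana)
    except KeyError as exc:
        raise ValueError(f"kana not covered by this sketch: {exc.args[0]!r}") from None


if __name__ == "__main__":
    for word in ("ふじさん", "つづき"):
        print(word, {s: romanize(word, s) for s in SYSTEMS})
```

For ふじさん the three systems give `fujisan`, `huzisan` and `huzisan`. For つづき they give `tsuzuki`, `tuzuki` and `tuduki`. Kunrei-shiki and Nihon-shiki agree except where Nihon-shiki keeps a distinct spelling for ぢ and づ, which reflects its one-to-one correspondence with kana.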

In local mode, Apache Spark 2.4.0 and later versions are supported.

This component is not shipped with your Talend Studio by default. You need to install it using the Feature Manager. For more information, see Installing features using the Feature Manager.

For more technologies supported by Talend, see Talend components.