Converts textual data in Japanese to kana and Latin scripts.
Transliteration is a phonetic operation: the tJapaneseTransliterate component attempts to produce, in kana characters or Roman characters (rōmaji), an equivalent of the original text based on the sounds that the string represents.
The modern Japanese writing system uses a combination of kanji (Chinese characters) and syllabic kana (hiragana and katakana). For the benefit of non-Japanese speakers who cannot read kanji or kana, romanization systems have been developed to write the Japanese language in Latin script.
The following transliteration outputs are available:
- Kana characters
- Katakana reading
- Katakana pronunciation
- Revised Hepburn: This is the most widely used romanization system.
- Kunrei-shiki: This romanization system has been standardized by the Japanese Government and by the International Organization for Standardization as ISO 3602. It is a modified version of the Nihon-shiki system for modern standard Japanese.
- Nihon-shiki: This is the most regular romanization system because it maintains a one-to-one correspondence between kana and rōmaji.
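To make the difference between the three romanization systems concrete, here is a minimal Python sketch. It is not the component's implementation and does not use any Talend API: the `ROMAJI` table, the `romanize` function and the `distinct_spellings` helper are hypothetical names introduced for illustration, and the table covers only a handful of hiragana chosen because the systems disagree on them.

```python
# Hand-written sample table (an assumption for illustration, not Talend code):
# hiragana -> (Revised Hepburn, Kunrei-shiki, Nihon-shiki)
ROMAJI = {
    "し": ("shi", "si", "si"),
    "ち": ("chi", "ti", "ti"),
    "つ": ("tsu", "tu", "tu"),
    "ふ": ("fu", "hu", "hu"),
    "じ": ("ji", "zi", "zi"),
    "ぢ": ("ji", "zi", "di"),
    "づ": ("zu", "zu", "du"),
    "く": ("ku", "ku", "ku"),
    "む": ("mu", "mu", "mu"),
}

# Column index of each system in the tuples above.
SYSTEMS = {"hepburn": 0, "kunrei": 1, "nihon": 2}


def romanize(hiragana: str, system: str) -> str:
    """Romanize a hiragana string character by character using the sample table."""
    column = SYSTEMS[system]
    return "".join(ROMAJI[ch][column] for ch in hiragana)


def distinct_spellings(system: str) -> int:
    """Count how many different romanized spellings the table yields for a system."""
    column = SYSTEMS[system]
    return len({row[column] for row in ROMAJI.values()})


if __name__ == "__main__":
    for word in ("つづく", "ちぢむ"):
        print(word, {name: romanize(word, name) for name in SYSTEMS})
    # A system is one-to-one on this sample only if every kana gets its own spelling.
    for name in SYSTEMS:
        print(name, distinct_spellings(name), "spellings for", len(ROMAJI), "kana")
```

On this sample, only the Nihon-shiki column gives every kana its own spelling. The Hepburn and Kunrei-shiki columns both merge じ and ぢ into a single spelling, which illustrates what the one-to-one correspondence means in practice.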
In local mode, Apache Spark 1.6.0, 2.3.0 and 2.4.0 are supported.
For more technologies supported by Talend, see Talend components.
Depending on the Talend product you are using, this component can be used in one, some or all of the following Job frameworks:
- Standard: see tJapaneseTransliterate Standard properties.
The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, Talend MDM Platform and in Talend Data Fabric.
- Spark Batch: see tJapaneseTransliterate properties for Apache Spark Batch.
- Spark Streaming: see tJapaneseTransliterate properties for Apache Spark Streaming.