Configuring the output component and executing the Job - Cloud - 8.0

Text standardization

Version
Cloud
8.0
Language
English
Product
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-20

Procedure

  1. Double-click tLogRow to open its Basic settings view in the Component tab.
  2. Click Sync columns to retrieve the schema from the preceding component.
  3. Select Table (print values in cells of a table) in the Mode area.
  4. Press F6 to run the Job.

Results

The normalized numbers are displayed in the Run view:

.-----------------+---------------------.
|               tLogRow_1               |
|=----------------+--------------------=|
|kansuji          |normalized_arabic_num|
|=----------------+--------------------=|
|〇〇七              |7                    |
|一〇〇〇             |1000                 |
|三千2百2十三          |3223                 |
|15,7             |157                  |
|一万               |10000                |
|負一千一百五十八         |-1158                |
|1.2万345.67       |12345.67             |
|1.2万345.6三       |12345.63             |
|4,647.100        |4647.1               |
|七十五點四零二五         |75.4025              |
|万                |10000                |
|億                |100000000            |
|兆                |1000000000000        |
|京                |10000000000000000    |
|垓                |100000000000000000000|
|九百八十三万 六千七百三     |9836703              |
|二十億 三千六百五十二万 千八百一|2036521801           |
|¥百二十三            |¥123                 |
|百二十三円            |123円                 |
'-----------------+---------------------'

The tJapaneseNumberNormalize component supports Japanese numbers written as a sequence of kanji numerals: 〇〇七 becomes 7.
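To make the positional reading concrete, here is a minimal Java sketch that treats each kanji digit as a decimal digit and drops leading zeros. It is only an illustration of the behavior shown in the table, not the component's implementation; the class name KanjiDigitSequence and the method toArabic are hypothetical.

```java
import java.math.BigInteger;

// Illustrative helper (hypothetical name); not part of Talend.
class KanjiDigitSequence {
    // The index of each character in this string is its digit value: 〇=0 ... 九=9.
    private static final String DIGITS = "〇一二三四五六七八九";

    // Reads a string made only of kanji digits as a positional decimal number.
    static String toArabic(String kanji) {
        StringBuilder arabic = new StringBuilder();
        for (char c : kanji.toCharArray()) {
            int value = DIGITS.indexOf(c);
            if (value < 0) {
                throw new IllegalArgumentException("Not a kanji digit: " + c);
            }
            arabic.append(value);
        }
        // Parsing as a number discards leading zeros.
        return new BigInteger(arabic.toString()).toString();
    }

    public static void main(String[] args) {
        System.out.println(toArabic("〇〇七"));
        System.out.println(toArabic("一〇〇〇"));
    }
}
```

This sketch handles digit sequences only; multiplier characters such as 千 or 万 would need separate handling.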

The tJapaneseNumberNormalize component supports Japanese numbers that combine kanji and Arabic numerals: 三千2百2十三 becomes 3223.

The tJapaneseNumberNormalize component drops every comma from the numbers it normalizes, so a decimal comma is not kept: 15,7 becomes 157, just as 4,647.100 becomes 4647.1. If the input numbers use the comma as the decimal separator, you must replace it with a decimal point before the data reaches the component.
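The replacement can be done in a pre-processing step placed before tJapaneseNumberNormalize, for example in a tJavaRow or tMap component. The following Java sketch shows one way to do it; the class and method names are hypothetical, and it assumes that every comma in the column is a decimal separator.

```java
// Hypothetical pre-processing helper; not part of Talend.
class DecimalCommaPreprocessor {
    // Assumption: every comma in the value is a decimal separator.
    // Do not apply this to values that use commas to group thousands.
    static String toDecimalPoint(String value) {
        return value == null ? null : value.replace(',', '.');
    }

    public static void main(String[] args) {
        System.out.println(toDecimalPoint("15,7")); // prints 15.7
    }
}
```

If a column mixes decimal commas with thousands separators, this blanket replacement is not safe and the values need to be disambiguated first.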

Input numbers can use a comma to separate groups of thousands: 4,647.100 becomes 4647.1. The tJapaneseNumberNormalize component also removes trailing zeros after the decimal point.
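The same two effects can be reproduced in plain Java, which may help when checking expected values outside the Job. This is a sketch of the observable behavior only, not the component's code; the class name GroupingAndTrailingZeros is hypothetical.

```java
import java.math.BigDecimal;

// Illustrative helper (hypothetical name); not part of Talend.
class GroupingAndTrailingZeros {
    // Removes every comma, then strips trailing zeros from the decimal part.
    static String normalize(String value) {
        String withoutCommas = value.replace(",", "");
        // toPlainString avoids scientific notation such as 1E+3 for 1000.
        return new BigDecimal(withoutCommas).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("4,647.100")); // prints 4647.1
        System.out.println(normalize("15,7"));      // prints 157
    }
}
```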

The tJapaneseNumberNormalize component supports large kanji numbers: 兆六百万五千一 becomes 1000006005001.
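The arithmetic behind the large units can be checked with a short Java sketch. It uses BigInteger because 垓 (10 to the power 20) does not fit in a 64-bit long. The exponents are taken from the results table above; the class name LargeKanjiUnits and its method are hypothetical and do not reflect how the component is implemented.

```java
import java.math.BigInteger;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper (hypothetical name); not part of Talend.
class LargeKanjiUnits {
    // Power of ten for each large unit, as shown in the results table.
    static final Map<Character, Integer> EXPONENTS = new LinkedHashMap<>();
    static {
        EXPONENTS.put('万', 4);
        EXPONENTS.put('億', 8);
        EXPONENTS.put('兆', 12);
        EXPONENTS.put('京', 16);
        EXPONENTS.put('垓', 20);
    }

    // Returns the numeric value of a single large-unit character.
    static BigInteger valueOf(char unit) {
        Integer exponent = EXPONENTS.get(unit);
        if (exponent == null) {
            throw new IllegalArgumentException("Unknown unit: " + unit);
        }
        return BigInteger.TEN.pow(exponent);
    }

    public static void main(String[] args) {
        // 兆六百万五千一 = 1 x 10^12 + 600 x 10^4 + 5001
        BigInteger total = valueOf('兆')
                .add(valueOf('万').multiply(BigInteger.valueOf(600)))
                .add(BigInteger.valueOf(5001));
        System.out.println(total); // prints 1000006005001
    }
}
```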