Before you begin
You have retrieved the tJapaneseTokenize_standard_scenario.zip file.
Procedure
- Double-click tFileInputDelimited to open its Basic settings view in the Component tab.
- In the File name/Stream field, enter the path to the file containing the input text to be tokenized.
- Define the characters to be used as Row Separator and Field Separator.
- Define the number of rows in the Header and the Footer.
- Click the Edit schema button to define the columns of the source dataset and their data types.
- Click the [+] button to add the schema columns.
- Click OK to validate these changes and accept the propagation when prompted.
- In the Advanced settings tab of the tFileInputDelimited component, select the right encoding from the Encoding list.
The inputJapaneseText.txt file uses the UTF-8 encoding.
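
To make the role of each setting concrete, the settings above can be sketched in plain Python. This is only an illustration of what the Row Separator, Field Separator, Header, Footer, and Encoding options mean when a delimited file is read. It is not the code the Studio generates, and the function name read_delimited and its default values are hypothetical.

```python
from pathlib import Path


def read_delimited(path, row_separator="\n", field_separator=";",
                   header=0, footer=0, encoding="UTF-8"):
    """Hypothetical helper mirroring the tFileInputDelimited settings.

    header / footer: number of rows to skip at the start / end of the file.
    The default separators are assumptions, not values from the scenario.
    """
    # Decode the raw bytes ourselves so the chosen encoding is explicit
    # and no newline translation interferes with the row separator.
    text = Path(path).read_bytes().decode(encoding)

    rows = text.split(row_separator)
    # A trailing separator produces one empty row at the end; drop it.
    if rows and rows[-1] == "":
        rows.pop()

    # Skip the header rows at the top and the footer rows at the bottom.
    end = max(header, len(rows) - footer)
    body = rows[header:end]

    # Split every remaining row into its fields (the schema columns).
    return [row.split(field_separator) for row in body]
```

For instance, calling read_delimited("inputJapaneseText.txt", header=1) would decode the file as UTF-8, skip one header row, and return one list of fields per remaining row. The header value here is an assumption; use the values that match your file. Choosing the wrong encoding at this step typically yields garbled Japanese characters or a decoding error, which is why the Encoding setting matters before the text is tokenized.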