To connect to an HDFS installation, select the Define a storage configuration component
check box and then select the name of the component to use from those
available in the drop-down list.
This option requires you to have previously configured the
connection to the HDFS installation to be used, as described in the
documentation for the tHDFSConfiguration component.
If you leave the Define a
storage configuration component check box unselected,
you can only convert files locally.
To configure the component, click the [...] button and, in the Component Configuration window, perform
the following actions.
Click the Select button next to the Record Map field and then, in
the Select a Map dialog box
that opens, select the map you want to use and then click OK.
This map must have been previously created in
Talend Data Mapper.
Note that the input and output representations
are those defined in the map, and cannot be changed in the
component.
Tell the component where each new record begins.
To do so, you need to fully
understand the structure of your data.
Exactly how you do this varies depending on the
input representation being used, and you will be presented with
one of the following options.
Select an appropriate
record delimiter for your data. Note that you must
specify this value without quotes.
Separator lets you specify a
separator indicator, such as \n, to identify a new
record. Supported indicators are \n for a Unix-type new line, \r\n for
Windows and \r for Mac, and \t for tab characters.
Start/End with lets you specify the
initial characters that indicate a new record,
such as <root, or the characters that indicate
where a record ends. This can also be a regular
expression. Start/End with also supports new lines: \n for a
Unix-type new line, \r\n for Windows and \r for Mac,
and \t for tab characters.
Sample File: To test the signature with a
sample file, click the [...] button, browse to the file you
want to use as a sample, click Open, and then click
Run to test the signature. Testing the
signature lets you check that the total number of
records and their minimum and maximum length
correspond to what you expect based on your
knowledge of the data. This step assumes you have
a local subset of your data to use as a sample.
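As an illustration only (this is not Talend code), the following Python sketch shows what the Separator and Start/End with options mean in practice: a separator such as \n marks where one record ends, whereas a start marker such as <root marks where each new record begins. The sample data below is invented for the example.

```python
# Illustrative sketch only, not Talend's implementation.

# Separator: each occurrence of \n marks the end of a record.
data = b"id:1;name:a\nid:2;name:b\nid:3;name:c\n"
records = [r for r in data.split(b"\n") if r]  # 3 records

# Start with: each occurrence of "<root" marks the start of a record.
xml = b"<root a='1'/><root a='2'/><root a='3'/>"
starts = [i for i in range(len(xml)) if xml.startswith(b"<root", i)]
xml_records = [xml[s:e] for s, e in zip(starts, starts[1:] + [len(xml)])]
```

The difference matters for formats such as XML, where records have no trailing delimiter but always begin with the same characters.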
If your input representation is COBOL or
Flat with positional and/or binary encoding properties,
define the signature for the input record structure:
Input Record root
corresponds to the root element in your input
record.
Minimum Record Size corresponds to the size in bytes
of the smallest record. If you set this value too
low, you may encounter performance issues, since
the component will perform more checks than
necessary when looking for a new record.
Maximum Record Size corresponds to the size in bytes
of the largest record, and is used to determine
how much memory is allocated to read the input.
Sample from Workspace or
Sample from File System: To
test the signature with a sample file, click the
[...] button, and then browse to the file you want to
use as a sample.
Testing the signature lets you
check that the total number of records and their
minimum and maximum length correspond to what you
expect based on your knowledge of the data. This
step assumes you have a local subset of your data
to use as a sample.
Footer Size
corresponds to the size in bytes of the footer, if
any. At runtime, the footer will be ignored rather
than being mistakenly included in the last record.
Leave this field empty if there is no footer.
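Before filling in the Minimum Record Size and Maximum Record Size fields, you may want to measure your data rather than guess. The hypothetical helper below (not part of Talend) scans a delimited sample and reports the smallest and largest record sizes; the sample bytes and the function name are assumptions for the example.

```python
# Hypothetical helper, not part of Talend: find the smallest and
# largest record sizes in a delimited sample, to guide the values
# entered in the Minimum/Maximum Record Size fields.
def record_size_range(sample: bytes, separator: bytes = b"\n"):
    sizes = [len(r) for r in sample.split(separator) if r]
    return min(sizes), max(sizes)

lo, hi = record_size_range(b"short\na much longer record\nmid-size\n")
# lo == 5, hi == 20
```

Remember that, as noted above, setting the minimum too low costs performance, while the maximum drives how much memory is allocated per record.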
Click the Next button to open the
Parameters window, select the fields
that define the signature of your record input
structure (that is, to identify where a new record
begins), update the Operation and Value columns as
appropriate, and then click Next.
In the Record
Signature Test window that opens, check
that your records are correctly delineated by
scrolling through them with the
Back and Next buttons and performing
a visual check, and then click Finish.
Die on error
This check box is selected by default.
Clear the check box to skip any rows on error and complete the
process for error-free rows.
If you clear the check box, you can do either of the following:
Connect the tHMapFile component to an output
component, for example tAvroOutput, using a
Reject connection. In the output component, ensure
that you add a fixed metadata with the following columns:
- inputRecord: contains the input record rejected
during the transformation.
- recordId: refers to the record
identifier. For a text or binary input, the recordId
specifies the start offset of the record in the
input file. For an AVRO input, the recordId
specifies the timestamp when the input was read.
- errorMessage: contains the
transformation status with details of the cause of
the transformation error.
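The three columns above can be pictured as a simple record structure. The sketch below models them as a Python dataclass purely for illustration; the real schema is the fixed metadata you define in the output component, and the sample values are invented.

```python
# Illustration only: the reject-metadata columns modeled as a dataclass.
from dataclasses import dataclass

@dataclass
class RejectRecord:
    inputRecord: bytes   # the rejected input record
    recordId: str        # start offset (text/binary input) or timestamp (AVRO)
    errorMessage: str    # transformation status and cause of the error

r = RejectRecord(b"bad row", "1024", "transformation failed at field 3")
```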
If the check box is unselected, you can retrieve the rejected
records in a file. Either of these mechanisms triggers this
feature: (1) a context variable,
or (2) a system variable set in the Advanced job parameters.
When you set the file path on the Hadoop Distributed File
System (HDFS), no further configuration is needed. When
you set the file on Amazon S3 or any other Hadoop-compatible
file system, add the associated Spark advanced properties.
In case of errors at runtime, tHMapFile
checks if one of the mechanisms exists and, if so, appends
the rejected record to the designated file. The reject file
content includes the concatenation of the rejected records
without any additional metadata.
If the file system you use does not support appending to a
file, a separate file is created for each rejection. The
file uses the provided file path as the prefix and adds a
suffix that is the offset of the input file and the size of
the rejected record.
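The per-rejection naming scheme described above can be sketched as follows. Note that the exact separator between the prefix, offset, and size is not stated in this document, so the "-" used here is an assumption, as is the function name.

```python
# Illustration of the naming scheme described above: the configured
# reject file path is the prefix; the input-file offset and the size
# of the rejected record form the suffix. The "-" separator is an
# assumption, not documented behavior.
def reject_file_name(prefix: str, offset: int, size: int) -> str:
    return f"{prefix}-{offset}-{size}"

name = reject_file_name("/tmp/rejects", 1024, 57)
# name == "/tmp/rejects-1024-57"
```

Because the offset is unique per record, each rejection lands in its own file on file systems that do not support appends.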
Note: Any errors encountered while trying to store the reject are logged, and
processing continues.