To connect to an HDFS installation, select the Define a storage configuration component
check box and then select the name of the component to use from those
available in the drop-down list.
This option requires you to have previously configured the
connection to the HDFS installation to be used, as described in the
documentation for the tHDFSConfiguration component.
If you leave the Define a
storage configuration component check box unselected,
you can only convert files locally.
Before you configure this component, you must have already
added a downstream component, linked it to the tHMapInput component, and retrieved the
schema from the downstream component.
To configure the component, click the [...]
button and, in the Component Configuration
window, perform the following actions.
Click the Select button next to the Record structure field and
then, in the Select a
Structure dialog box that opens, select the
map you want to use and then click OK.
This structure must have been previously created in
Talend Data Mapper.
Select the Input
Representation to use from the drop-down list.
Supported input formats are Avro, COBOL, EDI,
Flat, IDocs, JSON and XML.
Tell the component where each new record
begins. In order for you to be able to do so, you need to
fully understand the structure of your data.
Exactly how you do this varies depending on
the input representation being used, and you will be
presented with one of the following options.
Select an appropriate record
delimiter for your data. Note that you must specify
this value without quotes.
A separator option lets you specify a separator indicator, such as \n, to identify a new
record. Supported indicators are \n for a Unix-type new line,
\r\n for Windows and \r for Mac, and \t for tab characters.
A start or end option lets you specify the initial
characters that indicate a new record, such as <root, or the characters
that indicate where a record ends. This option also supports new lines, \n
for a Unix-type new line, \r\n for Windows and \r for Mac, and \t for
tab characters.
Select the Regular
Expression check box if you wish to
enter a regular expression to match the start of a
record. When you select XML or JSON, this check
box is selected by default and a pre-configured
regular expression is provided.
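As an illustration of matching the start of a record with a regular expression, the following Python sketch splits a small XML sample wherever <root begins. The pattern, the function name and the sample data are assumptions made for this example; the pre-configured expression supplied by the component may differ.

```python
import re

def split_on_record_start(data: str, start_pattern: str) -> list[str]:
    """Split data into records, each beginning where start_pattern matches."""
    starts = [m.start() for m in re.finditer(start_pattern, data)]
    # Each record runs from one match to the next (or to the end of the data).
    ends = starts[1:] + [len(data)]
    return [data[s:e] for s, e in zip(starts, ends)]

# Hypothetical sample: three XML records, each opening with <root
sample = (
    "<root><id>1</id></root>\n"
    "<root><id>2</id></root>\n"
    "<root><id>3</id></root>\n"
)
records = split_on_record_start(sample, r"<root\b")
print(len(records))
```

The closing tag </root> does not match the pattern because of the slash, so only opening tags mark a new record.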
Sample File: To test the
signature with a sample file, click the
[...] button, browse to the
file you want to use as a sample, click
Open, and then click
Run to test your signature.
Testing the signature lets you check
that the total number of records and their minimum
and maximum length correspond to what you expect
based on your knowledge of the data. This step
assumes you have a local subset of your data to
use as a sample.
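The kind of check performed when testing a signature can be approximated outside the Studio. The following Python sketch, which assumes a separator-delimited sample held in memory, reports the number of records and their minimum and maximum length. The function name and sample data are hypothetical, and the separator is written as a Python string literal rather than as the unquoted value entered in the component.

```python
def signature_stats(data: str, separator: str) -> dict:
    """Count the records split on separator and report their length range."""
    # Drop empty pieces, such as the one after a trailing separator.
    records = [r for r in data.split(separator) if r]
    lengths = [len(r) for r in records]
    return {
        "records": len(records),
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
    }

# Hypothetical sample with Unix-type new lines as the record separator.
sample = "A;1\nBB;22\nCCC;333\n"
stats = signature_stats(sample, "\n")
print(stats)
```

If the reported figures differ from what you know about the data, the separator is probably not the right one for the sample.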
If your input representation is COBOL
or Flat with positional and/or binary encoding
properties, define the signature for the input
record structure:
Input Record root
corresponds to the root element in your input
record.
The minimum record size corresponds to the size in bytes
of the smallest record. If you set this value too
low, you may encounter performance issues, since
the component will perform more checks than
necessary when looking for a new record.
The maximum record size corresponds to the size in bytes
of the largest record, and is used to determine
how much memory is allocated to read the input.
Sample from Workspace or
Sample from File System: To
test the signature with a sample file, click the
[...] button, and then browse to the file you want to
use as a sample.
Testing the signature lets you
check that the total number of records and their
minimum and maximum length correspond to what you
expect based on your knowledge of the data. This
step assumes you have a local subset of your data
to use as a sample.
The footer size corresponds to the size in bytes of the footer, if
any. At runtime, the footer will be ignored rather
than being mistakenly included in the last record.
Leave this field empty if there is no footer.
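To show how the minimum size, maximum size and footer size interact, here is a simplified Python sketch. It assumes that every record starts with a fixed byte signature, which is a deliberate simplification of the field-based signature defined in the wizard; the function and parameter names are hypothetical. The next record start is searched only between the minimum and maximum size, which is why a minimum set too low causes more positions to be checked.

```python
def split_positional(data: bytes, signature: bytes, min_size: int,
                     max_size: int, footer_size: int = 0) -> list[bytes]:
    """Split data into records that each start with signature.

    The next record start is searched only between min_size and max_size
    bytes after the current start; the trailing footer is ignored.
    """
    if min_size < 1 or max_size < min_size:
        raise ValueError("expected 1 <= min_size <= max_size")
    # Ignore the footer so it is not included in the last record.
    body = data[:len(data) - footer_size] if footer_size else data
    records = []
    pos = 0
    while pos < len(body):
        window_start = pos + min_size
        window_end = min(pos + max_size, len(body))
        # Allow the signature to begin at window_end itself.
        nxt = body.find(signature, window_start, window_end + len(signature))
        if nxt == -1:
            nxt = window_end  # no further signature: record runs to the limit
        records.append(body[pos:nxt])
        pos = nxt
    return records

# Hypothetical data: three records starting with "H", then a 3-byte footer.
data = b"H001ABH002ABCDH003AEOF"
records = split_positional(data, b"H", min_size=5, max_size=8, footer_size=3)
print(records)
```

The sketch relies on the signature not occurring inside a record within the searched window, which real data may not guarantee.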
Click the Next button to open the
Parameters window, select the fields
that define the signature of your record input
structure (that is, to identify where a new record
begins), update the Operation and Value columns as
appropriate, and then click Next.
In the Record
Signature Test window that opens, check
that your records are correctly delineated by
scrolling through them with the
navigation buttons, such as Next, and performing
a visual check, and then complete the wizard.
Map the elements from the input structure to
the output structure in the new map that opens, and then
press Ctrl+S to save it.
For more information on creating maps, see the
Talend Data Mapper User Guide.
Synchronize map with schema connections
Select this check box if you want to automatically regenerate
your map's input and output structures after one of the following changes:
- Connection metadata change
- Input or output connection added
- Input or output connection removed
No changes are detected when a connection is activated or deactivated.
If this check box is selected, the map is automatically
synchronized when opened from the component after a change. If not, a dialog appears to
ask whether you want to synchronize.
Note: For structures with multiple
connections, the map can only be synchronized if the structures have the same form
as the ones generated by the component configuration wizard. For example, flattening
maps with multiple outputs cannot be synchronized automatically.
Die on error
This check box is selected by default.
Clear the check box to skip any rows on error and complete the
process for error-free rows.
If you opt to clear the
check box, you can perform any of these options:
Connect the tHMapInput component to an output
component, for example tAvroOutput, using a
connection. In the output component, ensure that
you add a fixed metadata with the following columns:
- inputRecord: contains the rejected
input record during the transformation.
- recordId: refers to the record
identifier. For a text or binary input, the recordId
specifies the start offset of the record in the input
file. For an AVRO input, the recordId specifies the
timestamp when the input was processed.
- errorMessage: contains the
transformation status with details of the cause of the
failure.
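The behavior with the Die on error check box cleared can be pictured with the following Python sketch. It is not the component's implementation: the RejectRow class, the transform_all function and the sample transformation are hypothetical, and only the three column names come from the list above. Here recordId is the start offset of the record, as described for text or binary input.

```python
from dataclasses import dataclass

@dataclass
class RejectRow:
    inputRecord: bytes   # the rejected input record
    recordId: int        # start offset of the record in the input
    errorMessage: str    # details of why the transformation failed

def transform_all(records, transform, die_on_error=True):
    """Apply transform to each record; collect rejects when not dying on error."""
    outputs, rejects = [], []
    offset = 0
    for record in records:
        try:
            outputs.append(transform(record))
        except Exception as exc:
            if die_on_error:
                raise  # stop the whole process on the first error
            rejects.append(RejectRow(record, offset, str(exc)))
        offset += len(record)
    return outputs, rejects

# Hypothetical input: the second record cannot be parsed as an integer.
records = [b"10\n", b"x\n", b"30\n"]
parse = lambda r: int(r.decode().strip())
outputs, rejects = transform_all(records, parse, die_on_error=False)
print(outputs, rejects)
```

With die_on_error left at True, the same input stops at the second record instead of producing a reject row.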
You can retrieve the rejected records in a file.
Either of these mechanisms triggers this feature: (1) a context
variable (talend_transform_reject_file_path) or (2) a
system variable set in the Advanced Job parameters (spark.hadoop.talend.transform.reject.file.path).
When you set the file path on the Hadoop
Distributed File System (HDFS), no further configuration is
needed. When you set the file path on Amazon S3 or any other
Hadoop-compatible file system, add the associated Spark
advanced configuration parameter.
In case of errors at runtime, tHMapFile checks if one of the
mechanisms exists and, if so, appends the rejected record to the
designated file. The reject file content is the
concatenation of the rejected records, with nothing else added.
If the file system you use does not support
appending to a file, a separate file is created for each
rejection. The file uses the provided file path as the prefix
and adds a suffix that is the offset of the input file and the
size of the rejected record.
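The two storage behaviors described above can be sketched as follows in Python, using the local file system as a stand-in for HDFS or S3. The store_reject function and the supports_append flag are hypothetical, and the suffix layout "<offset>_<size>" is an assumption: the text states only that the suffix combines the offset and the size of the rejected record.

```python
import os
import tempfile

def store_reject(path_prefix: str, record: bytes, offset: int,
                 supports_append: bool) -> str:
    """Store one rejected record and return the path that was written."""
    if supports_append:
        # Append the raw record to the single reject file.
        target = path_prefix
        mode = "ab"
    else:
        # Assumed suffix layout "<offset>_<size>"; the real format may differ.
        target = f"{path_prefix}.{offset}_{len(record)}"
        mode = "wb"
    with open(target, mode) as handle:
        handle.write(record)
    return target

workdir = tempfile.mkdtemp()
prefix = os.path.join(workdir, "rejects")
store_reject(prefix, b"bad-1\n", 0, supports_append=True)
store_reject(prefix, b"bad-2\n", 42, supports_append=True)
separate = store_reject(prefix, b"bad-3\n", 99, supports_append=False)
print(os.path.basename(separate))
```

When appending is available, the reject file ends up holding the raw records back to back, with nothing between them.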
Note: Any errors while trying to store the reject are logged and the