These properties are used to configure tHMapInput running in the Spark Batch Job framework.
The Spark Batch tHMapInput component belongs to the Processing family.
This component is available in Talend Platform products with Big Data and in Talend Data Fabric.
Basic settings
Storage |
To connect to an HDFS installation, select the Define a storage configuration component check box and then select the name of the component to use from those available in the drop-down list. This option requires you to have previously configured the connection to the HDFS installation to be used, as described in the documentation for the tHDFSConfiguration component. If you leave the Define a storage configuration component check box unselected, you can only convert files locally. |
Configure Component |
Before you configure this component, you must have already added a downstream component, linked it to the tHMapInput component, and retrieved the schema from the downstream component. To configure the component, click the [...] button and, in the Component Configuration window, perform the following actions. |
Input |
Click the [...] button to define the path to where the input file is stored. |
Open Map Editor |
Click the [...] button to open the map for editing in the Map Editor of Talend Data Mapper. For more information, see the Talend Data Mapper User Guide. |
Advanced settings
Die on error |
This check box is selected by default. Clear the check box to skip any rows on error and complete the process for error-free rows.
If the check box is cleared, you can retrieve the rejected records in a file. Either of the following mechanisms enables this feature: a context variable (talend_transform_reject_file_path) or a system variable set in the Advanced job parameters (spark.hadoop.talend.transform.reject.file.path).
When you set the file path on the Hadoop Distributed File System (HDFS), no further configuration is needed. When you set the file path on Amazon S3 or any other Hadoop-compatible file system, add the associated Spark advanced configuration parameter.
If an error occurs at runtime, tHMapInput checks whether one of the mechanisms is defined and, if so, appends the rejected record to the designated file. The reject file contains the concatenation of the rejected records without any additional metadata.
If the file system you use does not support appending to a file, a separate file is created for each rejection. Each file uses the provided file path as its prefix and adds a suffix made up of the offset of the input file and the size of the rejected record.
Note: Any errors that occur while storing a rejected record are logged, and processing continues. |
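The reject-file behavior described for Die on error can be illustrated with a minimal Python sketch that runs against the local file system. It is not the component's implementation: the function name store_reject, the supports_append flag, and the "_<offset>_<size>" suffix layout are assumptions made for illustration, since the documentation states only that the suffix is built from the input offset and the size of the rejected record.

```python
import logging
import os
import tempfile

log = logging.getLogger("reject_sketch")


def store_reject(reject_path, record, input_offset, supports_append=True):
    """Store one rejected record (bytes) and return the path written to.

    Returns None if storing fails; the failure is logged and not raised,
    mirroring the documented "logged and processing continues" behavior.
    """
    try:
        if supports_append:
            # Append the raw record: the reject file is a plain
            # concatenation of rejected records with no extra metadata.
            with open(reject_path, "ab") as f:
                f.write(record)
            return reject_path
        # No append support: write one file per rejection, using the
        # configured path as the prefix. The "_<offset>_<size>" layout
        # is a hypothetical naming choice for this sketch.
        target = f"{reject_path}_{input_offset}_{len(record)}"
        with open(target, "wb") as f:
            f.write(record)
        return target
    except OSError as exc:
        log.warning("Could not store reject at offset %d: %s", input_offset, exc)
        return None


if __name__ == "__main__":
    base = os.path.join(tempfile.mkdtemp(), "rejects.bin")
    print(store_reject(base, b"bad-record-1", 0))
    print(store_reject(base, b"bad-record-2", 128, supports_append=False))
```

Because the appended file carries no metadata or delimiters, a consumer of that file needs its own knowledge of the record format to split it back into individual records.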
Usage
Usage rule |
This component is used with a tHDFSConfiguration component, which defines the connection to the HDFS storage. It is an input component and requires an output flow. |