Excel File (XLSX) - Import - 7.1

Talend Data Catalog Bridges

Talend Documentation Team
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
Talend Data Catalog

Bridge Specifications

Vendor: ISO
Tool Name: Excel File (XLSX)
Tool Version: N/A
Tool Web Site: https://tools.ietf.org/html/rfc4180
Supported Methodology: [File System] Data Store (Physical Data Model) via XLSX File
Remote Repository Browsing for Model Selection
Data Profiling
Multi-Model Harvesting
Incremental Harvesting

Import tool: ISO Excel File (XLSX) N/A (https://tools.ietf.org/html/rfc4180)
Import interface: [File System] Data Store (Physical Data Model) via XLSX File from Excel File (XLSX)
Import bridge: 'ExcelFile' 10.1.0

Discovers file format details from data files directly. The details include fields' names, positions, optionality and data types.
Can import a directory with subdirectories of files and allows you to filter files using wildcards.
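
As an illustration of the directory import described above, the following minimal Python sketch walks a directory tree and keeps only the files whose names match a wildcard. It is not the bridge's implementation: the function name `find_files`, the default pattern and the case-insensitive matching are assumptions made for this example.

```python
import fnmatch
import os


def find_files(root, pattern="*.xlsx"):
    """Yield paths under root (subdirectories included) whose name matches pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Assumption: match case-insensitively so *.xlsx also finds REPORT.XLSX.
            if fnmatch.fnmatchcase(name.lower(), pattern.lower()):
                yield os.path.join(dirpath, name)
```

For example, `list(find_files("C:/test", "sales_*.xlsx"))` would return every matching workbook below the hypothetical `C:/test` folder.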

Supports delimiter-separated flat files and Microsoft Excel files.
Tries to discover the field delimiter automatically and allows you to specify it manually.
Looks for field names in the header on the first line by default and allows you to specify the header line number.
Tries to identify data types of fields by reading the first 1000 rows of the file and allows you to specify the number of rows to sample.
Handles files with the XLSX extension as Excel files. An Excel file can have multiple sheets. Each sheet is imported as a file with the sheet's name.
Uses the machine's locale to read files and allows you to specify the character set encoding that the files use.
Does not support fixed-width files. You can use the Metadata Excel Format bridge to (define and) import formats of fixed-width files.
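
The discovery behavior described above (a header row, followed by type detection over a limited sample of rows) can be sketched roughly as follows. This is a simplified illustration, not the bridge's actual algorithm: the type labels (`INTEGER`, `DECIMAL`, `DATE`, `STRING`), the single `YYYY-MM-DD` date format, and the rule that a field is optional when any sampled value is empty are all assumptions made for the example.

```python
from datetime import datetime


def infer_type(values):
    """Return the narrowest hypothetical type name that fits every non-empty value."""
    samples = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not samples:
        return "STRING"
    candidates = (
        ("INTEGER", int),
        ("DECIMAL", float),
        ("DATE", lambda s: datetime.strptime(s, "%Y-%m-%d")),
    )
    for type_name, parse in candidates:
        try:
            for s in samples:
                parse(s)
            return type_name
        except ValueError:
            continue  # at least one value did not fit; try the next candidate
    return "STRING"


def describe_fields(rows, header_row=1, sample_rows=1000):
    """Describe fields from rows (lists of strings); header_row is 1-based."""
    header = rows[header_row - 1]
    # Only the rows right after the header, up to sample_rows, are inspected.
    sample = rows[header_row:header_row + sample_rows]
    fields = []
    for position, name in enumerate(header, start=1):
        column = [row[position - 1] if position - 1 < len(row) else ""
                  for row in sample]
        fields.append({
            "name": name,
            "position": position,
            "optional": any(v is None or v.strip() == "" for v in column),
            "type": infer_type(column),
        })
    return fields
```

Because only the sampled rows are inspected, a column whose first rows are whole numbers but whose later rows contain decimals is classified differently depending on the sample size, which is one reason to raise the number of rows to read for irregular files.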

Bridge Parameters

Parameter Name: File
Description: Path to the file to import.
Type: FILE
Values: *.XLSX
Scope: Mandatory

Parameter Name: Header row number
Description: The row number that contains the file header describing its field names. When empty, the header is assumed to be in the first row.
Type: STRING
Default: 1

Parameter Name: Number of rows to read
Description: The maximum number of rows to sample from files. The rows are used to identify file format details, like field data types. When empty, the number of rows is assumed to be 1000.
Type: STRING
Default: 1000

Parameter Name: Miscellaneous
Description: Specify miscellaneous options, each identified with a -letter and a value.

For example, -m 4G -f 100 -j -Dname=value -Xms1G

-m the maximum Java memory size, as a whole number (e.g. -m 4G or -m 2500M).
-v set environment variable(s) (e.g. -v var1=value -v var2="value with spaces").
-j the last option; it is followed by Java command line options (e.g. -j -Dname=value -Xms1G).
-hadoop key1=val1;key2=val2 manually set Hadoop configuration options.
-tps 10 maximum thread pool size.
-tl 3600s processing time limit, in s (seconds), m (minutes) or h (hours).
-fl 1000 limit on the number of files to process.
-delimited.top_rows_skip 1 number of rows to skip while processing CSV files.
-delimited.extra_separators ~,||,|~ comma-separated extra delimiters, each of which is used while processing CSV files.
-fresh.partition.models use to import the most recently modified files when processing partitions defined in the Partitioned directories parameter.
-subst K: C:/test use to associate a root path part with a drive or another path.
-skip.download use to disable dependency downloading and use only the download cache.
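
To make the option syntax above more concrete, here is a small Python sketch that splits a Miscellaneous string into bridge options and trailing Java options, and converts a -tl value to seconds. It only illustrates how such a string is structured and does not reflect how the bridge parses it internally. The helper names `split_options` and `time_limit_seconds` are hypothetical, and rejecting a -tl value without a unit suffix is an assumption of this example.

```python
import shlex

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def time_limit_seconds(value):
    """Convert a -tl value such as '3600s', '90m' or '2h' to seconds."""
    number, unit = value[:-1], value[-1:]
    # Assumption: a unit suffix is required and the amount is a whole number.
    if unit not in UNIT_SECONDS or not number.isdigit():
        raise ValueError(f"unsupported time limit: {value!r}")
    return int(number) * UNIT_SECONDS[unit]


def split_options(text):
    """Split a Miscellaneous string into (bridge options, Java options).

    Everything after -j is treated as Java command line options.
    """
    tokens = shlex.split(text)  # honours quotes, e.g. var2="value with spaces"
    if "-j" in tokens:
        cut = tokens.index("-j")
        return tokens[:cut], tokens[cut + 1:]
    return tokens, []
```

Applied to the example string from this section, `split_options("-m 4G -f 100 -j -Dname=value -Xms1G")` separates `-m 4G -f 100` from the Java options `-Dname=value -Xms1G`.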


Bridge Mapping

Mapping information is not available