Delimited File (CSV) - Import - 7.1

Talend Data Catalog Bridges

Author: Talend Documentation Team
Version: 7.1
Products: Talend Big Data Platform, Talend Data Fabric, Talend Data Management Platform, Talend Data Services Platform, Talend MDM Platform, Talend Real-Time Big Data Platform
Platform: Talend Data Catalog

Note: This file format needs to be imported with the File System (CSV, Excel, XML, JSON, Avro, Parquet, ORC, COBOL Copybook), Apache Hadoop Distributed File System (HDFS Java API) or Amazon Web Services (AWS) S3 Storage bridges.

Bridge Specifications

Vendor ISO
Tool Name Delimited File (CSV)
Tool Version N/A
Tool Web Site https://en.wikipedia.org/wiki/Delimiter-separated_values
Supported Methodology [File System] Data Store (Physical Data Model) via CSV, TXT File
Incremental Harvesting
Multi-Model Harvesting
Remote Repository Browsing for Model Selection
Data Profiling

BRIDGE INFORMATION
Import tool: ISO Delimited File (CSV) N/A (https://en.wikipedia.org/wiki/Delimiter-separated_values)
Import interface: [File System] Data Store (Physical Data Model) via CSV, TXT File from Delimited File (CSV)
Import bridge: 'FlatFile' 10.1.0

BRIDGE DOCUMENTATION
Discovers the metadata of a flat delimited (CSV) file by sampling its data and examining the header in the file's first row, if there is one.
The discovered metadata includes the list of fields, with their names, positions, optionality, and data types.

Supports the following field delimiters:
',' (comma), ';' (semicolon), ':' (colon), '\t' (tab), '|' (pipe), '0x1' (Ctrl+A)
Samples up to 1000 rows.
Reads files using the machine's locale, and allows you to specify the character set encoding of the files.
Does not support fixed-width files. You can use the Metadata Excel Format bridge to define and import the formats of fixed-width files.
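The automatic delimiter detection described above can be approximated with a short Python sketch. The bridge's actual detection algorithm is not documented here, so the function name guess_delimiter and its scoring rule (prefer the candidate delimiter that gives the most sampled rows the same field count, with at least two fields) are assumptions for illustration only. The candidate list and the 1000-row sample size are taken from this page.

```python
import csv
from collections import Counter

# Candidate delimiters, mirroring the list documented for the bridge.
CANDIDATES = [",", ";", ":", "\t", "|", "\x01"]
MAX_SAMPLE_ROWS = 1000  # the bridge documents sampling up to 1000 rows


def guess_delimiter(text: str) -> str:
    """Hypothetical heuristic, not the bridge's algorithm: for each candidate,
    find the most common field count across the sampled rows, and keep the
    candidate whose common width (>= 2 fields) is shared by the most rows,
    breaking ties by the larger width. Falls back to ',' if nothing splits."""
    lines = text.splitlines()[:MAX_SAMPLE_ROWS]
    best, best_score = ",", (0, 0)
    for delim in CANDIDATES:
        rows = list(csv.reader(lines, delimiter=delim))
        counts = Counter(len(row) for row in rows if row)  # ignore blank lines
        if not counts:
            continue
        width, rows_with_width = counts.most_common(1)[0]
        if width < 2:
            continue  # this candidate never splits the rows
        score = (rows_with_width, width)
        if score > best_score:
            best, best_score = delim, score
    return best
```

For example, guess_delimiter("id;name;price\n1;apple;0,50\n2;pear;0,75\n") returns ";": the semicolon splits all three rows into three fields, while the decimal commas split only two rows into two fields.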


Bridge Parameters

Parameter Name Description Type Values Default Scope
File Path of the file to import FILE *.*   Mandatory
Encoding
The character set encoding of the files.
The default is 'Western European (Windows-1252)' on Windows and 'Western European (ISO-8859-1)' on other platforms.
When empty, the locale of the machine reading the file is used.
STRING   windows-1252  
Top rows to skip Number of rows to skip from the top of the file STRING      
Delimiter By default, the delimiter is determined automatically. Use this parameter only in special cases, to specify the delimiter explicitly. STRING      
Miscellaneous Specify miscellaneous options identified with a -letter and value.

For example, -m 4G -f 100 -j -Dname=value -Xms1G

-m the maximum Java memory size, as a whole number with a unit (e.g. -m 4G or -m 2500M).
-v set environment variable(s) (e.g. -v var1=value -v var2="value with spaces").
-j must be the last option; it is followed by Java command line options (e.g. -j -Dname=value -Xms1G).
-hadoop key1=val1;key2=val2 to manually set Hadoop configuration options.
-tps 10 maximum thread pool size.
-tl 3600s processing time limit, in s (seconds), m (minutes), or h (hours).
-fl 1000 maximum number of files to process.
-delimited.top_rows_skip 1 number of rows to skip while processing CSV files.
-delimited.extra_separators ~,||,|~ comma-separated list of extra delimiters, each of which is used while processing CSV files.
-delimited.no_header by default, the bridge automatically tries to detect headers while processing CSV files (based on the header column types); use this option to disable header import (e.g. to hide sensitive data).
-fresh.partition.models - use to import the most recently modified files when processing partitions defined in the Partitioned directories parameter.
-subst K: C:/test - use to associate a root path part with a drive or another path.
-skip.download - use to disable downloading of dependencies and use only the download cache.
-prescript [cmd] - runs a script command before bridge execution. Example: -prescript "script.bat"
The script must be located in the bin directory, and have a .bat or .sh extension.
The script path must not include any parent directory symbol (..).
The script should return exit code 0 to indicate success, or another value to indicate failure.
STRING      
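How the Encoding, Top rows to skip, and Delimiter parameters interact can be illustrated with a small Python sketch. It is not how the bridge reads files; the function name read_delimited is hypothetical, and it assumes that the skipped top rows are plain physical lines (no quoted values spanning line breaks).

```python
import csv
from typing import List


def read_delimited(
    data: bytes,
    encoding: str = "windows-1252",
    top_rows_to_skip: int = 0,
    delimiter: str = ",",
) -> List[List[str]]:
    """Hypothetical helper: decode the raw bytes with the given character set,
    drop the first top_rows_to_skip lines, then parse the remaining lines as
    delimited rows, ignoring blank lines."""
    text = data.decode(encoding)  # a wrong encoding garbles or rejects accented text
    lines = text.splitlines()[top_rows_to_skip:]
    return [row for row in csv.reader(lines, delimiter=delimiter) if row]
```

For example, a Windows-1252 file whose first line is a report title can be read with read_delimited(data, "windows-1252", 1, ";"), so that the header row name;city becomes the first parsed row.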

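The values of the -tl and -delimited.extra_separators options follow small formats that a sketch makes concrete. The helpers parse_time_limit and parse_extra_separators below are hypothetical and only restate the descriptions above; they assume that a -tl value always carries one of the s, m, or h units, and, because the separator list is comma-separated, this sketch cannot express a comma as an extra separator.

```python
import re
from typing import List

# Seconds per unit for the documented -tl suffixes.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_time_limit(value: str) -> int:
    """Hypothetical helper: convert a -tl value such as '3600s', '60m' or '1h'
    into a number of seconds. Assumes a unit suffix is always present."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time limit: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def parse_extra_separators(value: str) -> List[str]:
    """Hypothetical helper: split a -delimited.extra_separators value such as
    '~,||,|~' into its individual, possibly multi-character, delimiters."""
    return [sep for sep in value.split(",") if sep]
```

With these helpers, 3600s, 60m, and 1h all describe the same one-hour limit, and ~,||,|~ yields the three extra delimiters ~, ||, and |~.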
 

Bridge Mapping

Mapping information is not available