Apache Spark (with Python or Scala) - Technical Preview - Import - 7.1

Talend Data Catalog Bridges

Author: Talend Documentation Team
EnrichVersion: 7.1
EnrichProdName:
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
EnrichPlatform: Talend Data Catalog

Bridge Specifications

Vendor: Apache
Tool Name: Spark (with Python or Scala)
Tool Version: 2.x
Tool Web Site: http://spark.apache.org/
Supported Methodology: [Data Integration] Multi-Model, Data Store (Physical Data Model), (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via Python File
Incremental Harvesting
Multi-Model Harvesting
Remote Repository Browsing for Model Selection
Data Profiling

BRIDGE INFORMATION
Import tool: Apache Spark (with Python or Scala) 2.x (http://spark.apache.org/)
Import interface: [Data Integration] Multi-Model, Data Store (Physical Data Model), (Source and Target Data Stores, Transformation Lineage, Expression Parsing) via Python File from Apache Spark (with Python or Scala) - New Beta Bridge
Import bridge: 'ApacheSpark' 10.1.0

BRIDGE DOCUMENTATION
The purpose of this Apache Spark import bridge is to detect and parse all relevant statements in order to generate the exact scope (data models) of the source and target data stores involved, as well as the data flow lineage and impact analysis (data integration ETL/ELT model) between them.
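As a rough illustration of the kind of statement detection involved, the hypothetical sketch below scans a PySpark script for read and write calls and reports the source and target data stores it finds. The real bridge's parser is far more thorough (it also parses transformations and expressions); the function name and regular expressions here are purely illustrative:

```python
import re

# A sample PySpark script of the kind this bridge parses (illustrative only).
SCRIPT = """
df = spark.read.csv("hdfs:///landing/orders.csv", header=True)
clean = df.filter(df.amount > 0)
clean.write.parquet("hdfs:///warehouse/orders")
"""

def extract_lineage(script):
    """Hypothetical helper: recover source and target data store paths
    from DataFrame read/write calls using simple pattern matching."""
    sources = re.findall(r'\.read\.\w+\(\s*"([^"]+)"', script)
    targets = re.findall(r'\.write\.\w+\(\s*"([^"]+)"', script)
    return sources, targets

sources, targets = extract_lineage(SCRIPT)
print("sources:", sources)  # ['hdfs:///landing/orders.csv']
print("targets:", targets)  # ['hdfs:///warehouse/orders']
```

A parser like this only recovers the endpoints; the bridge additionally builds the transformation lineage between them.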


Bridge Parameters

Parameter Name: Directory
Description: Select a directory containing the textual files with the code to import.
Type: DIRECTORY
Scope: Mandatory

Parameter Name: Code Language
Description: Select the language.
Type: ENUMERATED
Values: Python, Scala
Default: Python

Parameter Name: Directory Filter
Description: Specify a search filter for the subdirectories. Use regular expressions in Java format if needed (e.g. '.*_script'). Multiple conditions can be defined by using a space as a separator (e.g. 'directory1 directory2'). A condition must be escaped with double quotes if it contains spaces (e.g. "my directory"). Negation can be defined with a preceding dash character (e.g. '-bin').
Type: STRING

Parameter Name: File Filter
Description: Specify a search filter for files. Use regular expressions in Java format if needed (e.g. '.*\.py'). Multiple conditions can be defined by using a space as a separator (e.g. 'file1 file2'). A condition must be escaped with double quotes if it contains spaces (e.g. "my file.py"). Negation can be defined with a preceding dash character (e.g. '-\.tar\.gz').
Type: STRING

Parameter Name: Miscellaneous
Description: Specify miscellaneous options, each identified by a -letter and a value.

For example, -e UTF-16

-e: encoding. This value is used to load text from the specified script files. By default, UTF-8 is used. Other possible values include UTF-16, UTF-16BE and US-ASCII.
-p: parameters. Full path to the YAML file that defines all the entry points for the scripts to parse, as well as their input parameters. A new template is generated automatically if the file does not exist. Use double quotes to escape a path that contains spaces.
-pppd: enables the DI/ETL post-processor processing of DI/ETL designs in order to create the design connections and connection data sets.
Type: STRING
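The filter rules described above (space-separated conditions, Java-format regular expressions, dash-prefixed negation) can be sketched as follows. This is a rough approximation: it uses Python's re module, whose syntax is close to but not identical to Java's, it omits the double-quote escaping of conditions containing spaces, and the helper name is hypothetical:

```python
import re

def matches_filter(name, filter_expr):
    """Hypothetical approximation of the bridge's filter matching:
    space-separated conditions, one regex per condition, '-' prefix negates.
    (Quoted conditions containing spaces are not handled in this sketch.)"""
    if not filter_expr:
        return True  # no filter: accept everything
    include, exclude = [], []
    for cond in filter_expr.split():
        if cond.startswith("-"):
            exclude.append(cond[1:])
        else:
            include.append(cond)
    if any(re.search(p, name) for p in exclude):
        return False
    return not include or any(re.search(p, name) for p in include)

print(matches_filter("etl_script", ".*_script"))        # True
print(matches_filter("bin", "-bin"))                    # False
print(matches_filter("job.py", r".*\.py -\.tar\.gz"))   # True
```

Under this reading, a name is rejected if it matches any negated condition, and otherwise accepted if it matches at least one positive condition (or if no positive condition is given).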


Bridge Mapping

Mapping information is not available