Enhancements to the Spark Job designer - 6.2

Talend Data Fabric Release Notes

  1. Spark 1.5 and 1.6 are fully supported.

  2. Support for Spark 2.0 preview has been added. Users can select the Custom option from the Spark version list to configure the connection to a Spark 2.0 preview cluster.

  3. More components are available in Spark Batch and Spark Streaming Jobs for the following technologies:

    • JDBC

    • MongoDB

    • DynamoDB

    • Redshift

    • Data Mapper

  4. Enhancements to existing components:

    • tSqlRow allows users to write more sophisticated queries in HiveQL.

    • SSL encryption has been enabled for the Cassandra components.

    • Support for Kerberos is available in tHiveConfiguration.

    • The Parquet components have been upgraded for better performance with Spark.

    • tCacheIn and tCacheOut have been added to Spark Streaming Jobs.

  5. The Spark backpressure feature can be enabled for Spark Streaming Jobs.

  6. Extended support for Machine Learning algorithms:

    • The tModelEncoder component now supports almost all "feature engineering" algorithms that are available in Spark 1.5 and Spark 1.6.

    • New components have been created to support Linear Regression, SVM, Decision Tree and Gradient Boosted Tree.

    • tPredict has been upgraded to use all types of Classification or Clustering models.

  7. Support for Continuous Integration has been added to Spark Streaming Jobs.
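
For readers curious about what the backpressure feature in item 5 corresponds to on the Spark side, here is a minimal sketch. It assumes the Studio option maps onto the standard Spark property `spark.streaming.backpressure.enabled` (available since Spark 1.5); these notes do not describe how the Studio actually passes it. The helper names `backpressure_conf` and `to_submit_args` are hypothetical and exist only for illustration. The optional `spark.streaming.receiver.maxRate` property caps the per-receiver ingestion rate and is commonly set alongside backpressure.

```python
import shlex


def backpressure_conf(enabled=True, receiver_max_rate=None):
    """Build the Spark properties related to streaming backpressure.

    Hypothetical helper: it only assembles a dict of standard Spark
    property names and does not talk to a cluster.
    """
    conf = {
        "spark.streaming.backpressure.enabled": "true" if enabled else "false",
    }
    if receiver_max_rate is not None:
        # Upper bound, in records per second, for each receiver.
        conf["spark.streaming.receiver.maxRate"] = str(int(receiver_max_rate))
    return conf


def to_submit_args(conf):
    """Turn a property dict into `--conf key=value` arguments for spark-submit."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    args = to_submit_args(backpressure_conf(receiver_max_rate=1000))
    # Print the arguments as they would appear on a spark-submit command line.
    print(" ".join(shlex.quote(a) for a in args))
```

The same two properties could equally be set on a `SparkConf` object or in `spark-defaults.conf`; the command-line form is used here only because it needs no Spark installation to try out.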