Enhancements of Spark Job designer - 6.1

Talend Real-time Big Data Platform Release Notes

  1. Spark 1.4 is fully supported.

  2. More components are available in Spark Batch and/or Spark Streaming Jobs, such as:

    • New ElasticSearch components to read data from ElasticSearch and provide ElasticSearch connection configuration for reuse.

    • Native connectors to Hive, with support for the ORC file format.

    • New processing component: tTopBy.

    • The tLoop component is added to let users iterate the execution of a task in a Spark Batch Job.

    • The tRestWebServiceLookupInput component is created to provide lookup data to the main flow of a tMap component in a Spark Streaming Job.

  3. Extended support of Machine Learning algorithms:

    • The tModelEncoder component is added to enable an easy "feature engineering" process, in which it transforms data into features.

    • Separate new components are created to support Random Forest, Logistic Regression and KMeans.

    • tClassify is created to use a Random Forest model or a Logistic Regression model to perform predictive classification.

    • tPredictCluster is available to apply a KMeans model to perform predictive clustering.

  4. Support for Continuous Integration has been added to Spark Batch Jobs.

  5. A new DataMasking component is now available to hide sensitive personal data by replacing it with random characters. The masked data still looks real and consistent, and remains usable for purposes such as testing and training.
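
The masking idea described in item 5 can be illustrated with a minimal Python sketch. It is not the DataMasking component's implementation: the function name `mask_value`, the `secret` parameter, and the choice to seed the random generator from a keyed hash (so that the same input always masks to the same output) are all assumptions made for this example.

```python
import hashlib
import hmac
import random
import string


def mask_value(value: str, secret: bytes = b"demo-secret") -> str:
    """Replace letters and digits with random ones, keeping the value's shape."""
    # Assumption: seed the generator from a keyed hash of the input, so the
    # same input always produces the same masked output.
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest, "big"))

    masked = []
    for ch in value:
        if ch.isascii() and ch.isdigit():
            masked.append(rng.choice(string.digits))
        elif ch.isascii() and ch.isupper():
            masked.append(rng.choice(string.ascii_uppercase))
        elif ch.isascii() and ch.islower():
            masked.append(rng.choice(string.ascii_lowercase))
        else:
            # Separators, spaces and non-ASCII characters pass through unchanged.
            masked.append(ch)
    return "".join(masked)


# Hypothetical sample values, used only to exercise the sketch.
masked_phone = mask_value("555-0142")
masked_name = mask_value("Jane Doe")
```

Because each character is replaced by another of the same class, a masked phone number still looks like a phone number and a masked name keeps its capitalization. This is one way to keep test and training data realistic without exposing the original values.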