What's new in v6.0 - 6.1

Talend Documentation Team
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for Data Quality
Talend Open Studio for ESB
Talend Open Studio for MDM
Talend Real-Time Big Data Platform
Administration and Monitoring
Data Governance
Data Quality and Preparation
Design and Development
Installation and Upgrade
Talend Studio

What's new in v6.0

This technical note highlights the important new features and capabilities of Talend 6.0. Supported features vary between Talend Open Studio and subscription products. Refer to the product pages on the website for more detail.

Talend 6.0:

  • Provides full support for Apache Spark and Spark Streaming, offering up to 5X performance improvement over MapReduce (based on the TPC-H benchmarks), and a 100X in-memory improvement for some tasks, which allows you to process more data faster. Existing Talend MapReduce jobs can easily be upgraded to Spark jobs, saving you significant time and effort. Also includes support for Amazon's EMR Spark.
  • Delivers an end-to-end Internet of Things integration platform combining IoT connectivity (MQTT, AMQP); high-speed, reliable messaging (Apache Kafka, Amazon Kinesis, Talend ESB); and high-speed big data integration (Spark).
  • Introduces Continuous Delivery data integration, which delivers huge gains in user productivity and speeds up software development by detecting software defects earlier while making the development team integration process more stable and predictable.
  • Provides a new MDM REST API and Query Language making it easier to access MDM and big data from websites, B2B systems, mobile and SaaS applications.
  • Delivers new data masking and semantic discovery (technical preview) capabilities to better understand your raw data, enrich their formats so that they can be easily connected and shared, and protect your most sensitive data.
  • Benefits from major refactoring work in Talend MDM, which now offers enterprise-grade scalability features and runs on Tomcat, with support for clustering and session replication. The MDM Web User Interface has also been significantly enhanced.
  • Enhances enterprise connectivity by continuing to add to and update the extensive list of over 900 components and connectors.
  • Comes with a new, modernized look-and-feel to the Studio user interface and features upgrades to the core platform as well.
  • Introduces two new product offerings, Talend Data Fabric and Talend Real-Time Big Data Platform, while also slightly renaming existing products.

Talend 6.0

Talend 6.0 introduces a series of Continuous Delivery features, which are fully supported in Data Integration and provided as a Technical Preview in Talend ESB. As part of the Continuous Delivery process, Continuous Integration is a development methodology where developers integrate code into a shared repository several times a day. Each check-in is verified by an automated build, allowing teams to uncover issues earlier. This leads to faster time to production for IT teams, with fewer errors. Software defects are discovered earlier and are easier to triage (fewer changes per build), leading to increases in productivity and decreased costs. All of this helps make the integration process more stable and predictable as large "last-minute" changes are avoided.

Major enhancements have been made to the Studio User Interface and the MDM Web User Interface, including: a modern design (arrows, grid, tiles, grabber, connectors, palette icons, and more), improved palette layout, simplified properties layout for components, and improvements to the components search functionality. All of these changes are aimed at increasing productivity by removing unnecessary complexity.

The Talend code generator now generates Java 8 code, making it possible for users who extend Talend to take advantage of Java 8 features.
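As an illustration of the kind of Java 8 features this makes available, a custom routine could rely on lambdas and the Stream API. The following is only a minimal sketch: the class name `NameRoutines` and the method `normalizeNames` are hypothetical and are not part of any Talend library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical user routine; the name and behavior are illustrative only.
final class NameRoutines {

    // Drops null and blank entries, trims and upper-cases the rest, and
    // removes duplicates, using Java 8 lambdas and the Stream API.
    static List<String> normalizeNames(List<String> rawNames) {
        return rawNames.stream()
                .filter(name -> name != null)
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .map(String::toUpperCase)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> cleaned =
                normalizeNames(Arrays.asList(" alice ", "Bob", "", null, "ALICE"));
        System.out.println(cleaned); // prints [ALICE, BOB]
    }
}
```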

In 6.0, the version of Eclipse used by Talend Studio has moved from 3.6 to 4.4, allowing the use of new tools (open source editors) and plugins, such as M2 for Maven support.

ElasticSearch, Kibana, and Logstash have been updated to newer releases (ElasticSearch 1.5.2, Kibana 3.1.2, and Logstash 1.5), providing a common indexing service (ElasticSearch) that is relevant for streaming capabilities.

The Nexus repository manager has been updated to Nexus 2.11.

The new Central Licensing Service can be installed on-premises to manage license keys between Studio and TAC users, making it possible to administer licenses across multiple domains and Repositories.

New and Renamed Products

Talend 6 introduces two new product offerings, Talend Data Fabric and Talend Real-Time Big Data Platform, while also slightly renaming existing products. Talend Data Fabric enables you to integrate all your data, operate in real-time and act with insight, by combining Big Data Integration, Data Integration, Cloud Integration, Application Integration and Master Data Management.

Talend Real-Time Big Data Platform combines and extends the Big Data Platform and Data Services Platform with Spark Streaming, Internet of Things connectivity and Enterprise messaging (ActiveMQ, AMQP, Kafka, JMS, etc.). The focus is on real-time big data use cases and emerging IoT use cases.

Existing subscription products have been slightly renamed as follows:

Talend 5.6 Name                                   Talend 6.0 Name
Talend Enterprise Data Integration                Talend Data Integration
Talend Enterprise ESB                             Talend ESB
Talend Enterprise Big Data                        Talend Big Data
Talend Platform for Big Data                      Talend Big Data Platform
Talend Platform for Data Management               Talend Data Management Platform
Talend Platform for Data Services                 Talend Data Services Platform
Talend Platform for Master Data Management        Talend Master Data Management Platform
Talend Platform for Data Services with Big Data   Talend Real-Time Big Data Platform
Talend Platform - Universal                       Talend Data Fabric

Big Data

In Talend 6.0, support has been added for Spark and Spark Streaming through more than 100 Spark components. Designed for the needs of operational Big Data, Spark is an in-memory data processing engine that runs programs up to 100X faster than Hadoop MapReduce in memory, or 10X faster on disk.

Using visual tools in Talend Studio, you build Spark jobs that automatically generate fast big data code. You can use in-memory sliding window capabilities to compare data values over a set period of time, and with Spark caching components, your code is optimized for high performance without the time spent tuning Spark yourself.
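To make the sliding window idea concrete, here is a minimal sketch in plain Java. It does not use the Spark API and is not the code that Talend generates; the class `SlidingWindow` and its method names are hypothetical. It keeps the readings received during the last `windowMillis` milliseconds and compares each new value with the average over that window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: a time-based sliding window over numeric readings.
final class SlidingWindow {
    private final long windowMillis;
    // Each entry is {timestampMillis, value}, oldest first.
    private final Deque<long[]> readings = new ArrayDeque<>();
    private long sum = 0;

    SlidingWindow(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
    }

    // Adds a reading and returns its difference from the window average.
    // The average includes the new reading itself.
    double addAndCompare(long timestampMillis, long value) {
        readings.addLast(new long[] {timestampMillis, value});
        sum += value;
        // Evict readings that have fallen out of the window.
        while (readings.peekFirst()[0] <= timestampMillis - windowMillis) {
            sum -= readings.removeFirst()[1];
        }
        return value - (double) sum / readings.size();
    }

    public static void main(String[] args) {
        SlidingWindow window = new SlidingWindow(10_000);
        System.out.println(window.addAndCompare(0, 10));
        System.out.println(window.addAndCompare(5_000, 20));
        System.out.println(window.addAndCompare(12_000, 30));
    }
}
```

In a streaming engine the window bookkeeping is handled by the runtime; the sketch only shows the comparison that a window makes possible.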

Talend's running of the TPC-H benchmark (a set of end-to-end integration flows) shows an increase in performance of up to 5X. Spark can run on Hadoop, standalone or in the cloud, and can be deployed on existing Hadoop clusters. A Studio developer can design a Spark job in the Studio and then deploy and run it on top of YARN.

Talend MapReduce Jobs can be migrated to Spark simply by opening the Job and clicking a button to change the runtime from MapReduce to Spark. This future-proofs your Big Data projects and provides productivity savings by avoiding the need to rebuild Talend MapReduce jobs using Spark. Even existing ETL Jobs can be modified in the Studio to start using Spark, thanks to the availability of over 100 Spark components.

In 6.0, Talend provides support for integrating with Apache Kafka, which is relevant for use with Spark Streaming. Kafka is designed for highly scalable, high-volume messaging. It can support a large number of consumers and retain large amounts of data with very little overhead.

Talend 6.0 supports Amazon Kinesis, a cloud messaging service provided by Amazon. For Hadoop clusters running in Amazon Web Services, Talend makes using Kinesis with Hadoop extremely simple, allowing for easy deployment of a very scalable messaging service in the cloud that you can connect to your Hadoop cluster. Together, support for these messaging services, AMQP, MQTT and Spark delivers a comprehensive end-to-end integration platform for the Internet of Things. Reliably capture and deliver millions of events per second from sensors, then instantly ingest, process and deliver insight to real-time applications and fast NoSQL data stores.

Talend provides updated support for the latest releases of several big data platforms, including: Cassandra 2.1.6; Neo4j 2.1; MongoDB 3.0; Cloudera 5.4; Hortonworks 2.2; MapR 4.1; Pivotal 2.0.1; Amazon EMR 4.0; Hadoop 2.4; Spark 1.3 and Amazon EMR Spark 1.4. Simplified integration with these new big data platforms means you spend less time integrating systems and more time running your big data workloads on them.

Support has been added for Lambda architecture, a data-processing architecture built on three layers (batch, speed and serving) and designed to handle massive quantities of data by taking advantage of both batch-processing (for comprehensive, large quantities of historical data) and real-time stream-processing methods. Output from the batch and speed layers is stored in the serving layer, which responds to ad-hoc queries by returning precomputed views or building views from the processed data and can be used by analytics and business intelligence tools.
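The three layers described above can be sketched in a few lines of plain Java. This is a toy, in-memory illustration under simplifying assumptions, not how Talend implements the architecture: the class `LambdaSketch` and its methods are hypothetical, the "views" are simple per-key event counts, and each batch run is assumed to include every event seen so far, so the real-time view can be reset afterwards.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: per-key event counts served from a batch and a speed view.
final class LambdaSketch {
    private final Map<String, Long> batchView = new HashMap<>();
    private final Map<String, Long> speedView = new HashMap<>();

    // Batch layer: recompute the view from the complete historical data set.
    void runBatch(List<String> allHistoricalEvents) {
        batchView.clear();
        for (String key : allHistoricalEvents) {
            batchView.merge(key, 1L, Long::sum);
        }
        // Assumption: the batch input covers everything seen so far,
        // so the real-time view can start again from empty.
        speedView.clear();
    }

    // Speed layer: incrementally account for each event as it arrives.
    void onEvent(String key) {
        speedView.merge(key, 1L, Long::sum);
    }

    // Serving layer: answer a query by combining both views.
    long query(String key) {
        return batchView.getOrDefault(key, 0L) + speedView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        LambdaSketch sketch = new LambdaSketch();
        sketch.runBatch(Arrays.asList("a", "a", "b"));
        sketch.onEvent("a");
        System.out.println(sketch.query("a")); // prints 3
    }
}
```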

Data Integration

Several new components have been added in Talend 6.0, including Teradata SCD; Teradata CDC; Vertica ELT; Salesforce Wave Analytics; RedShift Bulkload; and WebSphere MQ. These new components provide extended connectivity options that help expand the reach of your projects.

Additionally, several components have been updated to allow the use of the latest features and to support the latest systems. These include: Microsoft CRM 2013/2015; Marketo; Oracle 12; MySQL 5.6; MariaDB 10; DB2 v10.5; MSSQL 2014; Postgres 9.4; Teradata 15; Vertica 7.1; Netezza 7.x; and Cassandra (support of CQL 3).

Improved scalability has been added for the Talend Administration Center Job Scheduler, which now supports thousands of concurrent jobs and high-scalability environments.

Data Mapper

For Talend Data Mapper, 6.0 is essentially a stability release focusing on bug fixes and patch rollup.

Some new features have nonetheless been added: Avro is now supported as an output representation, and a new COBOL representation has been added that provides better performance than the Flat representation in some specific use cases.

Data Quality

In Talend 6.0, Talend Data Quality introduces a new data masking component (tDataMasking) that obfuscates data (numbers, strings, dates, personally identifiable information and so on) without impacting the rules that surround that data or allowing other users to see the data. This helps meet compliance mandates or privacy codes of conduct, and protects data against abuse or breaches.
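The idea of masking a value while preserving the rules around it can be illustrated with a small format-preserving sketch. This is not the tDataMasking implementation; the class `MaskingSketch` and its `mask` method are hypothetical. Digits are replaced with random digits and ASCII letters with random letters of the same case, while separators are kept, so length and character-class rules still hold.

```java
import java.util.Random;

// Illustrative only: format-preserving masking of a single string value.
final class MaskingSketch {

    // Replaces each ASCII digit with a random digit and each ASCII letter with
    // a random letter of the same case. Every other character (separators,
    // spaces, non-ASCII letters) is copied unchanged.
    static String mask(String value, Random random) {
        StringBuilder masked = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            if (c >= '0' && c <= '9') {
                masked.append((char) ('0' + random.nextInt(10)));
            } else if (c >= 'a' && c <= 'z') {
                masked.append((char) ('a' + random.nextInt(26)));
            } else if (c >= 'A' && c <= 'Z') {
                masked.append((char) ('A' + random.nextInt(26)));
            } else {
                masked.append(c);
            }
        }
        return masked.toString();
    }

    public static void main(String[] args) {
        // A seeded Random keeps the demo repeatable; real masking would
        // more likely use java.security.SecureRandom.
        System.out.println(mask("AB-1234 x", new Random(42)));
    }
}
```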

The new Semantic Discovery feature, which is available as a Technical Preview in Talend 6.0, automatically understands data attributes (is it a state, a first name, an email address?) and concepts (is it an address, a person, a customer?) from a source and suggests relevant heuristics to profile the data. This guides the data discovery process and suggests what is needed to manage its quality and prepare it for further processing.
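One simple way to picture attribute discovery is pattern matching over the values of a column. The sketch below is an assumption-laden illustration, not the Semantic Discovery algorithm: the class `SemanticGuess`, the category names, and the regular expressions are all hypothetical. It reports the first category whose pattern matches at least a given share of the non-null values.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: guesses a column's category from simple patterns.
final class SemanticGuess {
    // Hypothetical categories, checked in insertion order.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("EMAIL", Pattern.compile("[^@\\s]+@[^@\\s]+\\.[A-Za-z]{2,}"));
        PATTERNS.put("US_ZIP", Pattern.compile("\\d{5}(-\\d{4})?"));
        PATTERNS.put("DATE_ISO", Pattern.compile("\\d{4}-\\d{2}-\\d{2}"));
    }

    // Returns the first category matched by at least `threshold` (0..1) of the
    // non-null values, or "UNKNOWN" when none qualifies or the column is empty.
    static String guess(List<String> columnValues, double threshold) {
        long total = columnValues.stream().filter(v -> v != null).count();
        if (total == 0) {
            return "UNKNOWN";
        }
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            long matches = columnValues.stream()
                    .filter(v -> v != null && entry.getValue().matcher(v).matches())
                    .count();
            if ((double) matches / total >= threshold) {
                return entry.getKey();
            }
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        System.out.println(guess(Arrays.asList("a@b.com", "c@d.org", "n/a"), 0.6)); // prints EMAIL
    }
}
```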