What's new in v6.1
- Features several new machine learning components which enable advanced analytics on Apache Spark
- Extends the existing Data Masking capability to now run natively on Spark to help meet compliance and data privacy concerns
- Adds support for Continuous Integration builds in Talend ESB
- Introduces support for Git, the popular version control system for software development, as a storage backend for Talend Studio projects
- Adds extensive support for Cloudera Navigator 2.4, enabling you to trace data lineage for MapReduce and Spark
Talend 6.1 features several new machine learning components that enable advanced analytics on Spark. Combined with the Continuous Delivery support added in Talend 6, they let developers deploy machine learning algorithms rapidly and give data science teams quick feedback, so that the teams can reassess the design of a given model. Data scientists can use these algorithms to understand data and train the model to make predictions, while IT can quickly deploy to production for "testing with live users". This feature leverages the Spark Machine Learning Library (MLlib).
These components include:
- Random Forest (batch only) for classification (fraud, churn, propensity to buy) or for regression analysis
- Logistic Regression (batch and streaming) to estimate/predict the potential outcome of actions
- Clustering via K-means (batch and streaming) for customer segmentation
- Model Featuring (using the Spark DataFrame capabilities)
They make it possible to operationalize analytics in any data flow or data-driven process, for example to deliver prescriptive outputs such as real-time recommendations.
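To make the customer segmentation use case concrete, here is a minimal sketch of the K-means idea in plain Python. It is not the Talend component and does not use Spark MLlib; the `kmeans` function, the sample `customers` data and the two features (annual spend, orders per year) are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def mean(points: Sequence[Point]) -> Point:
    """Component-wise average of a non-empty list of points."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def kmeans(points: Sequence[Point], k: int, iterations: int = 20,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain K-means: alternate assignment and centroid update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)  # start from k random points
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # assignments have stabilized
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical customers: (annual spend, orders per year).
customers = [(100.0, 2.0), (120.0, 3.0), (110.0, 2.0),
             (900.0, 30.0), (950.0, 28.0), (1000.0, 35.0)]
centroids, labels = kmeans(customers, k=2)
```

On this toy data the three low-spend customers end up in one segment and the three high-spend customers in the other. A distributed implementation such as the one in MLlib follows the same assign-and-update loop but partitions the points across a cluster.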
Talend 6.1 is certified on Cloudera Navigator 2.4 (with Cloudera CDH 5.5). This lets users trace data lineage for MapReduce and Spark down to the level of the schema defined by the developer in a data Job, which is crucial for both impact analysis and data lineage.
Talend 6.1 provides updated support for the latest release of many big data platforms, including Cloudera 5.5, Hortonworks 2.3, MapR 5.0 and Microsoft Azure HDInsight 3.2 on Spark. Talend simplifies integration with these new big data platforms by automatically refactoring existing Jobs, so you spend less time integrating systems and more time running your big data workloads on them.
For those who work with NoSQL, support is added for Cassandra 2.2, and new components are introduced for MarkLogic 8, which enable you to integrate this database into your data pipeline.
Talend 6.1 introduces a new group of ServiceNow components to support this platform-as-a-service (PaaS) for Service Management (SM).
Several existing components are also updated, including AWS EMR on-demand APIs (start, stop), AWS Redshift on-demand APIs (start, stop), ExaSol, Marketo (to support activities, programs, and campaigns), DB2 Continuous Data Ingest support, Teradata ELT and Google BigQuery caching.
Data Mapper
For Talend Data Mapper, 6.1 is essentially a stability release focusing on bug fixes and patch rollup.
Some new features have nonetheless been added. These include enhancements to how Talend Data Mapper handles SAP IDoc documents with the introduction of the new IDocs representation, and improvements to the tHMap component to make it easier to use.
Data Quality
In Talend 6.1, the tDataMasking component now supports Spark. Data masking obfuscates your data (numbers, strings, dates, personally identifiable information and more) without impacting the rules that surround that data or allowing other users to see the data. Making data private in this way helps companies meet compliance mandates or privacy codes of conduct, and protects data against abuses or breaches.
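As an illustration of what format-preserving masking means, the following Python sketch replaces each letter and digit with a random character of the same kind while leaving separators untouched. It is only a sketch under stated assumptions, not how tDataMasking works internally: the `mask_value` and `mask_record` helpers and the sample `customer` record are hypothetical.

```python
import random
import string


def mask_value(value: str, rng: random.Random) -> str:
    """Replace every letter and digit with a random one of the same kind.

    Separators such as spaces, dashes and dots are kept, so the masked
    value still satisfies simple format rules (length, digit positions).
    """
    masked = []
    for ch in value:
        if ch.isdigit():
            masked.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            masked.append(rng.choice(pool))
        else:
            masked.append(ch)
    return "".join(masked)


def mask_record(record: dict, sensitive_fields: set, seed: int = 42) -> dict:
    """Mask only the listed fields; other fields pass through unchanged.

    The fixed seed is an assumption that keeps this example reproducible;
    it is not a recommendation for production use.
    """
    rng = random.Random(seed)
    return {key: mask_value(val, rng) if key in sensitive_fields else val
            for key, val in record.items()}


# Hypothetical record with two sensitive fields.
customer = {"name": "Jane Doe", "phone": "555-867-5309", "country": "US"}
masked = mask_record(customer, {"name", "phone"})
```

The masked phone number keeps its `ddd-ddd-dddd` shape and the masked name keeps its capitalization and spacing, so downstream validation that depends on format continues to pass. Values with semantic constraints, such as calendar dates, would need type-aware masking rather than this character-level approach.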
Semantic Discovery, which was available as a Technical Preview in 6.0 and is now generally available, automatically recognizes data attributes (is it a state, a first name, or an email address?) and concepts (is it an address, a person, or a customer?) and suggests relevant heuristics to profile the data. This guides the data profiling process and suggests what is needed to manage the quality of the data and prepare it for further processing.
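The general idea of recognizing what a column contains can be sketched with a few simple detectors and a majority vote. This is a toy illustration only and makes no claim about how Semantic Discovery is implemented: the `DETECTORS` table, the deliberately partial `US_STATES` set, the `guess_semantic_type` function and the 0.8 threshold are all assumptions.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
US_STATES = {"CA", "NY", "TX", "WA", "FL"}  # deliberately partial list

# Each detector answers: does this single value look like my type?
DETECTORS: Dict[str, Callable[[str], bool]] = {
    "email": lambda v: EMAIL_RE.match(v) is not None,
    "us_state": lambda v: v.upper() in US_STATES,
}


def classify(value: str) -> str:
    """Return the first detector name that matches, else 'unknown'."""
    cleaned = value.strip()
    for name, matches in DETECTORS.items():
        if matches(cleaned):
            return name
    return "unknown"


def guess_semantic_type(values: Iterable[str], threshold: float = 0.8) -> str:
    """Label a column by majority vote over its values.

    The label is accepted only if at least `threshold` of the values
    agree; otherwise the column is reported as 'unknown'.
    """
    values = list(values)
    if not values:
        return "unknown"
    counts = Counter(classify(v) for v in values)
    name, hits = counts.most_common(1)[0]
    return name if hits / len(values) >= threshold else "unknown"


email_column = ["ann@example.com", "bob@example.org", "eve@example.net"]
state_column = ["CA", "ny", "TX"]
mixed_column = ["CA", "bob@example.org", "hello", "42"]
```

Once a column is labeled, a profiling tool can propose checks that fit the label, for example a syntax check for emails or a reference-list lookup for states.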
Talend 6.1 introduces the Graphical Entity/Relationship Visualizer to MDM. This provides a graphical representation of the data model in the Studio (boxes and arrows) in addition to the flat tree-view, making it possible to better visualize the relationships between model entities. As a result, data models and data connections can more easily be understood by non-IT specialists, thus simplifying data model maintenance tasks.
In the Hierarchy Manager, it is now possible to search and filter a hierarchy. This makes it easier to access MDM data, enabling you to browse data through different facets and dimensions.