Winter '18 Release Notes - Cloud

Talend Cloud Release Notes


Talend Data Stewardship Cloud

Talend Data Stewardship Cloud engages your data workers to partner up and collaborate on turning data into highly shared, curated and ready-to-use assets. It empowers anyone to collaborate on data curation, arbitration or validation within their field of responsibility, through an intuitive, workflow-based and outcome-driven experience.

Feature: Description
Curate and certify data: Improve productivity and get guidance in your data certification and task curation.
  • Define the data model to comply with.
  • Define and apply rules (survivorship, mass updates).
  • Merge and match data.
  • Resolve data errors.
  • Arbitrate on data (classification and certification).
  • Arbitrate on pairs or groups of near duplicates. This can be used in the context of matching very high volumes of data using machine learning on Spark.
Collaborate for trusted data: Orchestrate your data stewardship activities in campaigns and delegate tasks to the people that know the data best.
  • Define user roles.
  • Bulk assign and delegate tasks.
  • Define workflows.
  • Define priorities.
  • Tag and comment.
Integrate data stewardship: Embed governance and stewardship in your data management efforts.
  • Manage rejects in Data Integration flows.
  • Embed human certification and error resolution into Master Data Management processes.
  • Take matching decisions that cannot be processed automatically.
Audit and track data error resolution actions: Comply with your data governance policies and measure the outcomes of your data stewardship efforts.
  • Monitor progress of campaigns.
  • Track changes.
  • Undo/redo.
Service Level Agreement (SLA) on task resolution: Define due dates at campaign level and also at task level, i.e. from the studio.
Promote campaigns and data models across environments: Work more easily with several Talend Data Stewardship environments that run the same Talend product version. Export campaigns or data models from a source environment, and then import them into the target environment.
Copy campaigns and data models in Talend Data Stewardship: Copy a campaign or a data model in your current instance and modify any of its metadata or parameter values to create a close copy without having to recreate the campaign or data model from scratch.
Functions on data in columns: Several functions have been introduced to improve the enrichment and cleansing possibilities.

Known issues: https://jira.talendforge.org/issues/?filter=27113

TDS user caches: Talend Data Stewardship Cloud uses two user caches to improve performance, each with its own Time To Live (TTL):
  • Short Cache (SC), which contains the user's entitlements: TTL of 2 minutes,
  • Long Cache (LC), which contains the user's first name and last name: TTL of 15 minutes.

Issue: Log in with a freshly created user account

When the account manager creates a new Data Stewardship user using Talend Cloud Management Console, the new user may get an "Access Denied" error message if the account is used straight away.

Workaround: Wait for SC TTL expiration (which can take up to two minutes) before connecting and accessing Data Stewardship.

Issue: Use a freshly created user account in campaign definition

When the account manager creates a new Data Stewardship user using Talend Cloud Management Console, this new user account cannot be used straight away in the campaign definition, even if it is listed in the "Campaign Owners" / "Stewards" drop-down lists.

If campaign owners try to use the newly created account in the campaign definition before LC TTL expiration, they will not be able to save the campaign and will get one of the following error messages:
  • The following campaign owners were not found,
  • The following data stewards were not found.

Workaround: The campaign owner should wait for LC TTL expiration (which can take up to 15 minutes) before using the newly created user account in the campaign definition.

Issue: Log in to Data Stewardship between SC TTL and LC TTL expiration

If users log in to Data Stewardship for the first time between SC TTL expiration and LC TTL expiration, their first and last names are not displayed properly. "Deleted user <UUID>" is displayed instead of the user's "<first name> <last name>".

The pages mainly affected in Data Stewardship are:
  • Task history,
  • Data Stewardship page header,
  • Campaign list,
  • Task grid,
  • Data model list,
  • Participant list in the left panel,
  • Assign task function in the right panel.

This display issue is temporary and lasts at most 15 minutes. The user's "<first name> <last name>" is displayed properly once both caches, SC and LC, are renewed.

Workaround: The campaign owner should wait for LC TTL expiration (which can take up to 15 minutes) before using the newly created user account in the campaign definition.
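
The waiting periods described above can be summarized in a small Python sketch. It only models the worst case stated in these notes (2 minutes for SC, 15 minutes for LC); the function name `account_status` and the returned keys are hypothetical and do not correspond to any Talend API.

```python
from datetime import datetime, timedelta

# TTL values taken from the cache description in these release notes.
SHORT_CACHE_TTL = timedelta(minutes=2)   # SC: user entitlements
LONG_CACHE_TTL = timedelta(minutes=15)   # LC: user first name / last name


def account_status(created_at: datetime, now: datetime) -> dict:
    """Worst-case view of what a freshly created account can do at `now`.

    Assumption (not from Talend): both caches were refreshed just before the
    account was created, so each needs its full TTL to pick up the new user.
    """
    age = now - created_at
    return {
        # Before SC expires, logging in can fail with "Access Denied".
        "can_log_in": age >= SHORT_CACHE_TTL,
        # Before LC expires, "Deleted user <UUID>" may be shown instead.
        "name_displayed": age >= LONG_CACHE_TTL,
        # Before LC expires, saving a campaign with this user can fail.
        "usable_in_campaign": age >= LONG_CACHE_TTL,
    }


created = datetime(2018, 1, 1, 12, 0)
after_five_minutes = account_status(created, created + timedelta(minutes=5))
```

In this model, five minutes after creation the user can log in, but the name may still be displayed incorrectly and the account may not yet be usable in a campaign definition.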

Get started with Talend Data Stewardship Cloud on this page.

Talend Data Preparation

Feature: Description
Dynamic selection of a preparation in a Job: When using the tDataprepRun component to operationalize your preparations in Talend Studio, you can select the Dynamic preparation selection check box to define a preparation via its path in the application, rather than its technical id. This allows you to dynamically select a preparation at runtime, depending on the input data for example.
New parameters for CSV datasets: When working with data from local CSV files, you can now configure the escape and text enclosure characters, as well as the encoding, at import and export time.
Snowflake connectivity: Talend Data Preparation now offers direct connectivity to data stored in Snowflake databases in order to create datasets.
Filter data outside of the sample: From each individual column, you can create a filter on empty, invalid, or valid data, even if there is no matching value in the sample, in order to fetch more rows.
Column name used in data discovery: In order to identify the semantic type of your columns with more accuracy, the name of the column is now taken into account during the data discovery process.
New functions: Several new functions have been added to improve the enrichment and cleansing possibilities:
  • Fill empty cells from above
  • Generate sequence
  • Remove negative values
  • Standardize value (Fuzzy matching)
  • Modulo
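
As a rough illustration of what two of these functions do conceptually, here is a minimal Python sketch. The semantics are assumptions inferred from the function names only: the helper names, the `start` and `step` parameters, and the treatment of leading empty cells are hypothetical and may differ from the behavior in Talend Data Preparation.

```python
from typing import List, Optional


def fill_empty_from_above(column: List[Optional[str]]) -> List[Optional[str]]:
    """Replace each empty cell (None or "") with the nearest non-empty value above it.

    Assumption: empty cells with no value above them are left unchanged.
    """
    filled = []
    last_value = None
    for cell in column:
        if cell in (None, ""):
            filled.append(cell if last_value is None else last_value)
        else:
            last_value = cell
            filled.append(cell)
    return filled


def generate_sequence(row_count: int, start: int = 1, step: int = 1) -> List[int]:
    """Produce one increasing number per row (assumed parameters: start, step)."""
    return [start + i * step for i in range(row_count)]


countries = fill_empty_from_above(["FR", "", None, "US", ""])
row_ids = generate_sequence(3, start=10, step=5)
```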

Some of the existing functions have been improved with new parameters or a new behavior:

  • All numeric operations can now manage percentages
  • Month and day labels, as well as quarters can now be extracted from dates
  • Minutes and seconds are now supported when calculating a time until a date
  • When masking data whose category is based on a regular expression, the valid and invalid values are generated based on the expression and the original validity of each cell. For dictionary-based categories, random values are picked from the dictionary.

Moreover, some functions can be applied on the whole table:

  • Delete empty rows
  • Remove duplicate rows
  • Format phone number
  • Remove trailing and leading characters

Finally, you can now decide if you want to output the effects of certain functions in a new column or in place by selecting the Create new column check box.
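
To clarify what the escape character, text enclosure character and encoding parameters for CSV datasets control, the following sketch parses a small CSV sample with Python's standard `csv` module. It does not use Talend Data Preparation itself; the sample content, the `;` separator and the `read_rows` helper are made up for illustration.

```python
import csv
import io

# Hypothetical file content: ';' separator, single quote as text enclosure,
# backslash as escape character, stored as Latin-1 bytes.
raw_bytes = "name;comment\n'Zoé';'it\\'s ok; really'\n".encode("latin-1")


def read_rows(data: bytes, encoding: str, enclosure: str, escape: str):
    """Decode the bytes, then parse them with the given enclosure and escape characters."""
    text = data.decode(encoding)
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=";",
        quotechar=enclosure,   # text enclosure character
        escapechar=escape,     # escape character
    )
    return list(reader)


rows = read_rows(raw_bytes, encoding="latin-1", enclosure="'", escape="\\")
```

With matching settings, the second row is read as two fields: the name with its accented character intact, and a comment that keeps both its apostrophe and its semicolon. Decoding the same bytes as UTF-8 would fail, which is the kind of problem the encoding parameter is meant to avoid.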

Known issues: https://jira.talendforge.org/issues/?filter=26475

Issue

Workaround

Due to a compatibility issue between the Talend Studio and Talend Data Preparation libraries, the tDataprepRun component does not work in a Spark Batch or Streaming Job.

See issue https://jira.talendforge.org/browse/TDP-5244.

In order to work with Talend Data Preparation Cloud, you can either:

As a side effect, patching your Talend Studio to make it compatible with Cloud will make it incompatible with the on-premises version of Talend Data Preparation.

Consequently, if you want to create Spark Jobs that are compatible with both the Cloud and the on-premises versions of Talend Data Preparation, you will actually need to work with two versions of Talend Studio: one with the patch to work on the Cloud, and one without the patch to work on-premises.

Get started with Talend Data Preparation Cloud on this page.

Talend Integration Cloud

Feature: Description
New public API version: With the Talend Integration Cloud Public API v1.1, it is possible to use <parmname> and <parameter_parmname> formats when setting context parameters in Studio. The values of the context parameters do not change to <parameter_parmname> during execution.
Talend Exchange: As Integration Actions are no longer supported on Talend Integration Cloud, Talend Exchange has been removed from the following modules:
  • importing actions
  • uploading to Exchange
  • creating a flow
Talend Cloud components: The tActionReject component has been fully reinstated. The tActionLog and tActionFailure components have been renamed to tJobLog and tJobFailure.
Flow configuration: Advanced context parameters are visible and configurable in the Flow Configuration panel in the Flow Builder.

Known issues: https://jira.talendforge.org/issues/?filter=26511

Get started with Talend Integration Cloud on this page.

Talend Cloud Management Console

Feature: Description
Talend Cloud Management Console roles: The rights of the previous Administrator role have been divided between two roles: Project Administrator and Security Administrator.
The Project Administrator can manage:
  • Projects (including project authorizations)
  • User libraries in the Nexus repository
The Security Administrator can manage:
  • Users
  • Roles
  • Groups
  • Subscription (including the Account page)
  • Password policies
Talend Data Stewardship roles: With the introduction of Talend Data Stewardship on Cloud, the following roles can be assigned to users:
  • Campaign Owner
  • Data Steward

Known issues: https://jira.talendforge.org/issues/?filter=28553

Get started with Talend Cloud Management Console on this page.

Talend Studio

Talend Cloud Winter '18 ships with the 6.5.1 release of Talend Real-Time Big Data Platform. For the list of new features, see Talend Real-Time Big Data Platform Release Notes.