Applying a preparation on the data - Cloud


Azure Data Lake Store

Version: Cloud, 8.0

Language: English

Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Open Studio for Big Data, Talend Open Studio for Data Integration, Talend Open Studio for ESB, Talend Real-Time Big Data Platform

Module: Talend Studio


Last publication date: 2023-06-07

In the design workspace, select tDataprepRun and click the Component tab to define its basic settings.
In the URL field, type the URL of the Talend Data Preparation or Talend Cloud Data Preparation web application, enclosed in double quotes. The default port for Talend Data Preparation is 9999.
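As a rough illustration of the default-port rule, the following Java sketch fills in port 9999 when an on-premises Talend Data Preparation URL omits an explicit port. Java is used because Talend Studio generates Java code, but the `DataprepUrl` class and `withDefaultPort` method are hypothetical and not part of any Talend API. The sketch assumes plain `http` or `https` URLs; a Talend Cloud Data Preparation URL would normally be entered as-is.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not a Talend API): adds Talend Data Preparation's
// default port, 9999, to an on-premises URL that does not specify a port.
final class DataprepUrl {
    static final int DEFAULT_PORT = 9999;

    private DataprepUrl() {}

    static String withDefaultPort(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on malformed input
        if (uri.getPort() != -1) {
            return uri.toString(); // an explicit port is kept as-is
        }
        try {
            // Rebuild the URI from its decoded components, inserting the default port.
            return new URI(uri.getScheme(), uri.getUserInfo(), uri.getHost(),
                    DEFAULT_PORT, uri.getPath(), uri.getQuery(), uri.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot add a port to " + url, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withDefaultPort("http://localhost"));      // http://localhost:9999
        System.out.println(withDefaultPort("http://localhost:8080")); // http://localhost:8080
    }
}
```

In the component itself you simply type the full URL, port included, between double quotes; the helper only makes the default explicit.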
In the Username and Password fields, enter your Talend Data Preparation or Talend Cloud Data Preparation connection information, enclosed in double quotes.
If you are working with Talend Cloud Data Preparation, what you enter in the Password field depends on whether SSO is enabled:
- If SSO is enabled, enter an access token.
- If SSO is not enabled, enter either an access token or your password.
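The token-or-password rule for Talend Cloud Data Preparation is a small decision table. This Java sketch models it for illustration only; the `DataprepCredentials` class and its `Secret` enum are hypothetical names, not part of Talend Studio.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model (not a Talend API) of which secret the Password field
// accepts when connecting to Talend Cloud Data Preparation.
final class DataprepCredentials {
    enum Secret { PASSWORD, ACCESS_TOKEN }

    private DataprepCredentials() {}

    static Set<Secret> acceptedSecrets(boolean ssoEnabled) {
        // With SSO, only an access token is accepted; without SSO, either one works.
        return ssoEnabled
                ? EnumSet.of(Secret.ACCESS_TOKEN)
                : EnumSet.of(Secret.ACCESS_TOKEN, Secret.PASSWORD);
    }

    public static void main(String[] args) {
        System.out.println("SSO enabled: " + acceptedSecrets(true));
        System.out.println("SSO not enabled: " + acceptedSecrets(false));
    }
}
```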
Click Choose an existing preparation to display a list of the preparations available in Talend Data Preparation or Talend Cloud Data Preparation, and select preparation_adlsgen2.

This scenario assumes that a preparation with a compatible schema has been created beforehand.
Click Fetch Schema to retrieve the schema of the preparation, preparation_adlsgen2 in this case.

The output schema of the tDataprepRun component now reflects the changes made by each preparation step, for example columns that were added or removed. By default, the output schema uses the String type for all columns, so that formatting operations performed on dates or numeric values during the preparation are not overwritten.
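To see why String is a safer default, this Java sketch (plain JDK code, not code generated by Talend Studio) round-trips a few formatted values through typed parsers. Each conversion silently discards formatting: the `dd/MM/yyyy` date layout, the leading zeros of an identifier, and the trailing zero of an amount. The `StringSchemaDemo` class and its methods are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Illustration only: converting formatted strings to typed values and back
// loses the representation a preparation may have applied.
final class StringSchemaDemo {
    private StringSchemaDemo() {}

    // Parses a dd/MM/yyyy date, then prints it with LocalDate's ISO-8601 default.
    static String roundTripDate(String formatted) {
        LocalDate date = LocalDate.parse(formatted, DateTimeFormatter.ofPattern("dd/MM/yyyy"));
        return date.toString();
    }

    // Parses a zero-padded identifier as an int, then prints it back.
    static String roundTripInt(String formatted) {
        return Integer.toString(Integer.parseInt(formatted));
    }

    // Parses a two-decimal amount as a double, then prints it back.
    static String roundTripDouble(String formatted) {
        return Double.toString(Double.parseDouble(formatted));
    }

    public static void main(String[] args) {
        System.out.println(roundTripDate("07/06/2023")); // 2023-06-07
        System.out.println(roundTripInt("00042"));       // 42
        System.out.println(roundTripDouble("1.50"));     // 1.5
    }
}
```

Keeping the columns as String carries the prepared representation through unchanged; if a downstream step needs typed values, the conversion can be done explicitly later in the Job.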