Selecting the set of columns you want to analyze in the delimited file - Cloud - 7.3

Talend Studio User Guide

Last publication date: 2024-02-13
Available in: Big Data Platform, Cloud API Services Platform, Cloud Big Data Platform, Cloud Data Fabric, Cloud Data Management Platform, Data Fabric, Data Management Platform, Data Services Platform, MDM Platform, Real-Time Big Data Platform

Procedure

  1. Expand the FileDelimited connection and browse to the set of columns you want to analyze.
  2. Select the columns to be analyzed, and then click Finish to close the New analysis wizard.
    The analysis editor opens with the defined analysis metadata, and a folder for the newly created analysis is displayed under Analyses in the DQ Repository tree view.
    Sample data is displayed in the Data Preview section, and the selected columns are displayed in the Analyzed Columns section of the analysis editor.
  3. If required, select another connection from the Connection box in the Analyzed Columns view. This box lists all the connections created in the Studio with the corresponding database names.
    By default, the Connection box displays the delimited file connection you selected in the previous step.
  4. If required, click the Select columns to analyze link to open a dialog box where you can modify your column selection.
    Note: You can filter the table or column lists by typing text in the Table filter or Column filter field, respectively. The lists then show only the tables or columns that match the text you type.
  5. In the column list, select the check boxes of the column(s) you want to analyze and click OK to proceed to the next step.
    In this example, you want to analyze a set of six columns in the delimited file: account number (account_num), education (education), email (email), first name (fname), last name (lname) and gender (gender). You want to identify the number of rows, the number of distinct values, the number of unique values, and the number of duplicates.
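
To make the four counts in this example concrete, here is a minimal Python sketch that computes them outside the Studio for the six columns. It is an illustration, not how the Studio computes its indicators. It assumes that "distinct" means the number of different values in a column, "unique" means values that occur exactly once, and "duplicate" means values that occur more than once. The sample rows, the semicolon delimiter, and the profile_columns helper are hypothetical.

```python
import csv
import io
from collections import Counter

# The six columns named in the example.
COLUMNS = ["account_num", "education", "email", "fname", "lname", "gender"]


def profile_columns(rows, columns):
    """Return row, distinct, unique and duplicate counts per column.

    Assumed definitions (not taken from the Studio itself):
      distinct  = number of different values in the column
      unique    = number of values that occur exactly once
      duplicate = number of values that occur more than once
    """
    rows = list(rows)  # materialize so every column sees all rows
    stats = {}
    for name in columns:
        counts = Counter(row[name] for row in rows)
        stats[name] = {
            "row_count": len(rows),
            "distinct_count": len(counts),
            "unique_count": sum(1 for n in counts.values() if n == 1),
            "duplicate_count": sum(1 for n in counts.values() if n > 1),
        }
    return stats


# Hypothetical sample data standing in for the delimited file.
SAMPLE = """account_num;education;email;fname;lname;gender
1001;Bachelors;ann@example.com;Ann;Lee;F
1002;Graduate;bob@example.com;Bob;Ray;M
1003;Bachelors;ann@example.com;Ann;Kim;F
"""

reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=";")
stats = profile_columns(reader, COLUMNS)

for name in COLUMNS:
    print(name, stats[name])
```

Under these definitions, the unique count plus the duplicate count equals the distinct count for every column. To run the sketch against a real file, replace io.StringIO(SAMPLE) with an opened file handle and set the delimiter to match the file.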