Analyzing columns in an excel file - 6.1

Talend Real-time Big Data Platform Studio User Guide

EnrichVersion
6.1
EnrichProdName
Talend Real-Time Big Data Platform
task
Data Quality and Preparation
Design and Development
EnrichPlatform
Talend Studio

You can analyze data in an excel file and execute the created analyses using the Java engine. This analysis focuses on a column and analyze each cell in it.

Note

Profiling excel files is done via ODBC for the time being. In later releases, you will be able to analyze excel files directly as you do with delimited files.

Prerequisite(s): At least one connection to an excel file is set in the Profiling perspective of the studio. For further information, see How to connect to an Excel file.

Note

The example below uses the Generic ODBC to connect to the data source. In the current Studio, you can still use ODBC to connect to the Excel file but ODBC works only with Java 7.

To set up an ODBC connection to a Data Source, do the following:

  1. In the DQ Repository tree view, expand Metadata, and then right-click DB connections.

    The connection wizard is displayed.

  2. In the Name field, enter a name for the connection.

    Note

    Avoid using special characters in the item names including:

    "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">".

    These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

  3. If required, fill in a purpose and a description for the connection, and then click Next to proceed to the next step.

  4. From the DB Type list, select Generic ODBC.

  5. In the DataSource field, enter the exact name of the Data Source you created in the previous procedure.

  6. Click the Check button to display a confirmation message about the status of the connection.

  7. If your connection is successful, click OK to close the message, and then click Finish to close the wizard.

  8. The connection is listed under DB connections in the DQ Repository tree view and the connection editor opens in the Studio.

    Note

    You can create a connection to an excel file either from the Profiling or the Integration perspectives. Once created, this connection is always displayed simultaneously in both perspectives.

    Warning

    If you have difficulty retrieving the columns from the excel file, give the worksheet in the excel file the same name of the table. To do that, select the whole table in the excel file and then press Ctrl + F3 and modify the name.

You can now create a column analysis in the Profiling perspective of the studio to profile the columns in the excel file.

The procedures to analyze columns in an excel file are exactly the same as those for analyzing columns in a delimited file. For further information on analyzing columns in an excel files, see Analyzing columns in a delimited file and How to access the detailed view of the analysis results.

Warning

Make sure to select the Java engine in the Analysis Parameter view in the analysis editor before executing the analysis of the excel columns, otherwise you will have an error message when running the analysis.