Connecting to a file - 6.1

Talend Real-time Big Data Platform Studio User Guide

Before you can analyze data in a delimited file or an Excel file, you must first set up a connection to that file.

How to connect to a delimited file

Before you can profile data in a delimited file, you must first set up the connection to this file.

Prerequisite(s): You have selected the Profiling perspective in the Studio.

To create a connection to a delimited file, do the following:

  1. Expand the Metadata folder.

  2. Right-click FileDelimited connections and then select Create File Delimited Connection to open the [New Delimited File] wizard.

  3. Follow the steps defined in the wizard to create a connection to a delimited file.

    You can then create a column analysis and drop the columns to analyze from the delimited file metadata in the DQ Repository tree view to the open analysis editor. For further information, see Analyzing columns in a delimited file.

    For information on how to set up a connection to a database, see Connecting to a database.
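Before running the wizard, it can help to confirm how the file is actually laid out, since a delimited file connection depends on settings such as the field separator. The following is a minimal sketch in Python, not part of Talend Studio: the function name `describe_delimited` and the sample data are hypothetical, and the standard-library `csv.Sniffer` is used only as a quick way to guess the separator.

```python
import csv
import io


def describe_delimited(sample: str, candidates: str = ",;\t|") -> dict:
    """Guess the field separator of a delimited text sample.

    `candidates` restricts the guess to separators commonly used in
    delimited files (an assumption made for this sketch).
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    rows = list(csv.reader(io.StringIO(sample), dialect))
    return {
        "separator": dialect.delimiter,
        "column_count": len(rows[0]) if rows else 0,
        "first_row": rows[0] if rows else [],
    }


# Hypothetical sample standing in for the first lines of a real file.
sample = "id;name;city\n1;Ann;Paris\n2;Bob;Lyon\n"
info = describe_delimited(sample)
print(info)
```

For a real file, you would read the first few kilobytes from disk and pass them in place of `sample`, then enter the detected separator in the wizard.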

You can create a delimited file connection from either the Profiling or the Integration perspective. Once created, the connection is listed in both perspectives.

You can export your connection as a context and centralize it under the Context node in the Integration perspective of your Studio. This enables you to reuse this context in the data quality analyses that use the current connection. You can also create different context parameters for the same connection and later choose to execute an analysis on one specific context. For further information, see Using context variables to connect to data sources.
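The idea of keeping several context parameter sets for one connection can be illustrated with a small sketch. This is a conceptual illustration in Python, not Talend's context mechanism: the class `DelimitedFileSettings`, the context names `dev` and `prod`, and the file paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelimitedFileSettings:
    """Connection parameters that may differ from one context to another."""
    path: str
    separator: str
    encoding: str = "UTF-8"


# Hypothetical contexts for the same logical connection.
CONTEXTS = {
    "dev": DelimitedFileSettings(path="/data/dev/customers.csv", separator=";"),
    "prod": DelimitedFileSettings(path="/data/prod/customers.csv", separator=";"),
}


def resolve(context_name: str) -> DelimitedFileSettings:
    """Return the settings for the chosen context, or fail with a clear error."""
    try:
        return CONTEXTS[context_name]
    except KeyError:
        raise ValueError(f"Unknown context: {context_name!r}") from None


settings = resolve("dev")
print(settings.path)
```

The analysis logic stays the same in every context; only the values supplied at run time change, which is what makes a centralized context reusable across analyses.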

How to connect to an Excel file

Before you can profile data in an Excel file, you must create a Data Source and then set up the connection to this Data Source.

Prerequisite(s): You have selected the Profiling perspective in the Studio.

Note

The example below uses Generic ODBC to connect to the data source. In the current Studio, you can still use ODBC to connect to an Excel file, but ODBC works only with Java 7, because the JDBC-ODBC bridge was removed in Java 8.

To create the Data Source, do the following:

  1. On the task bar of your desktop, click the Start button and then select Control Panel to open the corresponding page.

  2. Double-click Administrative Tools to open the corresponding page.

  3. Double-click Data Sources (ODBC).

    A dialog box opens.

  4. In the User DSN view, click Add... to open a dialog box where you can select the ODBC driver, Microsoft Excel in this example, for the data source (database) to which you want to connect.

  5. Click Finish to proceed to the step where you can define the Data Source.

  6. In the Data Source Name field, enter a name for the Data Source, and then click the Select Workbook... button to proceed to the step where you link this Data Source to the Excel file you want to profile.

  7. In the dialog box that opens, browse to the Excel file to which you want to link your Data Source.

    Note

    To set up an ODBC connection to the Data Source without problems, make sure that the Excel files you want to profile are stored in a folder rather than directly in the root directory of your system.

  8. Select the Excel file and then click OK to close the open dialog boxes. The Data Source you created is listed in the User Data Sources list.

  9. Click OK to close the dialog box.
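The note in step 7 about keeping Excel files out of the root directory can be checked before you create the Data Source. The following is a minimal sketch in Python, not part of Talend Studio or the ODBC tooling: the function name `is_in_root_directory` and the example paths are hypothetical.

```python
from pathlib import PureWindowsPath


def is_in_root_directory(file_path: str) -> bool:
    """Return True if the file sits directly in the root of its drive.

    Only absolute Windows paths are accepted, because a relative path
    does not say where the file really lives.
    """
    path = PureWindowsPath(file_path)
    if not path.is_absolute():
        raise ValueError(f"Expected an absolute path, got: {file_path!r}")
    # The anchor is the drive plus root, for example 'C:\\'.
    return path.parent == PureWindowsPath(path.anchor)


# Hypothetical paths used only for illustration.
for candidate in (r"C:\sales.xls", r"C:\data\sales.xls"):
    status = "move it into a folder" if is_in_root_directory(candidate) else "ok"
    print(f"{candidate}: {status}")
```

`PureWindowsPath` is used so that the check behaves the same way on any operating system where the script happens to run.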

You can then create a column analysis and drop the columns to analyze from the Excel file metadata in the DQ Repository tree view to the open analysis editor. For further information, see Analyzing columns in an Excel file.

For information on how to set up a connection to a database, see Connecting to a database.