Adding a dataset from a database - 8.0

Talend Data Preparation User Guide

Version
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Data Preparation
Content
Data Quality and Preparation > Cleansing data
Last publication date
2024-03-26

Talend Data Preparation is able to connect to various databases and use them as source to create a new dataset.

By default, Talend Data Preparation offers connectivity to the following databases:

  • AthenaDB
  • Azure SQL
  • Derby
  • MariaDB
  • MySQL
  • PostgreSQL
  • Redshift
  • Snowflake
  • SQL Server

The list of drivers used for these preconfigured connections can be found in the <TDP_installation_path>/dataprep/services/tcomp/config/jdbc_config.json file.

In this example, you want to prepare some customers data that is stored on a MySQL database. You will enter your database connection information, directly in the Talend Data Preparation interface and create a new dataset from this data.

Procedure

  1. In the Datasets view of the Talend Data Preparation homepage, click the white arrow next to the Add Dataset button.
  2. Select Database.

    The Add database dataset form opens.

  3. In the Dataset name field, enter the name you want to give your dataset.
  4. In the Database type drop-down list, select the type of database you want to connect to, MySQL in this example.

    This list can be manually enriched. For more information, see "Adding a new database type".

  5. In the JDBC URL field, provide a URL to access your MySQL database.

    The form provides a URL template where you can adapt the values to match your own connection details:

    • Replace localhost with your IP address.
    • Replace 3306 with the port that you have set for MySQL. 3306 is the default port for MySQL.
    • Replace db with the name of the database you want to connect to.
  6. In the Username and Password fields, enter your MySQL connection information.
  7. Click Test connection.
    If the connection is successful, the second part of the form is displayed, where you can enter a query for your database. If not, an error message is displayed, detailing why the connection failed.

    Make sure that MySQL authorizes connection from Talend Data Preparation.

  8. In the Query field, enter the query for the information that you want to retrieve from the table stored in your database.

    In this case, all the information from the table called customers will be retrieved and output as a dataset.

  9. Click the Add dataset button at the end of the form.

Results

The data extracted from the customers table in your MySQL database directly opens in the grid and you can start working on your preparation the same way you usually do.

The data is still stored in the MySQL database, Talend Data Preparation only retrieves a sample on-demand.

The dataset is added to the list in the Datasets view of the homepage.