Talend Data Preparation can connect directly
to various types of databases, which you can use as sources to create new
datasets.
You can manually extend the list of databases from which you can import
data.
The list of database types available for dataset creation depends on the JDBC
drivers that you have stored in the
<components_catalog_path>/.m2 folder.
Let's say that you have some customer data stored in an Oracle database, and you want to
import it into Talend Data Preparation to perform
cleansing operations. To make this new data source available in the Talend Data Preparation interface,
you add a JDBC driver .jar file specific
to Oracle databases to the Components Catalog folder structure.
You do not need to stop or restart any of the services to complete the following
procedure.
Before you begin
The Components Catalog
server is installed and running on a Linux machine.
Procedure
-
Download the latest Oracle JDBC driver, called ojdbc7.jar,
from the Oracle website.
-
Create the
<components_catalog_path>/.m2/jdbc-drivers/oracle/7/
folder.
Warning: The folder structure must follow this template:
.m2/jdbc-drivers/<database_name>/<jdbc_version>.
-
Copy the ojdbc7.jar file to the newly created folder.
-
Change the name of the file from ojdbc7.jar to
oracle-7.jar.
Warning: The file name must follow this template:
<database_name>-<jdbc_version>.jar.
Renaming the .jar file and using this folder
structure ensures naming consistency and makes them Maven
compliant.
-
Update the
<components_catalog_path>/config/jdbc_config.json file
by adding the following lines:
,
{
  "id" : "Oracle Thin",
  "class" : "oracle.jdbc.driver.OracleDriver",
  "url" : "jdbc:oracle:thin:@myhost:1521:thedb",
  "paths" :
  [
    {"path" : "mvn:jdbc-drivers/oracle/7"}
  ]
}
Where:
-
id
is the value that will be displayed in the Talend Data Preparation interface as Database type.
-
class
is the driver class used to communicate with the
database.
-
url
is the URL template to access a database.
-
path
follows this model:
mvn:jdbc-drivers/my_database_name/my_version
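The procedure above could also be scripted. The following Python sketch is one possible way to do it, not an official Talend tool: install_driver and register_driver are hypothetical helper names, and the code assumes that jdbc_config.json holds a top-level JSON array of driver entries, which the leading comma in the snippet above suggests but does not confirm. It copies the driver directly under its final name instead of copying and then renaming it.

```python
import json
import shutil
from pathlib import Path


def install_driver(catalog_path, downloaded_jar, database_name, jdbc_version):
    """Copy a driver .jar into the .m2/jdbc-drivers/<database_name>/<jdbc_version> folder."""
    target_dir = (
        Path(catalog_path) / ".m2" / "jdbc-drivers" / database_name / jdbc_version
    )
    target_dir.mkdir(parents=True, exist_ok=True)
    # The file name follows the <database_name>-<jdbc_version>.jar template.
    target_jar = target_dir / f"{database_name}-{jdbc_version}.jar"
    shutil.copyfile(downloaded_jar, target_jar)
    return target_jar


def register_driver(config_file, entry):
    """Append a driver entry to the configuration unless its id is already listed.

    Assumption: the file content is a top-level JSON array of entries.
    """
    config_file = Path(config_file)
    entries = json.loads(config_file.read_text(encoding="utf-8"))
    if all(existing.get("id") != entry["id"] for existing in entries):
        entries.append(entry)
        config_file.write_text(json.dumps(entries, indent=2) + "\n", encoding="utf-8")
    return entries


# Same values as the Oracle example in the procedure.
ORACLE_ENTRY = {
    "id": "Oracle Thin",
    "class": "oracle.jdbc.driver.OracleDriver",
    "url": "jdbc:oracle:thin:@myhost:1521:thedb",
    "paths": [{"path": "mvn:jdbc-drivers/oracle/7"}],
}
```

With a Components Catalog installed in a hypothetical /opt/components-catalog folder, you would call install_driver("/opt/components-catalog", "ojdbc7.jar", "oracle", "7") and then register_driver("/opt/components-catalog/config/jdbc_config.json", ORACLE_ENTRY). Back up the configuration file first, because the sketch rewrites it with its own indentation.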
If the database configuration requires more than one
.jar file, rename each file according to the template
mentioned earlier, and add it to its dedicated
.m2/jdbc-drivers/<jar_name>/<jdbc_version>
folder as you did for the Oracle driver. For example, for a database that needs
two .jar files, you would end up with the
following two files:
.m2/jdbc-drivers/<jar_1>/<version>/<jar_1>-<version>.jar
and
.m2/jdbc-drivers/<jar_2>/<version>/<jar_2>-<version>.jar
To finish the configuration, update the
<components_catalog_path>/config/jdbc_config.json
file using the following model:
,
{
  "id" : "Database_type",
  "class" : "<driver_class>",
  "url" : "<url_to_access_database>",
  "paths" :
  [
    {"path" : "mvn:jdbc-drivers/jar_1/version"},
    {"path" : "mvn:jdbc-drivers/jar_2/version"}
  ]
}
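With several .jar files, it is easy to misplace or misname one of them. As a rough illustration, the Python sketch below maps each path value of an entry to the file location it implies and reports the files that are missing. It follows the Oracle example, where the oracle/7 folder holds oracle-7.jar. The helper names expected_jar and missing_jars are hypothetical and are not part of any Talend tooling.

```python
from pathlib import Path

MVN_PREFIX = "mvn:jdbc-drivers/"


def expected_jar(catalog_path, mvn_path):
    """Map 'mvn:jdbc-drivers/<name>/<version>' to the matching .jar location."""
    if not mvn_path.startswith(MVN_PREFIX):
        raise ValueError(f"unexpected path format: {mvn_path!r}")
    parts = mvn_path[len(MVN_PREFIX):].split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected <name>/<version> in: {mvn_path!r}")
    name, version = parts
    return (
        Path(catalog_path) / ".m2" / "jdbc-drivers" / name / version
        / f"{name}-{version}.jar"
    )


def missing_jars(catalog_path, entry):
    """Return the .jar files referenced by a configuration entry that are not on disk."""
    jars = [expected_jar(catalog_path, item["path"]) for item in entry["paths"]]
    return [jar for jar in jars if not jar.is_file()]
```

An empty list returned by missing_jars means that every path in the entry points to an existing file that follows the naming template.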
Results
The Oracle database is now available in the
Database type
drop-down list of the import form.