Best Practices: Using Git with Talend

author: Irshad Burtally
EnrichVersion: 6.5
EnrichProdName: Talend Data Fabric; Talend Real-Time Big Data Platform; Talend Data Services Platform; Talend MDM Platform; Talend Big Data; Talend Big Data Platform; Talend ESB; Talend Data Integration; Talend Data Management Platform
task: Administration and Monitoring > Managing versions
EnrichPlatform: Talend Administration Center; Talend Studio

Talend first introduced Git and GitHub support in December 2015 with the Talend 6.1.1 release. This article describes best practices for using Git with Talend Administration Center and Talend Studio.

It also covers some current limitations of Git support in Talend 6.1.x and what to expect from Talend in future versions.

Overview

Both Subversion (SVN) and Git are supported in the Enterprise and Platform products. You can configure SVN and Git at the same time in the Talend Administration Center Configuration menu, as shown in the screenshot below.
Note: Both configurations are optional, because a project in Talend Administration Center can use the storage type None.
The differences in directory structure are covered below. As the configuration above shows, both SVN and Git point to a repository. You can then create a project using SVN, Git, or None as the storage type. The options available when creating a project are shown in the screenshot below.
The screenshot below shows three projects, each with a different storage type, as highlighted.
Note: A storage setting of None means that no objects are stored in any SCM. The project is just a label or placeholder for managing access to tasks and task deployment onto job servers through Project Authorizations and Server Project Authorizations.
Make sure that the users you have created in Talend Administration Center have their SVN and Git login details captured, as shown below.
In Talend Studio, the UI remains mostly the same. The only difference is on the General tab of the Job, where Git History appears instead of SVN History, as shown below. Talend Studio also uses the terminology of the relevant SCM throughout, for example trunk for SVN and master for Git.

In summary, Talend makes it easy to use both Subversion and Git. The main differences are in the way the two SCMs behave and in the workflow needed when using them.

Differences Between Subversion and Git with Talend

Subversion repositories are similar to Git repositories, but there are several differences when it comes to the architecture of your projects. The article "What are the differences between SVN and Git?" describes the differences in directory structure and workflow between SVN and Git. Development teams are expected to define the workflow they want to use with Git and Talend, taking into consideration the features and limitations described below.

The main difference between SVN and Git is the way you work:

  • SVN: Create branches once a release is ready. In Talend, a branch is created only for the project you specify in Talend Administration Center.
  • Git: Create a branch for each development task (bug fix, new feature) and then merge it into the master and release branches. In Talend, a branch is created on the repository, so the branch is available to all projects within that repository. A tag created on a Git project appears as a Release in Git, as shown below.
The screenshot below shows branch b3 created on Project2 in Talend Administration Center:
This branch b3 is made available to all projects in Git, as shown below. Also note the number of branches and releases. There are three branches, which correspond to master, b1, and b3. There is only one release, which corresponds to the tagb1b2 tag we created in Talend Administration Center.
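
The difference in branch scope described above can be sketched as a small toy model. This is only an illustration under stated assumptions: the class names GitRepository and SvnRepository and their methods are hypothetical and do not correspond to any Talend or Git API. The sketch only shows that a Git branch belongs to the repository as a whole, while an SVN branch in Talend is tied to one project.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GitRepository:
    """Toy model (hypothetical): one Git repository shared by several projects."""

    projects: set[str] = field(default_factory=set)
    branches: set[str] = field(default_factory=lambda: {"master"})

    def create_branch(self, name: str, from_project: str) -> None:
        # The project only records where the request came from;
        # the branch itself belongs to the whole repository.
        if from_project not in self.projects:
            raise ValueError(f"unknown project: {from_project}")
        self.branches.add(name)

    def branches_visible_to(self, project: str) -> set[str]:
        if project not in self.projects:
            raise ValueError(f"unknown project: {project}")
        return set(self.branches)


@dataclass
class SvnRepository:
    """Toy model (hypothetical): SVN branches are tracked per project."""

    branches: dict[str, set[str]] = field(default_factory=dict)

    def add_project(self, project: str) -> None:
        self.branches[project] = {"trunk"}

    def create_branch(self, name: str, project: str) -> None:
        self.branches[project].add(name)

    def branches_visible_to(self, project: str) -> set[str]:
        return set(self.branches[project])


git_repo = GitRepository(projects={"Project1", "Project2"})
git_repo.create_branch("b3", from_project="Project2")

svn_repo = SvnRepository()
svn_repo.add_project("Project1")
svn_repo.add_project("Project2")
svn_repo.create_branch("b3", project="Project2")
```

In this model, asking git_repo which branches Project1 can see includes b3 even though the branch was requested from Project2, whereas svn_repo reports b3 only for Project2.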

Features and Limitations

The following section lists some of the features and limitations you need to be aware of when deciding whether to use Git with Talend 6.1.1.

Features:

  • Talend supports a Git backend with the exact same workflow as for SVN.
  • Branches are created on the whole Git repository: a branch created for a specific project is available to all other projects in the same Git repository.

Limitations:

  • No graphical job comparison is available in 6.1.x, which would be needed to support a pure Git workflow. Talend is working towards introducing a graphical job comparison in future versions.
  • A branch created for a specific project is available to all other projects in the same Git repository.
  • There is currently an issue with MDM projects and MDM artifacts in Git repositories. You will need a patch on top of 6.1.1 to be able to use Git to store MDM artifacts in your MDM projects.

Best Practices

Talend recommends using Git with a centralized workflow, as described at https://www.atlassian.com/git/tutorials/comparing-workflows/centralized-workflow, with one caveat: users should not clone the repository, because the comparison feature for jobs is still missing in 6.1.1. Instead, users should work against the central repository and rely on the Talend Administration Center locking mechanism to identify which jobs are locked and being edited, similar to the SVN workflow.
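
To make the role of locking in this workflow concrete, here is a minimal sketch of a central lock table in which only one user at a time can hold a job for editing. It is an assumption-laden illustration: JobLockRegistry, its methods, and the job and user names are hypothetical, and this is not how Talend Administration Center implements its locking mechanism.

```python
class JobLockRegistry:
    """Toy stand-in (hypothetical) for a central table of locked jobs."""

    def __init__(self):
        # Maps a job name to the user currently holding its lock.
        self._locks = {}

    def acquire(self, job, user):
        """Lock the job for the user; return False if someone else holds it."""
        holder = self._locks.setdefault(job, user)
        return holder == user

    def release(self, job, user):
        """Release the lock; only the current holder may do so."""
        if self._locks.get(job) == user:
            del self._locks[job]
            return True
        return False

    def locked_by(self, job):
        """Return the user editing the job, or None if it is free."""
        return self._locks.get(job)


registry = JobLockRegistry()
first_attempt = registry.acquire("load_customers", "alice")
second_attempt = registry.acquire("load_customers", "bob")
current_holder = registry.locked_by("load_customers")
```

Because every user works against the same central repository, a single shared lock table like this is enough to prevent two people from editing the same job at once, which is what makes the missing job comparison feature tolerable.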

Branching in a repository makes that branch available to all projects. Developers should therefore be careful about how they use branches across projects. This can create confusion at first, so it is important to spend time defining the correct development workflow.
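
As one possible way to reduce that confusion, a team could agree on a branch naming convention that records which project a branch was created for. The sketch below assumes a hypothetical convention of the form project/kind/topic; the convention, the pattern, and the function names are illustrative assumptions rather than Talend recommendations, and whether Talend Administration Center accepts this separator in branch names is not confirmed here.

```python
import re

# Hypothetical convention: <project>/<kind>/<topic>,
# for example "project2/feature/new-lookup".
BRANCH_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9_]+)/(?P<kind>feature|bugfix|release)/(?P<topic>[a-z0-9-]+)$"
)


def owning_project(branch):
    """Return the project prefix, or None if the name ignores the convention."""
    match = BRANCH_PATTERN.match(branch)
    return match.group("project") if match else None


def branches_for(project, branches):
    """Filter a repository-wide branch list down to one project's branches."""
    return [b for b in branches if owning_project(b) == project]


all_branches = [
    "master",
    "project1/bugfix/null-check",
    "project2/feature/new-lookup",
    "b3",
]
project2_branches = branches_for("project2", all_branches)
unlabelled = [b for b in all_branches if owning_project(b) is None]
```

Branches such as master and b3 do not follow the convention and end up in the unlabelled list, which gives the team a quick way to spot branches whose purpose is not obvious from the name.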