Best Practices: Using Git with Talend 6.2 and later versions
Both Subversion (SVN) and Git are supported in the subscription editions of Talend products.
For more information about using both Subversion and Git in the Talend Administration Center (TAC), see the article "Can I use both Git and Subversion within the same Talend Administration Center (TAC)".
The screenshot below shows a Subversion project selected. You can tell that it is a Subversion project because trunk appears in the Branch list. Talend Studio does not provide merge functionality for Subversion projects.
Working with Git Local Branches
Talend Studio provides a dropdown menu for Git operations, as shown in the screenshot below. This menu appears only when you are logged in to a Git project.
Once you are connected to the master branch, it is good practice to create a new local branch. The screenshot below shows that a developer_local branch has been created and that you are currently working in it.
At this point, the Studio is still connected to the Talend Administration Center for license management (in particular, controlling concurrent users) and project authorisation.
However, all jobs, joblets, and other items are saved locally in the workspace. These artefacts are only saved to the remote Git repository when you do a Push. Developers can also fetch the latest version of master and merge it into their current local branch by selecting Pull And Merge Branch from the menu.
As developers create more local branches, they can switch between them without restarting the Studio. A developer can also delete a local branch after doing a Push, to clean up the workspace. Note that if you delete a local branch without doing a Push first, you will lose your changes.
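For readers who know Git from the command line, the sketch below shows roughly the same local-branch lifecycle in plain Git: create a local branch, commit locally, push, pull and merge master, then delete the branch. It is an illustration only, not a description of how Talend Studio implements its menu actions. It assumes a throwaway local bare repository standing in for the Git server; all paths and names except developer_local are hypothetical.

```shell
# Illustration only: the local-branch lifecycle in plain Git.
# DEMO, central.git, dev and job.txt are hypothetical demo names.
DEMO=$(mktemp -d)
git init -q --bare "$DEMO/central.git"        # stands in for the Git server
git clone -q "$DEMO/central.git" "$DEMO/dev"
cd "$DEMO/dev" || exit 1
git config user.name "Dev"                     # demo identity only
git config user.email "dev@example.com"

# Publish an initial master branch.
echo "v1" > job.txt
git add job.txt
git commit -q -m "Initial job"
git branch -M master
git push -q -u origin master

# Create a local branch and work on it; these commits stay local.
git checkout -q -b developer_local
echo "v2" > job.txt
git commit -q -am "Update job"

# Push: only now does the work exist on the server.
git push -q -u origin developer_local

# "Pull and merge": bring the latest master into the local branch.
git pull -q --no-rebase origin master

# Clean up. -d succeeds here because the branch is merged into its pushed
# upstream; an unpushed, unmerged branch would be refused (-D would force it).
git checkout -q master
git branch -d developer_local
```

The last step mirrors the warning above: `git branch -d` refuses to delete a branch whose commits are not merged anywhere, whereas `git branch -D` deletes it regardless.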
General Git Best Practices
Use Branching and Tagging
One of the most important best practices with Git is to use branching and tagging correctly. Many articles describe this in depth, but following a few simple steps goes a long way. Branching is standard practice, but tagging is just as important and should be used consistently.
Remember that Git allows multiple developers to work on the same project: each developer commits and pushes changes to the Git server and retrieves the other developers' changes from it. Branching allows developers to work independently without affecting the main line of development, which is called master. A branch represents the project as it was at a specific point in time, and it can be created from the main line (master), from another branch, or from a tag.
Developers use tags to mark a particular revision as important in the development process, such as a release. Using tags consistently is very good practice.
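A minimal sketch of both ideas in plain Git, assuming a throwaway repository and hypothetical names (job.txt, v1.0.0, hotfix-1.0.x): an annotated tag marks a revision, and a branch is later started from that tag so the tagged version can be patched without touching master.

```shell
# Hypothetical throwaway repository and names for the demo.
DEMO=$(mktemp -d)
git init -q "$DEMO/project"
cd "$DEMO/project" || exit 1
git config user.name "Dev"
git config user.email "dev@example.com"

echo "first" > job.txt
git add job.txt
git commit -q -m "First version of the job"
git branch -M master

# Mark this revision with an annotated tag (stores tagger, date and message).
git tag -a v1.0.0 -m "Release 1.0.0"

# Development continues on master.
echo "second" >> job.txt
git commit -q -am "Second version of the job"

# Start a branch from the tag, e.g. to patch the released version
# without affecting master.
git checkout -q -b hotfix-1.0.x v1.0.0

# List tags with their messages. A plain `git push` does not send tags;
# push them explicitly, e.g. `git push origin v1.0.0`.
git tag -n
```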
Use Git with a Centralised Workflow
The Centralized Workflow uses a central repository as the single point of entry for all changes. Instead of trunk, the default development branch is called master, and all changes are committed to this branch. This workflow does not require any branches other than master.
Developers clone the central repository. In their local copies of the project, they edit files and commit changes; these new commits are stored locally. To publish their changes to the official project, developers push their local master branch to the central repository.
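The sketch below walks through the Centralized Workflow with two hypothetical developers, Alice and Bob, using a local bare repository as a stand-in for the central server. Every commit is made on master, and changes are shared only through push and pull.

```shell
# Hypothetical demo: a local bare repository stands in for the central server.
DEMO=$(mktemp -d)
git init -q --bare "$DEMO/central.git"
# Make master the central repository's default branch.
git -C "$DEMO/central.git" symbolic-ref HEAD refs/heads/master

# Alice clones, commits on master and publishes the first version.
git clone -q "$DEMO/central.git" "$DEMO/alice"
cd "$DEMO/alice" || exit 1
git config user.name "Alice"
git config user.email "alice@example.com"
echo "load customers" > job.txt
git add job.txt
git commit -q -m "Add customer load job"
git branch -M master
git push -q -u origin master

# Bob clones, commits on his local master and pushes.
git clone -q "$DEMO/central.git" "$DEMO/bob"
cd "$DEMO/bob" || exit 1
git config user.name "Bob"
git config user.email "bob@example.com"
echo "filter inactive customers" >> job.txt
git commit -q -am "Filter inactive customers"
git push -q origin master

# Alice pulls Bob's commit before publishing anything else.
cd "$DEMO/alice" || exit 1
git pull -q --no-rebase origin master
```

If someone else has pushed in the meantime, Git rejects the non-fast-forward push by default; the developer pulls, resolves any conflicts, and pushes again.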
Make Regular Backups
With Git, every clone is essentially a full copy of the repository's history, and therefore a backup. You should still make proper backups, because a clone does not copy the repository's configuration, such as .git/config and hooks. Having your files stored on a remote server is a useful side effect of using a version control system.
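One way to do this, sketched under the assumption of a hypothetical repository at $REPO and a backup folder at $BACKUP: `git bundle` packs all refs and history into a single file, the local configuration is copied separately because it is not part of any clone, and the bundle can be cloned from directly when restoring.

```shell
# Hypothetical paths for the demo.
DEMO=$(mktemp -d)
REPO="$DEMO/project"
BACKUP="$DEMO/backup"

# Throwaway repository standing in for a real project.
git init -q "$REPO"
git -C "$REPO" config user.name "Dev"
git -C "$REPO" config user.email "dev@example.com"
echo "job" > "$REPO/job.txt"
git -C "$REPO" add job.txt
git -C "$REPO" commit -q -m "Add job"
git -C "$REPO" branch -M master

mkdir -p "$BACKUP"
# All branches, tags and history in one file.
git -C "$REPO" bundle create "$BACKUP/project.bundle" --all
# Local configuration is not part of a clone or bundle: copy it separately.
cp "$REPO/.git/config" "$BACKUP/config"
# Check that the bundle is complete and readable.
git -C "$REPO" bundle verify "$BACKUP/project.bundle"

# Restore: a bundle can be cloned like a remote repository.
git clone -q --branch master "$BACKUP/project.bundle" "$DEMO/restored"
```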
Divide Work into Repositories
Repositories are often used to store things that do not belong in them, simply because the repository is there and easy to access. This is not good practice: keep unrelated work in separate repositories. Where one project genuinely depends on another, Git submodules let you group repositories together.
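A minimal sketch of the submodule approach, assuming two hypothetical local repositories: shared-routines is kept in its own repository, and project references it as a submodule at routines/ instead of carrying a copy of its files.

```shell
DEMO=$(mktemp -d)
# A shared repository of routines, kept separate (hypothetical name).
git init -q "$DEMO/shared-routines"
cd "$DEMO/shared-routines" || exit 1
git config user.name "Dev"
git config user.email "dev@example.com"
echo "common routine" > routine.txt
git add routine.txt
git commit -q -m "Add common routine"

# The project repository references it as a submodule instead of copying it.
git init -q "$DEMO/project"
cd "$DEMO/project" || exit 1
git config user.name "Dev"
git config user.email "dev@example.com"
# protocol.file.allow=always is only needed because this demo uses a local
# path: recent Git versions block the file transport for submodules by default.
git -c protocol.file.allow=always submodule add "$DEMO/shared-routines" routines
git commit -q -m "Add shared routines as a submodule"
git submodule status
```

Anyone cloning the project afterwards needs `git clone --recurse-submodules` (or `git submodule update --init` after a plain clone) to fetch the submodule contents.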
Use Commit Messages
Use descriptive commit messages. They allow colleagues to understand changes without having to read the code.
git commit -m "<message>"
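A commit message usually reads best as a short subject line followed by a body that explains why the change was made. With `git commit`, each `-m` option becomes a separate paragraph, so the first is the subject and the second the body. The repository and the message text below are hypothetical.

```shell
# Hypothetical throwaway repository for the demo.
DEMO=$(mktemp -d)
git init -q "$DEMO/project"
cd "$DEMO/project" || exit 1
git config user.name "Dev"
git config user.email "dev@example.com"
echo "status filter" > job.txt
git add job.txt

# First -m: short subject line. Second -m: body explaining why.
git commit -q \
  -m "Filter inactive customers in the customer load job" \
  -m "Inactive customers were being loaded into the warehouse and inflated the monthly totals."

# Show the subject, a blank line, then the body.
git log -1 --format='%s%n%n%b'
```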
Use a Security Model
Without a security model, everyone has access to everything. That may be acceptable for some teams, but often it is not.
A good approach is to limit access so that only selected repositories are readable and writable by everyone. Git servers can enforce different types of access control, and you may even consider setting up a centralized Git master repository with a tool such as Gitolite.
Make use of Standards
Using standards, such as naming conventions, improves the quality of your commits and of the code base, so it is best practice to adopt them. Other useful standards cover testing, syntax, and commit message checks.
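Commit message checks can be automated with a client-side commit-msg hook. The sketch below assumes a hypothetical team standard (subject of at most 72 characters, starting with a ticket id such as TAL-123, and no trailing period); adapt the rules to your own standards. Git runs the hook with the path of the message file as its only argument and aborts the commit if the hook exits with a non-zero status.

```shell
#!/bin/sh
# Hypothetical commit-msg hook: save as .git/hooks/commit-msg and make it
# executable. The rules below are an example standard, not a Talend requirement.
check_commit_msg() {
  subject=$(head -n 1 "$1")
  if [ ${#subject} -gt 72 ]; then
    echo "commit-msg: subject longer than 72 characters" >&2
    return 1
  fi
  if ! printf '%s\n' "$subject" | grep -Eq '^[A-Z]+-[0-9]+ '; then
    echo "commit-msg: subject must start with a ticket id, e.g. 'TAL-123 '" >&2
    return 1
  fi
  case $subject in
    *.) echo "commit-msg: subject should not end with a period" >&2
        return 1 ;;
  esac
  return 0
}

# When Git runs the hook, $1 is the file holding the proposed message.
if [ -n "$1" ]; then
  check_commit_msg "$1" || exit 1
fi
```

Hooks live in each clone's .git/hooks folder and are not copied by `git clone`, so every developer has to install the hook separately.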
Use of External Tools
There are a number of useful external tools that integrate with Git.
However, Talend Studio already provides all the functionality you need to use Talend with Git. You may occasionally need an external Git tool, but this is generally not best practice with Talend: use Talend Studio to manage Talend projects stored in the Git repository.
Managing the release workflow is also good practice. Make sure every version is tagged, and that tags are named according to your naming standards.
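The sketch below enforces a hypothetical tag naming standard, v<major>.<minor>.<patch> (for example v2.1.0), before creating an annotated release tag. The pattern and the function names are assumptions; substitute your own naming rules.

```shell
# Hypothetical naming standard for release tags: v<major>.<minor>.<patch>.
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Create an annotated tag on the current commit, but only if the name
# follows the standard above.
tag_release() {
  if ! is_release_tag "$1"; then
    echo "tag_release: '$1' does not match v<major>.<minor>.<patch>" >&2
    return 1
  fi
  git tag -a "$1" -m "Release $1"
}

# Usage, inside a repository:
#   tag_release v2.1.0 && git push origin v2.1.0
```

Annotated tags (`git tag -a`) record the tagger, date and message, which makes them a better fit for releases than lightweight tags.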
Maintain your Repositories
A repository is only as good as the files kept in it. Old code, missing objects, and similar clutter just cause confusion, so it is a good idea to maintain your repositories periodically. Several Git commands help with this; the most useful are:
- Validate and check the integrity of your repository:
git fsck
- Compact your repository:
git gc --aggressive
- Prune remote-tracking branches:
git remote update --prune
- Check your stash for old or unused entries:
git stash list
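These maintenance commands can be wrapped in a small function, sketched here with a hypothetical name, `maintain_repo`. It stops if `git fsck` reports a problem, and skips pruning when the repository has no remotes configured.

```shell
# Hypothetical helper: run the maintenance commands on one repository.
maintain_repo() {
  repo=$1
  # Check integrity first; stop if Git reports corruption.
  git -C "$repo" fsck || return 1
  # Compact the object database (slow on large repositories).
  git -C "$repo" gc --aggressive --quiet
  # Prune stale remote-tracking branches, if any remotes are configured.
  if [ -n "$(git -C "$repo" remote)" ]; then
    git -C "$repo" remote update --prune
  fi
  # List stash entries so old ones can be reviewed and dropped by hand.
  git -C "$repo" stash list
}

# Usage: maintain_repo /path/to/repository
```

`git gc --aggressive` can take a long time on large repositories, so it is better run occasionally than on every maintenance pass.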
Avoid committing large binary files to Git. If you do have to commit large binary files, utilities such as git-annex or git-media can help.
Handling large files is an actively discussed topic in the wider Git community.
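To catch large files before they are committed, a pre-commit check can list staged files above a size limit. This is a minimal sketch with a hypothetical function name and an arbitrary 5 MB default; it measures the staged content with `git cat-file -s` rather than the working-tree file, and it does not handle paths containing tabs or newlines.

```shell
# Hypothetical pre-commit helper: print "<bytes><TAB><path>" for every staged
# (added or modified) file larger than the limit in bytes.
# The 5 MB default is an arbitrary example, not a Git or Talend limit.
large_staged_files() {
  limit=${1:-5242880}
  git -c core.quotePath=false diff --cached --name-only --diff-filter=AM |
    while IFS= read -r f; do
      # Size of the staged blob, which is what would actually be committed.
      size=$(git cat-file -s ":$f")
      if [ "$size" -gt "$limit" ]; then
        printf '%s\t%s\n' "$size" "$f"
      fi
    done
}

# Example use in .git/hooks/pre-commit:
#   if [ -n "$(large_staged_files)" ]; then
#     echo "Large files staged; consider git-annex or git-media." >&2
#     exit 1
#   fi
```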
Wherever possible, avoid creating very large repositories in Git, because Git slows down as repositories grow. What counts as "large" depends on factors such as available RAM and I/O speed.
Even so, a repository with many files (say 100K to 200K) slows down common operations because of the cost of the underlying system calls. Many large files, as discussed above, also slow down many operations.
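Before a repository grows too large, a few standard Git commands give a quick picture of its size. The sketch below wraps them in hypothetical helper functions: a count of tracked files, the largest blobs anywhere in history, and `git count-objects -vH` for the size of the object database.

```shell
# Number of files tracked in the index of the current repository.
tracked_file_count() {
  git ls-files | wc -l | tr -d ' '
}

# The N largest blobs anywhere in history (default 10),
# printed as "blob <id> <bytes> <path>".
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    sort -k3,3nr |
    head -n "${1:-10}"
}

# Summary: tracked files plus object counts and on-disk size.
repo_size_report() {
  echo "tracked files: $(tracked_file_count)"
  git count-objects -vH
}
```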