Overview
A naming convention is a rule to follow when naming objects such as a Job, file, context, variable, code routine, or metadata item. There are many good reasons to establish and adopt well-defined naming conventions, yet all too often a lack of discipline wins out. The bottom line is that conformity breeds reliability, so even a minimalist approach to naming objects is better than none at all.
"The beginning of wisdom is to call things by their right names." Chinese Proverb
Here are some credible reasons to consider in support of having naming conventions:
- Increases the readability & understanding of source code
- Relationships between objects can be deduced, which improves understanding of how they interact
- Conforming objects can be easily identified, and their names help determine their purpose
- Reduces duplication of objects & helps avoid name collisions between different ones
- Consistency across projects fosters predictability & enables automation
- Enhances source code appearance
Which conventions to use can become a controversial issue, and there is no real industry standard that addresses all the potential elements involved. It is, however, highly recommended that Talend projects in a 'Data Driven Enterprise' define and widely use object naming conventions.
Since the Talend product suite contains many areas where naming conventions should be defined, the following best practices offer an organized methodology for naming specific objects.
Directories & Files
Directories
Directories | Naming Convention |
---|---|
Application: location for full software installation |
WINDOWS:
LINUX:
Examples (Windows): C:\Talend\5.6.1\tac, C:\Talend\5.6.1\studio
Examples (Linux): /opt/talend/5.6.1/tac, /opt/talend/5.6.1/cmdline, /opt/talend/5.6.1/jobserver |
Archives: location for local studio exports |
WINDOWS:
LINUX:
Examples (Windows): C:\Talend\5.6.1\Archives\DEV\
Examples (Linux): /opt/talend/5.6.1/archives/DEV/ |
Build Release: location for local studio builds |
WINDOWS:
LINUX:
Examples (Windows): C:\Talend\5.6.1\Release\DEV\
Examples (Linux): /opt/talend/5.6.1/release/DEV/ |
Data File Root: preferred root location for data files & templates |
WINDOWS:
LINUX:
Examples (Windows): D:\Data\MyProject\DEV\, E:\Templates\MyProject\DEV\
Examples (Linux): /data/myProject/DEV/, /templates/myProject/DEV/ |
I/O location: preferred location for data files & templates |
WINDOWS:
LINUX:
Examples (Windows): D:\Data\MyProject\DEV\JSVR01\MyFILES\, E:\Templates\MyProject\DEV\MyFILES\
Examples (Linux): /data/myProject/DEV/jsvr01/MyFILES/, /templates/MyProject/DEV/MyFILES/ |
Software Root: location for base software installation |
WINDOWS:
LINUX:
Examples (Windows): C:\Talend\5.6.1\
Examples (Linux): /opt/talend/5.6.1/ |
Workspace: location for local studio repository |
WINDOWS:
LINUX:
Examples (Windows): C:\Talend\5.6.1\Workspace\DEV\
Examples (Linux): /opt/talend/5.6.1/workspace/DEV/ |
Files
Files | Naming Convention |
---|---|
Data Files: any file of any type used for data I/O |
Many file types can fall under this naming convention. It is recommended to elaborate on each of them individually. Base examples:
20150516_mydatadumpfile.out, 20150517_mydataloadfile.xml |
File Templates: sample file of any type used to define a schema |
These are files used to define metadata repository schemas. It is recommended to create them as sample files, independent of actual data files. Base examples:
20150516_mydatadumpfile.csv, 20150517_mydataloadfile.json |
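The base examples above all start with what looks like a date in YYYYMMDD form, followed by an underscore, a label, and an extension. A minimal sketch of building such names, assuming that reading of the examples is correct (the function name `dated_file_name` and its parameters are hypothetical, not part of Talend):

```python
from datetime import date


def dated_file_name(day: date, label: str, extension: str) -> str:
    """Build '<YYYYMMDD>_<label>.<extension>'.

    Assumption: the leading digits in the examples are a date in
    year-month-day order; the source does not state this explicitly.
    """
    return f"{day:%Y%m%d}_{label}.{extension}"


print(dated_file_name(date(2015, 5, 16), "mydatadumpfile", "out"))
```

A date in this order keeps the files sorted chronologically in a plain directory listing.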
Projects & Folders
A Talend project contains a collection of jobs, joblets, code, metadata and documentation. A Talend project can become quite big depending on the number of objects within the project.
As a good practice, it is recommended to use multiple projects, each grouping objects that share the same concern.
It is also recommended to define a strategy for splitting work across projects so that each project contains fewer than ~200 jobs.
Object Name
Object Name | Naming Convention |
---|---|
Project |
<department>_<project within department> or <workstream>_<project within stream>
Limit project names to a maximum of 25 characters, abbreviating the <project within department> or <project within stream> label where needed. Project names should be in upper case. Examples: ACCOUNTING_DATASYNC, ECOMMERCE_FULFILMENT |
MDM Project |
Project names should be in upper case and follow the convention MASTER_DATA_<PROJECT NAME> or MDM_<PROJECT NAME>. Examples: MDM_CUSTOMER360, MDM_PRODUCT360, MDM_FINANCE_REF_DATA |
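Because these rules are mechanical, they can be checked automatically. The following is a minimal sketch, not an official Talend utility. It assumes a project name is made of upper-case alphanumeric segments joined by underscores, with at least two segments, and that the 25-character limit also applies to MDM projects. The function names are hypothetical.

```python
import re

# Assumption: at least two upper-case alphanumeric segments joined by "_".
PROJECT_NAME = re.compile(r"[A-Z0-9]+(?:_[A-Z0-9]+)+")
MAX_PROJECT_NAME_LENGTH = 25  # limit stated in the table above


def is_valid_project_name(name: str) -> bool:
    """Return True if the name fits the assumed project pattern and length."""
    return (
        len(name) <= MAX_PROJECT_NAME_LENGTH
        and PROJECT_NAME.fullmatch(name) is not None
    )


def is_mdm_project_name(name: str) -> bool:
    """Return True if the name is valid and carries an MDM prefix."""
    return is_valid_project_name(name) and name.startswith(("MDM_", "MASTER_DATA_"))


for candidate in ("ACCOUNTING_DATASYNC", "MDM_CUSTOMER360", "accounting_datasync"):
    print(candidate, is_valid_project_name(candidate), is_mdm_project_name(candidate))
```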
Repository Items
Repository Item | Naming Convention |
---|---|
Folder |
Folders are used to group items of a similar category or behavior. Folder names should be in Camel Case, with words separated by underscores. No whitespace is allowed. Apart from the underscore separator, use only alphanumeric characters, i.e. [A-Z], [a-z] and [0-9]. A Playpen or Scratchpad folder should be created in each project, under each category of repository items, for developers to design temporary jobs. Jobs within the Playpen or Scratchpad can be deleted at any time without any impact. Developers are responsible for deleting their temporary jobs. |
Contexts | Contexts are groups of variables defined for the various environments.
|
Context Variable |
|
Job | A Job contains the logic of the design.
job_<group>_<sequence>_<description>
Where:
<group> is a logical grouping or function of the project, e.g. Staging, SFDC, MDM, Sync. It should be abbreviated and limited to fewer than 10 alphanumeric characters.
<sequence> is a 3-digit number unique to the job. The Control Job or Master Job always has a sequence of 000. All other jobs have a sequence that describes their place in the schedule, as shown in the examples below. When multiple jobs can be executed in parallel, make sure that each of them has a different sequence. Within a group, the <sequence> should increment by 10 during initial development so that additional jobs can be slotted in between in later iterations. The <sequence> gives a sense of the order in which the jobs should be executed within the <group>.
<description> should be informative and adequately describe the function of the job. It is written in Camel Case, and no whitespace is allowed.
Examples:
|
Joblet |
A Joblet is a code snippet that is re-used within a job one or more times. Naming Convention: jlet_<group>_<sequence>_<description>, where <group>, <sequence> and <description> follow the same conventions as for Job above. A special group name, Global, is used when the joblet being designed is expected to be used across multiple projects. Joblet descriptions must be concise yet descriptive enough to ensure re-usability. Examples:
|
Code Routines |
Each project contains zero or more code routines holding Java functions. Code routines should only contain static Java functions. Naming Convention: cr_<group>_<description>, where: <group> is a logical group that identifies all code routines containing similar functions. <description> is informative and describes the types of functions present in the package. Examples:
|
globalMap Variables |
globalMap variables enable job-specific information to be shared across multiple components. globalMap variable names are in Camel Case with the first character in lower case. If a variable is used to store the value of a field from a schema or input row link, then its name should contain a dot and should follow the same case as the link/schema name and field name. Examples:
|
Components and Links |
Component names should be meaningful and in Camel Case. They should contain only alphanumeric characters and underscores, and should be short (<25 characters) and succinct. Component link names should also be meaningful and in Camel Case. |
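The job, joblet and code routine patterns in the table above lend themselves to automated checks. The sketch below is one possible reading of those rules rather than a Talend-provided tool. It assumes <group> is 1 to 9 alphanumeric characters, <sequence> is exactly 3 digits, and <description> is alphanumeric with no underscores. The names `job_SFDC_000_controlJob` and `jlet_Global_010_logErrors` are made up for illustration, as are the helper functions.

```python
import re

# Assumed building blocks, based on the rules in the table above.
_GROUP = r"(?P<group>[A-Za-z0-9]{1,9})"
_SEQUENCE = r"(?P<sequence>\d{3})"
_DESCRIPTION = r"(?P<description>[A-Za-z][A-Za-z0-9]*)"

PATTERNS = {
    "job": re.compile(rf"job_{_GROUP}_{_SEQUENCE}_{_DESCRIPTION}"),
    "joblet": re.compile(rf"jlet_{_GROUP}_{_SEQUENCE}_{_DESCRIPTION}"),
    "code_routine": re.compile(rf"cr_{_GROUP}_{_DESCRIPTION}"),
}


def parse_name(kind, name):
    """Return the name's parts as a dict, or None if it does not conform."""
    match = PATTERNS[kind].fullmatch(name)
    return match.groupdict() if match else None


def next_sequence(existing):
    """Suggest the next 3-digit sequence, stepping to the next multiple of 10."""
    candidate = (max(existing, default=0) // 10 + 1) * 10
    if candidate > 999:
        raise ValueError("no 3-digit sequence left in this group")
    return f"{candidate:03d}"


print(parse_name("job", "job_SFDC_000_controlJob"))
print(parse_name("joblet", "jlet_Global_010_logErrors"))
print(next_sequence([0, 10, 20]))
```

Stepping by 10 leaves room to slot a job such as sequence 015 between 010 and 020 later without renaming anything.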
Talend Administration Center - Scheduling Naming Standards
TAC Object | Naming Convention |
---|---|
Task |
Tasks should be named as follows: task_<name of task>, e.g. task_SFDC_controlJob. If additional tasks are required for the same control job, then add a sequence number after the "task_" prefix, e.g. task_001_SFDC_controlJob.
Note: Task names must be unique within the Talend Administration Center. |
Task Trigger |
Task triggers should be named as follows: tr_<description>, e.g. tr_fileCreation. If the task name has a sequence, then use the same sequence in the name of the trigger, e.g. tr_001_cronEveryTuesdayMidnight.
Note: Trigger names must be unique within the Talend Administration Center. |
Execution Plan |
Execution plans should be named as follows:
plan_<sequence>_<suitable description>, where <sequence> is a 3-digit number. Always try to use meaningful names for the plan. If the plan uses tasks originating from jobs in several projects, then include the names of those projects in the description, e.g. plan_001_downloadSFDCAccounts.
Note: Plan names must be unique within the Talend Administration Center. |
Execution Plan Trigger | Apply same rule as Task Trigger above. |
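The scheduling standards above combine a prefix rule with a uniqueness rule, and both can be checked over a list of names. The following is a minimal sketch that works on plain strings and does not call the Talend Administration Center in any way. The patterns are an assumed reading of the table, and `check_tac_names` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed patterns: an optional 3-digit sequence for tasks and triggers,
# and a mandatory one for execution plans.
TAC_PATTERNS = {
    "task": re.compile(r"task_(?:\d{3}_)?\w+"),
    "trigger": re.compile(r"tr_(?:\d{3}_)?\w+"),
    "plan": re.compile(r"plan_\d{3}_\w+"),
}


def check_tac_names(kind, names):
    """Return (badly_formed, duplicated) names for one kind of TAC object."""
    pattern = TAC_PATTERNS[kind]
    badly_formed = [name for name in names if not pattern.fullmatch(name)]
    duplicated = sorted(name for name, count in Counter(names).items() if count > 1)
    return badly_formed, duplicated


tasks = ["task_SFDC_controlJob", "task_001_SFDC_controlJob", "SFDC_controlJob", "task_SFDC_controlJob"]
print(check_tac_names("task", tasks))
```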
Summary
Adopting well-defined naming conventions for Talend objects is a comprehensive discipline that requires leadership, cooperative agreement, and habit. Doing so will greatly improve short-term development, testing, and deployment of Talend applications and significantly reduce long-term maintenance effort.
Our hope is that customers use the object naming conventions defined above. Whichever conventions are chosen, we strongly encourage establishing them in some form.