Best Practice: Conventions for Object Naming - 8.0

Version
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Open Studio for Big Data
Talend Open Studio for Data Integration
Talend Open Studio for Data Quality
Talend Open Studio for ESB
Talend Real-Time Big Data Platform
Module
Talend Administration Center
Talend Studio
Content
Administration and Monitoring
Design and Development
Installation and Upgrade

Overview

A naming convention is a rule to follow when naming objects such as a Job, file, context, variable, code routine, or metadata item. There are many good reasons to establish and adopt well-defined naming conventions, yet all too often a lack of discipline wins out. The bottom line is that conformity breeds reliability, so even a minimalist approach to naming objects is better than none at all.

"The beginning of wisdom is to call things by their right names." Chinese Proverb

Here are some credible reasons to consider in support of having naming conventions:

  • Increases the readability & understanding of source code
  • Relationships can be deduced improving understanding of object interactions
  • Conforming objects can be easily identified & help determine their purpose
  • Reduces duplication of objects & helps avoid name collisions between different ones
  • Consistency across projects fosters predictability & enables automation
  • Enhances source code appearance

Although the choice of conventions can be controversial, there is no real industry standard that addresses all the potential elements involved. It is, however, highly recommended that Talend projects in a 'Data Driven Enterprise' define and widely use object naming conventions.

Since the Talend product suite contains many areas where naming conventions should be defined, the following best practices offer an organized methodology for naming specific objects.

Directories & Files

Perhaps the most universal and common use of naming conventions concerns storage: directory structures and the files contained within them.
Note: Careful planning, cooperative discipline, and enterprise-wide agreements on all directory and file names should be established and adopted (whenever possible) BEFORE the development life cycle begins. Refactoring later, or managing exceptions to these names, can be problematic and compound long-term difficulties.

Directories

Directories Naming Convention

Application

location for full software installation

WINDOWS:

{root}\{app}

LINUX:

{root}/{app}
  • root = software root
  • app = talend application (cmdline/jobserver/runtime/studio/tac/others)

Examples:

Windows

> C:\Talend\5.6.1\tac

> C:\Talend\5.6.1\studio

Linux

> /opt/talend/5.6.1/tac

> /opt/talend/5.6.1/cmdline

> /opt/talend/5.6.1/jobserver

Archives

location for local studio exports

WINDOWS:

{root}\Archives\{env}

LINUX:

{root}/archives/{env}
  • root = software root; (can be located on a different root from Talend apps)
  • env = environment (SBX/DEV/TEST/UAT/PROD)

Examples:

Windows

> C:\Talend\5.6.1\Archives\DEV\

Linux

> /opt/talend/5.6.1/archives/DEV/

Build Release

location for local studio builds

WINDOWS:

{root}\Release\{env}

LINUX:

{root}/release/{env}
  • root = software root; (can be located on a different root from Talend apps)
  • env = environment (SBX/DEV/TEST/UAT/PROD)

Examples:

Windows

> C:\Talend\5.6.1\Release\DEV\

Linux

> /opt/talend/5.6.1/release/DEV/

Data File Root

preferred root location for data files & templates

WINDOWS:

{drv}:\Data\{prj}\{env}\
{drv}:\Templates\{prj}\{env}\

LINUX:

/data/{prj}/{env}/
/templates/{prj}/{env}/
  • drv = disk drive letter
  • prj = project name; (additional sub-folders are allowed)
  • env = environment (SBX/DEV/TEST/UAT/PROD)

Examples:

Windows

> D:\Data\MyProject\DEV\

> E:\Templates\MyProject\DEV\

Linux

> /data/myProject/DEV/

> /templates/myProject/DEV/

I/O location

preferred location for data files & templates

WINDOWS:

{droot}\{jsvr}\{udef}\
{troot}\{udef}\

LINUX:

{droot}/{jsvr}/{udef}/
{troot}/{udef}/
  • droot = data file root (may be located on a different server)
  • troot = template file root (may be located on a different server)
  • jsvr = job server (optional; helpful when non-shared storage is used)
  • udef = user defined folder(s); additional sub-folders are allowed

Examples:

Windows

> D:\Data\MyProject\DEV\JSVR01\MyFILES\

> E:\Templates\MyProject\DEV\MyFILES\

Linux

> /data/myProject/DEV/jsvr01/MyFILES/

> /templates/myProject/DEV/MyFILES/

Software Root

location for base software installation

WINDOWS:

{drv}:\Talend\{ver}\

LINUX:

/opt/talend/{ver}/
  • drv = disk drive letter
  • ver = talend version number

Examples:

Windows

> C:\Talend\5.6.1\

Linux

> /opt/talend/5.6.1/

Workspace

location for local studio repository

WINDOWS:

{root}\Workspace\{env}

LINUX:

{root}/workspace/{env}
  • root = software root
  • env = environment (SBX/DEV/TEST/UAT/PROD)

Examples:

Windows

> C:\Talend\5.6.1\Workspace\DEV\

Linux

> /opt/talend/5.6.1/workspace/DEV/
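
As a rough illustration of how the directory patterns above compose, the following Python sketch builds the Linux-style paths from their placeholders. The helper names (`software_root`, `env_dir`) and the `ENVIRONMENTS` set are hypothetical and are not part of any Talend tooling; the version number is taken from the examples above.

```python
from pathlib import PurePosixPath

# Environments named in the convention above.
ENVIRONMENTS = {"SBX", "DEV", "TEST", "UAT", "PROD"}

# Directory kinds that follow the {root}/{kind}/{env} pattern.
ENV_DIR_KINDS = {"archives", "release", "workspace"}


def software_root(version: str) -> PurePosixPath:
    """Return the Linux software root: /opt/talend/{ver}."""
    return PurePosixPath("/opt/talend") / version


def env_dir(root: PurePosixPath, kind: str, env: str) -> PurePosixPath:
    """Return {root}/{kind}/{env}, rejecting unknown kinds and environments."""
    if kind not in ENV_DIR_KINDS:
        raise ValueError(f"unknown directory kind: {kind}")
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    return root / kind / env


root = software_root("5.6.1")
print(env_dir(root, "workspace", "DEV"))  # /opt/talend/5.6.1/workspace/DEV
```

Generating paths from one place, rather than typing them by hand, is one way to keep every environment on the same layout.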

Files

Files Naming Convention

Data Files

any file of any type used for data I/O

There are many file types that can fall into this naming convention.

It is recommended to elaborate on all of them individually. Base Example:

{yyyymmdd}_{filename}.{ext}
  • yyyymmdd = date as year, month, day; best method for sorting files by date
  • filename = any file name you want (underscores preferred)
  • ext = any suitable file name extension (.out/.xml/.json/.html/others)
Examples:

> 20150516_mydatadumpfile.out

> 20150517_mydataloadfile.xml

File Templates

sample file of any type to define a schema

These represent files used to define metadata repository schemas.

It is recommended to create these as sample files independent of actual data files. Base Example:

{yyyymmdd}_{filename}.{ext}
  • yyyymmdd = date as year, month, day; best method for sorting files by date
  • filename = any file name you want (underscores preferred)
  • ext = any suitable file name extension (.out/.xml/.json/.html/others)
Examples:

> 20150516_mydatadumpfile.csv

> 20150517_mydataloadfile.json
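
The {yyyymmdd}_{filename}.{ext} pattern above can be generated and checked mechanically. The sketch below is one possible way to do that in Python; the function names are hypothetical, and the allowed characters (letters, digits, and underscores in the file name, alphanumerics in the extension) are an assumption, since the convention itself allows any file name.

```python
import re
from datetime import date

# Assumption: filename uses letters, digits, and underscores; ext is alphanumeric.
DATA_FILE_NAME = re.compile(
    r"([0-9]{4})([0-9]{2})([0-9]{2})_([A-Za-z0-9_]+)\.([A-Za-z0-9]+)"
)


def build_data_file_name(day: date, filename: str, ext: str) -> str:
    """Return {yyyymmdd}_{filename}.{ext} for the given date."""
    return f"{day.strftime('%Y%m%d')}_{filename}.{ext}"


def is_valid_data_file_name(name: str) -> bool:
    """Check the overall shape and that the prefix is a real calendar date."""
    match = DATA_FILE_NAME.fullmatch(name)
    if match is None:
        return False
    year, month, day = (int(match.group(i)) for i in (1, 2, 3))
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


print(build_data_file_name(date(2015, 5, 16), "mydatadumpfile", "out"))
# 20150516_mydatadumpfile.out
```

Because the date prefix is zero-padded and ordered from year to day, a plain alphabetical sort of these names is also a chronological sort.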

Projects & Folders

A Talend project contains a collection of jobs, joblets, code, metadata and documentation. A Talend project can become quite big depending on the number of objects within the project.

As a good practice, it is recommended to have multiple projects to group objects of the same concern together.

It is also recommended to define a strategy for having multiple projects so that each project contains fewer than ~200 jobs.

Object Name

Object Name Naming Convention
Project

<department>_<project within department>

or

<workstream>_<project within stream>

Limit project names to 25 characters maximum. Abbreviate the label <project within department> or <project within stream>.

Project name should be in upper case.

Examples:

ACCOUNTING_DATASYNC

ECOMMERCE_FULFILMENT

MDM Project

The project name should be in upper case and follow this convention:

MASTER_DATA_<PROJECT NAME>

or

MDM_<PROJECT NAME>

Examples:

MDM_CUSTOMER360

MDM_PRODUCT360

MDM_FINANCE_REF_DATA
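
The project naming rules above lend themselves to an automated check. The following Python sketch is one way to express them; the function names are hypothetical, and two details are assumptions rather than stated rules: that names consist only of upper-case letters, digits, and underscores, and that the 25 character limit also applies to MDM projects.

```python
import re

MAX_PROJECT_NAME_LENGTH = 25

# Assumption: upper-case letters and digits, in at least two parts joined by underscores.
PROJECT_NAME = re.compile(r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)+")

MDM_PREFIXES = ("MDM_", "MASTER_DATA_")


def is_valid_project_name(name: str) -> bool:
    """Check upper case, the <prefix>_<project> shape, and the length limit."""
    if len(name) > MAX_PROJECT_NAME_LENGTH:
        return False
    return PROJECT_NAME.fullmatch(name) is not None


def is_valid_mdm_project_name(name: str) -> bool:
    """Check the general rules plus the MDM_ or MASTER_DATA_ prefix."""
    return is_valid_project_name(name) and name.startswith(MDM_PREFIXES)


for candidate in ("ACCOUNTING_DATASYNC", "MDM_CUSTOMER360", "accounting_datasync"):
    print(candidate, is_valid_project_name(candidate))
```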

Repository Items

Repository Item Naming Convention
Folder

Folders are used to group items of a similar category or behavior.

Folder names should be in Camel Case, with words separated by underscores. No whitespace is allowed. Apart from the underscore separator, use only alphanumeric characters, i.e. [A-Z], [a-z], and [0-9].

A Playpen or Scratchpad should be created in each project under each category of repository items for developers to design temporary jobs. Jobs within the Playpen or Scratchpad can be deleted at any time without any impact. Developers are responsible for deleting their temporary jobs.

Contexts

Contexts are groups of variables defined for various environments.
  • Database Context Group should be created from the Metadata DB Connection definition. This is done by creating a new DB Connection metadata object with the right settings. During the creation of the metadata object, click on the button to Export Context. These database contexts should be left unchanged, i.e. do not rename the context variables or add new context variables to these contexts.
  • Additional contexts should be created for project-specific requirements. Normally, you should limit the number of additional contexts you create to fewer than 3 per project. For example, you can have a Common context group.
Context Variable
  • Database context variables should be left untouched.
  • Non-database context group variables should follow Camel Case with the first character in lower case. 

    For example: path, folder, customerId.

  • Context variable names must be descriptive.
  • Avoid single-character context variable names, for example a, b, c, i, j.
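
The mechanical parts of the context variable rules can be checked with a few lines of code. The Python sketch below is only an illustration: `check_context_variable` is a hypothetical helper, it is meant for non-database context variables only (database context variables are left untouched), and it cannot judge whether a name is descriptive.

```python
import re
from typing import List

# Lower camel case: starts with a lower-case letter, then letters and digits only.
LOWER_CAMEL = re.compile(r"[a-z][a-zA-Z0-9]*")


def check_context_variable(name: str) -> List[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if len(name) < 2:
        problems.append("name is shorter than two characters")
    if LOWER_CAMEL.fullmatch(name) is None:
        problems.append("name is not lower camel case")
    return problems


for candidate in ("customerId", "CustomerId", "i"):
    print(candidate, check_context_variable(candidate))
```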
Job

A Job contains the logic of the design.
  • A Job should have a meaningful name.
  • Limit Job names to fewer than 50 characters.
Naming Convention:

job_<group>_<sequence>_<description>

Where:

<group> is a logical grouping or function of the project, e.g. Staging, SFDC, MDM, Sync. It should be abbreviated and limited to less than 10 alphanumeric characters.

<sequence> is a 3 digit number unique to the job. The Control Job or Master Job will always have a sequence of 000. All other jobs will have a sequence that describes their place in the schedule, as shown in the examples below. When multiple jobs can be executed in parallel, make sure that each of these jobs has a different sequence number.

The <sequence> within a group should increment by 10 during initial development to enable additional jobs to be slotted in between in later iterations. The <sequence> provides a sense of the order in which the jobs should be executed within the <group>.

<description> should be informative and give an adequate description of the function of the job. The description is in Camel Case, and no whitespace is allowed.

Examples:
  • job_Report_000_controlJob
  • job_Report_010_fileReception
  • job_Staging_000_controlJob
  • job_Staging_010_stageTheFinanceFile
  • job_SFDC_000_controlJob
  • job_SFDC_010_checkPickListValues
  • job_SFDC_020_insertAccountRecord
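
To show how the job_<group>_<sequence>_<description> convention breaks down, the following Python sketch parses a Job name into its parts and applies the length limits given above. The names `JobName`, `parse_job_name`, and `is_control_job` are hypothetical, and restricting the description to alphanumeric characters is an assumption based on the examples.

```python
import re
from typing import NamedTuple, Optional

# group: 1 to 9 alphanumeric characters; sequence: exactly 3 digits;
# description: alphanumeric only (an assumption drawn from the examples).
JOB_NAME = re.compile(r"job_([A-Za-z0-9]{1,9})_([0-9]{3})_([A-Za-z0-9]+)")

MAX_JOB_NAME_LENGTH = 49  # "less than 50 characters"


class JobName(NamedTuple):
    group: str
    sequence: int
    description: str


def parse_job_name(name: str) -> Optional[JobName]:
    """Return the parts of a conforming Job name, or None if it does not conform."""
    if len(name) > MAX_JOB_NAME_LENGTH:
        return None
    match = JOB_NAME.fullmatch(name)
    if match is None:
        return None
    return JobName(match.group(1), int(match.group(2)), match.group(3))


def is_control_job(job: JobName) -> bool:
    """The Control Job or Master Job always has sequence 000."""
    return job.sequence == 0


job = parse_job_name("job_SFDC_020_insertAccountRecord")
print(job)
```

Sorting parsed names by `(group, sequence)` gives the intended execution order within each group.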
Joblet

A Joblet is a code snippet that will be re-used within a job one or more times.

Naming Convention:

jlet_<group>_<sequence>_<description>

Where the <group>, <sequence> and <description> follow the same conventions as for Job above.

A special group name is Global when the joblet being designed is expected to be used across multiple projects.

Joblet descriptions must be concise but descriptive enough to ensure re-usability.

Examples:
  • jlet_Global_000_logErrorToFile
  • jlet_SFDC_000_establishConnection
  • jlet_SFDC_010_closeConnection
Code Routines

Each project will consist of zero or more code routines containing Java functions.

Code Routines should only contain static Java functions.

Naming Convention:

cr_<group>_<description>

where:

<group> will be a logical group to identify all code routines that contain similar functions.

<description> will be informative and describe the types of functions present in the package.

Examples:
  • cr_dwh_dateValidation
  • cr_dwh_arrayHandler
  • cr_dwh_textFormat 
globalMap Variables

globalMap variables enable job-specific information to be shared across multiple components. globalMap variables are in Camel Case with the first character in lower case.

If a variable is to be used for storing the values of a field from a schema or input row link, then the name should contain a dot and should follow the same case as the link/schema name and field name.

Examples:
  • customerNumber
  • countOfInputRecords
  • customer.id (customer is the name of the input row, and id is the field name)
  • transaction.transaction_code (transaction is the name of the input row, and transaction_code is the field name)
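
The dotted key rule above can be illustrated with a small helper. In a real Job, globalMap is accessed from Java code; the Python dictionary below is only a stand-in used to show the key naming, and `field_key` is a hypothetical helper, not a Talend function. The stored values are made up for the illustration.

```python
def field_key(row_name: str, field_name: str) -> str:
    """Build a '<row>.<field>' key, preserving the case of both parts."""
    if not row_name or not field_name:
        raise ValueError("row name and field name must both be non-empty")
    return f"{row_name}.{field_name}"


# Stand-in for globalMap: a plain dictionary keyed by variable name.
global_map = {}

# Job-level values use lower camel case names.
global_map["countOfInputRecords"] = 3

# Values copied from a row link use the '<row>.<field>' form.
global_map[field_key("customer", "id")] = 1042
global_map[field_key("transaction", "transaction_code")] = "TX-01"

print(sorted(global_map))
```

Keeping the row and field names unchanged in the key makes it easy to trace a stored value back to the link and column it came from.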
Components and Links

Component names should be meaningful and should be in Camel Case. Component names should contain alphanumeric characters only and can contain an underscore. Names should also be short (<25 characters) and succinct.

Component link names should be meaningful and should be in Camel Case.

Talend Administration Center - Scheduling Naming Standards

TAC Object Naming Convention
Task

Task should be named as follows:

task_<name of task>

e.g. task_SFDC_controlJob

If additional tasks are required for the same control job, then add a sequence number after the “task_” prefix, e.g. task_001_SFDC_controlJob.

Note: Task name must be unique within the Talend Administration Center.

Task Trigger

The task trigger will be named as follows:

tr_<description>

e.g. tr_fileCreation

If the task name has a sequence, then use the same sequence in the name of the trigger, e.g. tr_001_cronEveryTuesdayMidnight.

Note: Trigger name must be unique within the Talend Administration Center.

Execution Plan

Execution plans will be named as follows:

plan_<sequence>_<suitable description>

where <sequence> is a 3 digit number. Always try to use meaningful names for the plan. If the plan uses tasks originating from jobs from several projects, then include the name of the projects in the description.

e.g. plan_001_downloadSFDCAccounts

Note: Plan name must be unique within the Talend Administration Center.

Execution Plan Trigger

Apply the same rules as for the Task Trigger above.
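
The task, trigger, and plan names above share the same shape, so they can be generated by one small set of helpers. The Python sketch below is a possible approach with hypothetical function names; it does not call Talend Administration Center, and the `duplicates` helper only checks a list of names you supply for the uniqueness requirement.

```python
from collections import Counter
from typing import Iterable, List, Optional


def _prefix(kind: str, sequence: Optional[int]) -> str:
    """Return 'kind_' or 'kind_NNN_' with a zero-padded 3 digit sequence."""
    if sequence is None:
        return f"{kind}_"
    if not 0 <= sequence <= 999:
        raise ValueError("sequence must fit in 3 digits")
    return f"{kind}_{sequence:03d}_"


def task_name(name: str, sequence: Optional[int] = None) -> str:
    """task_<name of task>, optionally with a sequence after the prefix."""
    return _prefix("task", sequence) + name


def trigger_name(description: str, sequence: Optional[int] = None) -> str:
    """tr_<description>, reusing the task's sequence when it has one."""
    return _prefix("tr", sequence) + description


def plan_name(sequence: int, description: str) -> str:
    """plan_<sequence>_<suitable description>."""
    return _prefix("plan", sequence) + description


def duplicates(names: Iterable[str]) -> List[str]:
    """Return the names that occur more than once, in sorted order."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


print(task_name("SFDC_controlJob", 1))          # task_001_SFDC_controlJob
print(plan_name(1, "downloadSFDCAccounts"))     # plan_001_downloadSFDCAccounts
```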

Summary

The adoption of well-defined naming conventions for Talend objects is a comprehensive discipline that requires leadership, cooperative agreement, and habit. Doing this will greatly improve short-term development, testing, and deployment of Talend applications and significantly reduce long-term maintenance efforts.

Our hope is that customers use the Object Naming Conventions defined above; in any case, we strongly encourage establishing naming conventions in some form.