What is a Talend component

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Big Data
Talend Open Studio for Data Integration
Talend Open Studio for MDM
Talend Open Studio for Big Data
Talend Data Fabric
Talend Open Studio for ESB
Talend Data Management Platform
Talend Big Data Platform
Talend ESB
Talend Data Integration
Talend Real-Time Big Data Platform
Talend Data Services Platform
Talend MDM Platform
task
Design and Development > Designing Components
EnrichPlatform
Talend Studio

This article describes what a component (also called a connector) is in Talend Studio, and shows how the 800+ connectors provided natively in Talend Studio are constructed.

This article covers the following topics which are essential if you want to develop your own components:

  • What is a Talend component?
  • What makes it a Talend component?
  • Component code generation model.

Note that this article explains what Talend components are, and reading it is recommended before you start creating your own component. However, it is not a step-by-step tutorial on developing custom components. For a tutorial and a detailed explanation of custom component creation, see How to create a custom component.

What is a component

Functionally, a component is a piece of functionality that performs a single operation. For example, tMysqlInput extracts data from a MySQL table, and tFilterRow filters data based on a condition.

Physically, a component is a set of files stored in a folder named after the component. All native components are located in the <Talend Studio installation dir>/plugins/org.talend.designer.components.localprovider_[version]/components directory. Each component is a sub-folder of this directory, and the folder name is the component name.

Graphically, a component is an icon that you can drag and drop from the Palette to the workspace.

Technically, a component is a snippet of generated Java code that is part of a Job, which is itself a Java class. A Job is made of one or more components or connectors. The Job name becomes the class name, and each component in the Job is translated to a snippet of generated Java code. The Java code is compiled automatically when you save the Job.
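As a rough illustration of this idea, the sketch below mimics the shape of a Job class: the class is named after a hypothetical Job (DemoJob), and each component contributes a snippet whose variables carry the unique name of the component. It is hand-written and heavily simplified, and is not actual code generated by Talend Studio. The component names tFixedInput_1 and tAgeFilter_1 and the sample rows are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written, simplified illustration; NOT real Talend-generated code.
// The class name stands for a hypothetical Job called "DemoJob".
final class DemoJob {

    static List<String> run() {
        // Snippet contributed by a hypothetical input component "tFixedInput_1"
        String[] rows_tFixedInput_1 = {"alice;30", "bob;17", "carol;45"};
        int nb_line_tFixedInput_1 = 0;

        // Snippet contributed by a hypothetical filter component "tAgeFilter_1"
        List<String> accepted_tAgeFilter_1 = new ArrayList<>();

        for (String row : rows_tFixedInput_1) {
            nb_line_tFixedInput_1++;
            int age = Integer.parseInt(row.split(";")[1]);
            if (age >= 18) {
                accepted_tAgeFilter_1.add(row);
            }
        }
        return accepted_tAgeFilter_1;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because every variable name ends with the unique name of its component, the snippets of several components can coexist in the same class without name clashes.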

What makes it a Talend component

A component usually consists of the following files: an XML descriptor file, a messages properties file, some Java template files, an icon and some jar files that are imported and used by the component.

The XML descriptor file

Each component contains an XML descriptor file. This file provides the information that defines the component: its attributes, what it is supposed to do, how it interacts with other components, and so on.

For example, the structure of the tFileInputDelimited_java.xml descriptor file looks like this:

<COMPONENT>
    <HEADER PLATEFORM="ALL" SERIAL="" VERSION="0.102" STATUS="ALPHA" COMPATIBILITY="ALL"
        AUTHOR="Talend" RELEASE_DATE="20070111A" STARTABLE="true" HAS_CONDITIONAL_OUTPUTS="true">
        <SIGNATURE/>
    </HEADER>
    
    <FAMILIES>
        <FAMILY>File/Input</FAMILY>
    </FAMILIES>
    
    <DOCUMENTATION>
        <URL/>
    </DOCUMENTATION>
    
    <CONNECTORS>
        <CONNECTOR CTYPE="FLOW" MAX_INPUT="0" MAX_OUTPUT="1"/>
        <CONNECTOR NAME="REJECT" CTYPE="FLOW" MAX_INPUT="0" MAX_OUTPUT="1" LINE_STYLE="2"
            COLOR="FF0000" BASE_SCHEMA="FLOW"/>
        <CONNECTOR CTYPE="ITERATE" MAX_OUTPUT="1" MAX_INPUT="1"/>
        <CONNECTOR CTYPE="SUBJOB_OK" MAX_INPUT="1"/>
        <CONNECTOR CTYPE="SUBJOB_ERROR" MAX_INPUT="1"/>
        <CONNECTOR CTYPE="COMPONENT_OK"/>
        <CONNECTOR CTYPE="COMPONENT_ERROR"/>
        <CONNECTOR CTYPE="RUN_IF"/>
    </CONNECTORS>
    
    <PARAMETERS>
        <PARAMETER NAME="PROPERTY" FIELD="PROPERTY_TYPE" SHOW="true" NUM_ROW="10"
            REPOSITORY_VALUE="DELIMITED"/>
        <PARAMETER NAME="FILENAMETEXT" FIELD="LABEL" COLOR="0;0;0" NUM_ROW="15">
            <DEFAULT>"When the input source is a stream or a zip file,footer and random shouldn't be
                bigger than 0."</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="FILENAME" FIELD="FILE" NUM_ROW="20" REQUIRED="true"
            REPOSITORY_VALUE="FILE_PATH">
            <DEFAULT>"__COMP_DEFAULT_FILE_DIR__/in.csv"</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="ROWSEPARATOR" FIELD="TEXT" NUM_ROW="30" REPOSITORY_VALUE="ROW_SEPARATOR"
            SHOW_IF="CSV_OPTION=='false'">
            <DEFAULT>"\n"</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="CSVROWSEPARATOR" FIELD="OPENED_LIST" NUM_ROW="30"
            REPOSITORY_VALUE="ROW_SEPARATOR" MAX_LENGTH="2" SHOW_IF="CSV_OPTION=='true'">
            <ITEMS DEFAULT="LF">
                <ITEM NAME="LF" VALUE="&quot;\n&quot;"/>
                <ITEM NAME="CR" VALUE="&quot;\r&quot;"/>
                <ITEM NAME="CRLF" VALUE="&quot;\r\n&quot;"/>
            </ITEMS>
        </PARAMETER>
        <PARAMETER NAME="FIELDSEPARATOR" FIELD="TEXT" NUM_ROW="30" REQUIRED="true"
            REPOSITORY_VALUE="FIELD_SEPARATOR">
            <DEFAULT>";"</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="CSV_OPTION" FIELD="CHECK" REQUIRED="true" REPOSITORY_VALUE="CSV_OPTION"
            NUM_ROW="35">
            <DEFAULT>false</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="ESCAPE_CHAR" FIELD="TEXT" NUM_ROW="35" REQUIRED="true"
            REPOSITORY_VALUE="ESCAPE_CHAR" SHOW_IF="CSV_OPTION == 'true'">
            <DEFAULT>"""</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="TEXT_ENCLOSURE" FIELD="TEXT" NUM_ROW="35" REQUIRED="true"
            REPOSITORY_VALUE="TEXT_ENCLOSURE" SHOW_IF="CSV_OPTION == 'true'">
            <DEFAULT>"""</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="HEADER" FIELD="TEXT" NUM_ROW="40" REPOSITORY_VALUE="HEADER">
            <DEFAULT>0</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="FOOTER" FIELD="TEXT" NUM_ROW="40" REPOSITORY_VALUE="FOOTER"
            SHOW_IF="UNCOMPRESS=='false'">
            <DEFAULT>0</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="LIMIT" FIELD="TEXT" NUM_ROW="40" REPOSITORY_VALUE="LIMIT">
            <DEFAULT/>
        </PARAMETER>
        <PARAMETER NAME="REMOVE_EMPTY_ROW" FIELD="CHECK" REQUIRED="true" NUM_ROW="46"
            REPOSITORY_VALUE="REMOVE_EMPTY_ROW">
            <DEFAULT>true</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="UNCOMPRESS" FIELD="CHECK" NUM_ROW="46">
            <DEFAULT>false</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="DIE_ON_ERROR" FIELD="CHECK" NUM_ROW="46">
            <DEFAULT>false</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="SCHEMA" FIELD="SCHEMA_TYPE" REQUIRED="true" NUM_ROW="44">
            <DEFAULT/>
        </PARAMETER>
        <PARAMETER NAME="SCHEMA_REJECT" FIELD="SCHEMA_TYPE" REQUIRED="true" NUM_ROW="44"
            CONTEXT="REJECT" SHOW="true">
            <TABLE READONLY="true">
                <COLUMN NAME="errorCode" TYPE="id_String" LENGTH="255" READONLY="false"
                    CUSTOM="true"/>
                <COLUMN NAME="errorMessage" TYPE="id_String" LENGTH="255" READONLY="false"
                    CUSTOM="true"/>
            </TABLE>
        </PARAMETER>
    </PARAMETERS>

    <ADVANCED_PARAMETERS>
        <PARAMETER FIELD="DIRECTORY" NAME="TEMP_DIR" NUM_ROW="1" READONLY="false" REQUIRED="true"
            SHOW="false">
            <DEFAULT>"__COMP_DEFAULT_FILE_DIR__"</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="ADVANCED_SEPARATOR" FIELD="CHECK" REQUIRED="true" NUM_ROW="41">
            <DEFAULT>false</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="THOUSANDS_SEPARATOR" FIELD="TEXT" REQUIRED="true" NUM_ROW="41"
            SHOW_IF="(ADVANCED_SEPARATOR == 'true')">
            <DEFAULT>","</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="DECIMAL_SEPARATOR" FIELD="TEXT" REQUIRED="true" NUM_ROW="41"
            SHOW_IF="(ADVANCED_SEPARATOR == 'true')">
            <DEFAULT>"."</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="RANDOM" FIELD="CHECK" REQUIRED="true" NUM_ROW="45"
            SHOW_IF="(CSV_OPTION == 'false') AND (UNCOMPRESS=='false')">
            <DEFAULT>false</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="NB_RANDOM" FIELD="TEXT" REQUIRED="true" NUM_ROW="45"
            SHOW_IF="(CSV_OPTION == 'false') and (RANDOM == 'true') AND (UNCOMPRESS=='false')">
            <DEFAULT>10</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="TRIMALL" FIELD="CHECK" NUM_ROW="46">
            <DEFAULT>false</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="TRIMSELECT" FIELD="TABLE" NUM_ROW="47" NB_LINES="5"
            SHOW_IF="TRIMALL=='false'">
            <ITEMS BASED_ON_SCHEMA="true">
                <ITEM NAME="TRIM" FIELD="CHECK"/>
            </ITEMS>
        </PARAMETER>
        <PARAMETER NAME="CHECK_FIELDS_NUM" FIELD="CHECK" NUM_ROW="46">
            <DEFAULT>false</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="CHECK_DATE" FIELD="CHECK" NUM_ROW="46">
            <DEFAULT>false</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="ENCODING" FIELD="ENCODING_TYPE" NUM_ROW="45" REQUIRED="true"
            REPOSITORY_VALUE="ENCODING">
            <DEFAULT>"ISO-8859-15"</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="SPLITRECORD" FIELD="CHECK" REQUIRED="true" NUM_ROW="50"
            SHOW_IF="CSV_OPTION == 'false'" REPOSITORY_VALUE="SPLITRECORD">
            <DEFAULT>false</DEFAULT>
        </PARAMETER>
        <PARAMETER NAME="DESTINATION" FIELD="TEXT" NUM_ROW="90" SHOW="false">
            <DEFAULT/>
        </PARAMETER>
    </ADVANCED_PARAMETERS>
    
    <CODEGENERATION>
        <IMPORTS>
            <IMPORT NAME="Talend File Enhanced" MODULE="talend_file_enhanced_20070724.jar"
                REQUIRED="true"/>
            <IMPORT NAME="Talend_CSV" MODULE="talendcsv.jar" REQUIRED="true"/>
        </IMPORTS>
    </CODEGENERATION>
    <RETURNS>
        <RETURN NAME="NB_LINE" TYPE="id_Integer" AVAILABILITY="AFTER"/>
    </RETURNS>
</COMPONENT>

The HEADER element defines the basic information about the component, such as the version, the author of the component, etc.

<HEADER PLATEFORM="ALL" SERIAL="" VERSION="0.102" STATUS="ALPHA"
        COMPATIBILITY="ALL" AUTHOR="Talend" RELEASE_DATE="20070111A"
        STARTABLE="true" HAS_CONDITIONAL_OUTPUTS="true">
        <SIGNATURE />
</HEADER>

The STARTABLE attribute specifies whether the component can be the first component of a Subjob. It is always set to true for input components, such as tMysqlInput and tFileInputDelimited, and to false for output components, such as tMysqlOutput.

The HAS_CONDITIONAL_OUTPUTS attribute specifies whether the component has multiple output data flows. For example, if you want the component to have both main and reject connections, this attribute needs to be set to true.

The FAMILIES element specifies the family group(s) where the component should be put in the Palette. A component can be put in several groups, in System and Orchestration groups for example:

<FAMILIES>
    <FAMILY>System</FAMILY>
    <FAMILY>Orchestration</FAMILY>
</FAMILIES>

The CONNECTORS element defines the type of connection or link the component uses to connect to other component(s) in the Job. This defines how this component will interact with other components.

<CONNECTORS>
	<CONNECTOR CTYPE="FLOW" MAX_INPUT="0" MAX_OUTPUT="1"/>
	<CONNECTOR NAME="REJECT" CTYPE="FLOW" MAX_INPUT="0"
		MAX_OUTPUT="1" LINE_STYLE="2" COLOR="FF0000" BASE_SCHEMA="FLOW" />
	<CONNECTOR CTYPE="ITERATE" MAX_OUTPUT="1" MAX_INPUT="1" />
	<CONNECTOR CTYPE="SUBJOB_OK" MAX_INPUT="1" />
	<CONNECTOR CTYPE="SUBJOB_ERROR" MAX_INPUT="1" />
	<CONNECTOR CTYPE="COMPONENT_OK" />
	<CONNECTOR CTYPE="COMPONENT_ERROR" />
	<CONNECTOR CTYPE="RUN_IF" />
</CONNECTORS>

The accepted types of trigger and flow connectors are shown in the contextual menu of the component.

The CONNECTOR CTYPE attribute defines the connector type.

The FLOW type means it can handle a Main or a Reject data flow. The MAX_INPUT attribute defines the maximum number of allowed input connectors linked to this component, and the MAX_OUTPUT attribute defines the maximum number of allowed output connectors that the component can connect to. For example:

<CONNECTOR CTYPE="FLOW" MAX_INPUT="0" MAX_OUTPUT="1"/>

In this example, the component does not accept any incoming Main data flow and can send at most one Main data flow to another component.
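The meaning of the two limits can be sketched in a few lines of Java. The FlowConnector class below is hypothetical and is not part of any Talend API; it only models the rule that a new link is allowed while the current number of links is below the declared maximum.

```java
// Hypothetical model of the MAX_INPUT / MAX_OUTPUT rule; not a Talend class.
final class FlowConnector {

    private final int maxInput;
    private final int maxOutput;

    FlowConnector(int maxInput, int maxOutput) {
        this.maxInput = maxInput;
        this.maxOutput = maxOutput;
    }

    // A new incoming link is allowed only while the limit is not reached.
    boolean canAddInput(int currentInputs) {
        return currentInputs < maxInput;
    }

    // A new outgoing link is allowed only while the limit is not reached.
    boolean canAddOutput(int currentOutputs) {
        return currentOutputs < maxOutput;
    }

    public static void main(String[] args) {
        // Mirrors <CONNECTOR CTYPE="FLOW" MAX_INPUT="0" MAX_OUTPUT="1"/>
        FlowConnector flow = new FlowConnector(0, 1);
        System.out.println("input allowed: " + flow.canAddInput(0));
        System.out.println("first output allowed: " + flow.canAddOutput(0));
        System.out.println("second output allowed: " + flow.canAddOutput(1));
    }
}
```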

The PARAMETERS element defines the component properties. The NAME attribute of a PARAMETER defines the property name, and the FIELD attribute defines the field type and thus determines the type of value that can be set.

<PARAMETERS>
     <PARAMETER NAME="PROPERTY" FIELD="PROPERTY_TYPE" SHOW="true"
         NUM_ROW="10" REPOSITORY_VALUE="DELIMITED" />
 
......
      
     <PARAMETER NAME="SCHEMA_REJECT" FIELD="SCHEMA_TYPE"
         REQUIRED="true" NUM_ROW="44" CONTEXT="REJECT" SHOW="true">
         <TABLE READONLY="true">
             <COLUMN NAME="errorCode" TYPE="id_String" LENGTH="255"
                 READONLY="false" CUSTOM="true" />
             <COLUMN NAME="errorMessage" TYPE="id_String"
                 LENGTH="255" READONLY="false" CUSTOM="true" />
         </TABLE>
     </PARAMETER>
</PARAMETERS>

These parameters are displayed in the Basic Settings view of the component.

Which attributes a component needs depends on its function and on the information the user must provide for it to work as expected. Because this information varies from one user to another, each piece of information is stored in a variable as a component attribute.
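To see which properties a descriptor declares, you can read the NAME and FIELD attributes with any XML parser. The following sketch uses the standard Java DOM API on a shortened PARAMETERS fragment; the ParameterLister class and its SAMPLE constant are assumptions made for this example and are not part of Talend Studio.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for inspecting a descriptor; not part of Talend Studio.
final class ParameterLister {

    // Shortened fragment based on the tFileInputDelimited descriptor.
    static final String SAMPLE =
            "<PARAMETERS>"
          + "<PARAMETER NAME=\"FILENAME\" FIELD=\"FILE\" NUM_ROW=\"20\" REQUIRED=\"true\"/>"
          + "<PARAMETER NAME=\"CSV_OPTION\" FIELD=\"CHECK\" NUM_ROW=\"35\"/>"
          + "<PARAMETER NAME=\"HEADER\" FIELD=\"TEXT\" NUM_ROW=\"40\"/>"
          + "</PARAMETERS>";

    // Returns the property names mapped to their field types, in document order.
    static Map<String, String> fieldTypes(String parametersXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            parametersXml.getBytes(StandardCharsets.UTF_8)));
            NodeList params = doc.getElementsByTagName("PARAMETER");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < params.getLength(); i++) {
                Element param = (Element) params.item(i);
                result.put(param.getAttribute("NAME"), param.getAttribute("FIELD"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse PARAMETERS fragment", e);
        }
    }

    public static void main(String[] args) {
        fieldTypes(SAMPLE).forEach(
                (name, field) -> System.out.println(name + " -> " + field));
    }
}
```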

The ADVANCED_PARAMETERS element defines the component advanced properties.

<ADVANCED_PARAMETERS>
    <PARAMETER
      FIELD="DIRECTORY"
      NAME="TEMP_DIR"
      NUM_ROW="1"
      READONLY="false"
      REQUIRED="true"
      SHOW="false">
      <DEFAULT>"__COMP_DEFAULT_FILE_DIR__"</DEFAULT>
    </PARAMETER>
     
......
    <PARAMETER
      NAME="DESTINATION"
      FIELD="TEXT"
      NUM_ROW="90"
      SHOW="false">
        <DEFAULT></DEFAULT>
    </PARAMETER>     
</ADVANCED_PARAMETERS>

These parameters are displayed in the Advanced Settings panel of the component.

The CODEGENERATION element declares the jar files which will be used in the component. These jar files should be placed in the component folder.

<CODEGENERATION>
    <IMPORTS>
        <IMPORT NAME="Talend File Enhanced"
            MODULE="talend_file_enhanced_20070724.jar" REQUIRED="true" />
        <IMPORT NAME="Talend_CSV" MODULE="talendcsv.jar"
            REQUIRED="true" />
    </IMPORTS>
</CODEGENERATION>

The RETURNS element defines the global variables returned by the component. The NAME attribute of a RETURN defines the variable name, the TYPE attribute defines the data type of the variable, and the AVAILABILITY attribute defines how the variable can be used. The AVAILABILITY attribute can be set to AFTER or FLOW.

<RETURNS>
        <RETURN NAME="NB_LINE" TYPE="id_Integer" AVAILABILITY="AFTER" />
</RETURNS>

The global variables can be used in other components in the same Subjob or in other Subjobs depending on the AVAILABILITY attribute definition.

The most common global variable is NB_LINE, which is usually used to count the total number of processed lines.
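The sketch below illustrates how such a return variable could be published and then read by another component. It rests on two assumptions that are not stated in this article: that the variables are kept in a shared map (called globalMap here) under a key of the form <unique component name>_<RETURN NAME>, and that AFTER means the value is published once the component has finished. The class is a plain Java stand-in, not Talend-generated code.

```java
import java.util.HashMap;
import java.util.Map;

// Plain Java stand-in; the map name and key format are assumptions.
final class ReturnVariableSketch {

    // Stand-in for the map shared by all components of a Job.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Simulates a component that counts the lines it processes.
    static void readLines(String cid, String[] lines) {
        int nbLine = 0;
        for (String line : lines) {
            nbLine++;
        }
        // Assumed AFTER behavior: publish the value once processing is done.
        globalMap.put(cid + "_NB_LINE", nbLine);
    }

    public static void main(String[] args) {
        readLines("tFileInputDelimited_1", new String[] {"a;1", "b;2", "c;3"});
        // Another component reads the value with a cast to the declared type.
        Integer count = (Integer) globalMap.get("tFileInputDelimited_1_NB_LINE");
        System.out.println("Lines read: " + count);
    }
}
```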

The messages properties file

Each component must have a messages.properties file, which defines the labels that will be displayed in the Component properties panel for all the variables that can be used.

The messages properties file for tFileInputDelimited, tFileInputDelimited_messages.properties, looks like this:

#Created by JInto - www.guh-software.de
#Mon Aug 24 17:23:58 CST 2009
ADVANCED_SEPARATOR.NAME=Advanced separator(for number)
CHECK_FIELDS_NUM.NAME=Check each row structure against schema
CHECK_DATE.NAME=Check date
CSV_OPTION.NAME=CSV options
DECIMAL_SEPARATOR.NAME=Decimal separator
DIE_ON_ERROR.NAME=Die on error
ENCODING.NAME=Encoding
ESCAPE_CHAR.NAME=Escape char
FIELDSEPARATOR.NAME=Field Separator
FILENAME.NAME=File Name/Input Stream
FOOTER.NAME=Footer
HEADER.NAME=Header
#HELP=org.talend.help.tFileInputDelimited
LIMIT.NAME=Limit
LONG_NAME=Reads a file row by row with simple separated fields
NB_LINE.NAME=Number of lines
NB_RANDOM.NAME=Number of lines
RANDOM.NAME=Extract lines at random
REJECT.LINK=Reject
REJECT.MENU=Reject

The messages.properties file also contains a short description for the component such as:

LONG_NAME=Reads a file row by row with simple separated fields

This label is shown as a tooltip when you move the mouse pointer over the component in the Palette.
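Since the file uses the standard Java properties format, its labels can be read with java.util.Properties. The following sketch loads a shortened copy of the file from a string and looks up two keys; the MessagesLookup class is an assumption made for this example and does not reflect how Talend Studio itself loads the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper; shows only the standard properties lookup.
final class MessagesLookup {

    // Shortened copy of tFileInputDelimited_messages.properties.
    static final String SAMPLE =
            "FILENAME.NAME=File Name/Input Stream\n"
          + "HEADER.NAME=Header\n"
          + "LONG_NAME=Reads a file row by row with simple separated fields\n";

    // Returns the label stored under the given key, or null if it is absent.
    static String label(String propertiesText, String key) {
        Properties messages = new Properties();
        try {
            messages.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read properties text", e);
        }
        return messages.getProperty(key);
    }

    public static void main(String[] args) {
        System.out.println(label(SAMPLE, "HEADER.NAME"));
        System.out.println(label(SAMPLE, "LONG_NAME"));
    }
}
```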

The Java template file

There are usually three JET (Java Emitter Templates) files in the component folder, which correspond to the three parts of component Java code.
  • <component_name>_begin.javajet
  • <component_name>_main.javajet
  • <component_name>_end.javajet

For more information regarding the component code generation model, see Component code generation model.
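As a rough sketch of how the three parts might fit together in the generated code, the hand-written example below marks which lines would come from the begin, main, and end templates of a hypothetical component named tDemo_1. It assumes that the begin part runs once before the rows are processed, the main part runs once per row, and the end part runs once afterwards. It is a simplified illustration, not actual Talend-generated code.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of the begin / main / end split; not generated code.
final class ThreePartsSketch {

    static List<String> process(String[] input) {
        // --- begin part: initialization and opening of the row loop
        int nb_line_tDemo_1 = 0;
        List<String> out_tDemo_1 = new ArrayList<>();
        for (String row : input) {

            // --- main part: executed once for each row
            nb_line_tDemo_1++;
            out_tDemo_1.add(nb_line_tDemo_1 + ":" + row.trim());

        // --- end part: closes the block opened in the begin part
        }
        return out_tDemo_1;
    }

    public static void main(String[] args) {
        System.out.println(process(new String[] {" first ", "second"}));
    }
}
```

Note that the loop opened in the begin part is closed only in the end part, which is why a single template file does not necessarily contain balanced braces.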

JET is a "model to text" engine that lets you generate text output from an EMF model. For example, you can generate SQL, Java, XML, plain text, or HTML. The JET template syntax is closely related to that of JavaServer Pages (JSP).

In Talend Studio, the template files are translated to Java code. A template file contains two different types of Java code: the template code and the Java output code. The template code is enclosed in <% %> tags. As in JSP pages, the required classes are imported at the top of the template file. They can vary according to your needs, but some default classes are usually added to each javajet template file:

<%@ jet
    imports="
        org.talend.core.model.process.INode
        org.talend.core.model.process.ElementParameterParser
        org.talend.core.model.metadata.IMetadataTable
        org.talend.core.model.metadata.IMetadataColumn
        org.talend.core.model.process.IConnection
        org.talend.core.model.process.IConnectionCategory
        org.talend.designer.codegen.config.CodeGeneratorArgument
        org.talend.core.model.metadata.types.JavaTypesManager
        org.talend.core.model.metadata.types.JavaType
        java.util.List
        java.util.Map      
    "
%>

This is the JET imports section of the template. It is needed in every template file because the template files are compiled separately. Apart from the java.util classes, the classes listed in the imports section are Talend-specific and are documented in Javadoc format at http://talendforge.org/tos-2.0.0/api/.

The following lines appear in each javajet template file after the JET imports section.

<%
    CodeGeneratorArgument codeGenArgument = (CodeGeneratorArgument) argument;
    INode node = (INode)codeGenArgument.getArgument();
    String cid = node.getUniqueName(); 
%>

This code retrieves the node that represents the component, from which the values of the component's defined properties can be read, and gets the unique name of the component. Each component in a Job is a node that implements the INode interface. cid is the component identifier, which is a unique name.

A component can be used many times in a Job, and you cannot predict the names of the variables used in other components, how many times the same component will be used, or how the components are interconnected. Talend provides a safe way to make variable names unique: add the unique name of the component as a suffix to each variable declared in the template file. For example:

int nb_line_<%=cid %> = 0;

If tFileInputDelimited_1 is the unique name of the tFileInputDelimited component in the Job, this line is translated to the following Java code in the generated Job code:

int nb_line_tFileInputDelimited_1 = 0;
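The substitution can be pictured as a small generator that writes the literal text of the template and inserts the value of each <%= %> expression. The class below is a hand-written approximation of that idea for the single template line shown above. It is not the class that JET actually produces, and the name NbLineTemplateSketch is an assumption.

```java
// Hand-written approximation of template expansion; not real JET output.
final class NbLineTemplateSketch {

    // Expands the template line: int nb_line_<%=cid %> = 0;
    static String generate(String cid) {
        StringBuilder out = new StringBuilder();
        out.append("int nb_line_"); // literal text before the expression
        out.append(cid);            // value of the <%=cid %> expression
        out.append(" = 0;");        // literal text after the expression
        return out.toString();
    }

    public static void main(String[] args) {
        // Two instances of the same component yield two distinct variables.
        System.out.println(generate("tFileInputDelimited_1"));
        System.out.println(generate("tFileInputDelimited_2"));
    }
}
```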

Here is a piece of code from a template file:

<%
public void useShareConnection(INode node) {
    String sharedConnectionName = ElementParameterParser.getValue(node, "__SHARED_CONNECTION_NAME__");
%>
    String sharedConnectionName_<%=cid%> = <%=sharedConnectionName%>;
    conn_<%=cid%> = SharedDBConnection.getDBConnection("<%=this.getDirverClassName(node)%>",url_<%=cid%>,userName_<%=cid%> , password_<%=cid%> , sharedConnectionName_<%=cid%>);
<%
}
%>

The template code is enclosed in <% %> tags, whereas the Java output code, which is actually written to the generated Job code and executed, is not. The <%= %> expressions inside the output code are replaced with their values when the code is generated. In this case, the following two lines are the Java output code:

String sharedConnectionName_<%=cid%> = <%=sharedConnectionName%>;
conn_<%=cid%> = SharedDBConnection.getDBConnection("<%=this.getDirverClassName(node)%>",url_<%=cid%>,userName_<%=cid%> , password_<%=cid%> , sharedConnectionName_<%=cid%>);

At the end of the template, we have to close the opened blocks:

<%
}
%>

The component icon

Each component must contain an icon: a 32 x 32 pixel picture in PNG format.

The picture is the component icon that is displayed in the Palette. The icon name should comply with the following format: yourComponentName_icon32.png.
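The file naming patterns mentioned in this article can be summarized in a short helper. The sketch below builds the list of file names expected for a given component name. The ComponentFileNames class and the tMyComponent name are hypothetical, and the list covers only the files discussed here, not the optional jar files.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that applies the naming patterns from this article.
final class ComponentFileNames {

    // Returns the file names expected in the folder of the given component.
    static List<String> expectedFiles(String componentName) {
        return Arrays.asList(
                componentName + "_java.xml",            // XML descriptor
                componentName + "_messages.properties", // labels
                componentName + "_begin.javajet",       // JET templates
                componentName + "_main.javajet",
                componentName + "_end.javajet",
                componentName + "_icon32.png");         // Palette icon
    }

    public static void main(String[] args) {
        for (String fileName : expectedFiles("tMyComponent")) {
            System.out.println("tMyComponent/" + fileName);
        }
    }
}
```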