Defining the processing component - 6.5

Talend Job Script Reference Guide

author: Talend Documentation Team
EnrichVersion: 6.5
EnrichProdName: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
task: Design and Development > Designing Jobs
EnrichPlatform: Talend CommandLine, Talend Studio

Follow the steps below to define a tMap component to:

  • perform automatic type conversion between input and output to prevent compilation errors when the Job is executed

  • combine the first name and last name of each person

Procedure

  1. Enter the following functions and parameters to add the component.
    addComponent {
    	setComponentDefinition {
    		TYPE: "tMap",
    		NAME: "tMap_1",
    		POSITION: 480, 256
    	}
    }
  2. Next to the setComponentDefinition {} function, enter the setSettings {} function to define the mapper settings.

    In this example, the id and age columns are of type String in the input schemas but of type Integer in the output schema. Enable the component's automatic type conversion feature and leave the other settings at their default values.

    	setSettings {
    		ENABLE_AUTO_CONVERT_TYPE : "true"
    	}
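
    Taken together, steps 1 and 2 would give a component definition along the following lines. This is a sketch that reads "next to" as meaning that setSettings {} sits inside addComponent {} as a sibling of setComponentDefinition {}, which is what the indentation of the snippets suggests.

    addComponent {
    	setComponentDefinition {
    		TYPE: "tMap",
    		NAME: "tMap_1",
    		POSITION: 480, 256
    	}
    	setSettings {
    		ENABLE_AUTO_CONVERT_TYPE : "true"
    	}
    }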
  3. Next to the setSettings {} function, enter an addSchema {} function to define the data structure expected by the next component.

    In this example, the output flow is named out and it contains four columns:

    • id, type Integer, two characters long

    • full_name, type String

    • age, type Integer, two characters long

    • city, type String

    	addSchema {
    		NAME: "out",
    		CONNECTOR: "FLOW",
    		LABEL: "out"
    		addColumn {
    			NAME: "id",
    			TYPE: "id_Integer",
    			LENGTH: 2
    		}
    		addColumn {
    			NAME: "full_name",
    			TYPE: "id_String"
    		}
    		addColumn {
    			NAME: "age",
    			TYPE: "id_Integer",
    			LENGTH: 2
    		}
    		addColumn {
    			NAME: "city",
    			TYPE: "id_String"
    		}
    	}
  4. Next to the addSchema {} function, enter the addMapperData {} function to define the mapping data, which includes input, output, and var tables, joins, and mappings.
  5. In the addMapperData {} function, enter an addInputTable {} function to define the input table for the main input flow.

    Note that the column definitions must be the same as those for the first tFileInputDelimited component.

    	addMapperData {
    		addInputTable {
    			NAME: "row1"
    			addColumn {
    				NAME: "id",
    				TYPE: "id_String"
    			}
    			addColumn {
    				NAME: "name",
    				TYPE: "id_String"
    			}
    			addColumn {
    				NAME: "age",
    				TYPE: "id_String"
    			}
    			addColumn {
    				NAME: "city",
    				TYPE: "id_String"
    			}
    		}
    	}
  6. In the addMapperData {} function, enter another addInputTable {} function to define the input table for the lookup flow.

    Note that the column definitions must be the same as those for the second tFileInputDelimited component.

  7. In the definition for the id column, enter the parameter EXPRESSION: "row1.id" to set up a join between the two input tables on the id column.

    Note that this example defines a Left Outer Join. To define an Inner Join, add the ISINNERJOIN: true parameter in the addInputTable {} function.

    		addInputTable {
    			NAME: "row2"
    			addColumn {
    				NAME: "id",
    				TYPE: "id_String",
    				EXPRESSION: "row1.id"
    			}
    			addColumn {
    				NAME: "family",
    				TYPE: "id_String"
    			}
    		}
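
    If an Inner Join is needed instead, the lookup table might look like the sketch below. It assumes that ISINNERJOIN: true is placed directly after NAME and separated from it by a comma, in the same way that other parameters in this script are separated. That placement is an assumption rather than something stated here, so verify it in your own script.

    		addInputTable {
    			NAME: "row2",
    			ISINNERJOIN: true
    			addColumn {
    				NAME: "id",
    				TYPE: "id_String",
    				EXPRESSION: "row1.id"
    			}
    			addColumn {
    				NAME: "family",
    				TYPE: "id_String"
    			}
    		}

    With an Inner Join, main-flow rows that have no matching id in the lookup flow are rejected from the output instead of being passed on with empty lookup values.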
  8. In the addMapperData {} function, enter an addOutputTable {} function and define the only output table in this example.

    The column definitions must be the same as those defined in the schema settings. Note that the ID parameter is required, but it needs a value only when the output table uses a Repository schema.

  9. Create mappings between the input and output columns by adding the EXPRESSION parameter to each output column.

    Note that the full_name column is a combination of the name column of the main input flow and the family column of the lookup flow, with a space in between.

    		addOutputTable {
    			ID: "",
    			NAME: "out"
    			addColumn {
    				NAME: "id",
    				TYPE: "id_Integer",
    				EXPRESSION: "row1.id"
    			}
    			addColumn {
    				NAME: "full_name",
    				TYPE: "id_String",
    				EXPRESSION: "row1.name + \" \" + row2.family"
    			}
    			addColumn {
    				NAME: "age",
    				TYPE: "id_Integer",
    				EXPRESSION: "row1.age"
    			}
    			addColumn {
    				NAME: "city",
    				TYPE: "id_String",
    				EXPRESSION: "row1.city"
    			}
    		}
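    Warning:

    Be sure to escape each metacharacter with a backslash (\). For example, the double quotes inside the full_name expression are written as \" because the expression itself is enclosed in double quotes.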
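
As a recap, the functions entered in steps 1 to 9 nest as outlined below. This is a structural outline only: every body is elided with the same {} shorthand used in the step descriptions, so it is not a complete script and must be filled in with the parameters and columns given in each step.

    addComponent {
    	setComponentDefinition {}
    	setSettings {}
    	addSchema {}
    	addMapperData {
    		addInputTable {}
    		addInputTable {}
    		addOutputTable {}
    	}
    }

The first addInputTable {} is the main flow (row1), the second is the lookup flow (row2), and addOutputTable {} is the out table whose columns match those declared in addSchema {}.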