tPigMap interface - 6.3

Talend Real-time Big Data Platform Studio User Guide

Pig is a platform that uses a scripting language to express data flows. Its language, Pig Latin, programs the step-by-step operations that transform the data.

tPigMap is an advanced component that maps the input and output flows handled in a Pig process (a chain of connected Pig components). It therefore requires tPigLoad to read data from the source system and tPigStoreResult to write data to a given target. Starting from this basic design of tPigLoad, tPigMap and tPigStoreResult, you can visually develop Pig processes of varying complexity by placing other Pig components around tPigMap. Because these components generate Pig code, the resulting Job is optimized for the Hadoop environment.
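To illustrate the kind of step-by-step Pig Latin that a basic tPigLoad, tPigMap and tPigStoreResult design corresponds to, the following Python sketch composes a three-statement script: LOAD, FOREACH ... GENERATE and STORE. This is a minimal sketch, not the code Talend Studio actually generates; the function build_pig_script, the relation aliases row1 and out1, the paths and the ';' delimiter are all hypothetical.

```python
def build_pig_script(input_path, schema, output_path, delimiter=";"):
    """Compose a hypothetical three-step Pig Latin script: load, map, store.

    schema is a list of (column_name, pig_type) pairs, for example
    [("id", "int"), ("name", "chararray")].
    This is an illustration only, not Talend's generated code.
    """
    columns = ", ".join(f"{name}:{pig_type}" for name, pig_type in schema)
    names = ", ".join(name for name, _ in schema)
    return "\n".join([
        # Role played by tPigLoad: read the source data with an explicit schema
        f"row1 = LOAD '{input_path}' USING PigStorage('{delimiter}') AS ({columns});",
        # Role played by tPigMap: project the input columns onto the output flow
        f"out1 = FOREACH row1 GENERATE {names};",
        # Role played by tPigStoreResult: write the mapped flow to the target
        f"STORE out1 INTO '{output_path}' USING PigStorage('{delimiter}');",
    ])


if __name__ == "__main__":
    print(build_pig_script("/user/talend/in.csv",
                           [("id", "int"), ("name", "chararray")],
                           "/user/talend/out"))
    # row1 = LOAD '/user/talend/in.csv' USING PigStorage(';') AS (id:int, name:chararray);
    # out1 = FOREACH row1 GENERATE id, name;
    # STORE out1 INTO '/user/talend/out' USING PigStorage(';');
```

Each statement produces a new relation from the previous one, which is the step-by-step style of data flow that Pig Latin is designed to express.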

You configure tPigMap in the Map Editor, an "all-in-one" tool that lets you define all the parameters needed to map, transform and route your data flows through a convenient graphical interface.

You can minimize and restore the Map Editor and all the tables in it using the window icons.

The Map Editor is made of several panels:

  • The Input panel is the top left panel of the editor. It offers a graphical representation of all (main and lookup) incoming data flows. The data are gathered in the various columns of the input tables. Note that each table name reflects the main or lookup row from the Job design in the design workspace.

  • The Output panel is the top right panel of the editor. It allows you to map data and fields from the input tables to the appropriate output rows.

  • The Search panel is the top central panel. It allows you to search in the editor for columns or expressions that contain the text you enter in the Find field.

  • The UDF panel, located beneath the Search panel, allows you to define Pig User-Defined Functions (UDFs) to be loaded by the connected input component(s) and applied to specific output data. For more information, see Defining a Pig UDF using the UDF panel.

  • The two bottom panels describe the input and output schemas. The Schema editor tab offers a schema view of all the columns of the input and output tables selected in their respective panels.

  • The Expression editor is the editing tool for all expression keys of input/output data and for filtering conditions.

The names of the input/output tables in the Map Editor reflect the names of the incoming and outgoing flows (row connections).
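As a rough illustration of how an output table's expressions and filter condition relate to Pig Latin, the Python sketch below models one output table, named after its outgoing row connection and reading from an incoming one, and renders it as an optional FILTER statement followed by a FOREACH ... GENERATE projection. The OutputTable class, the to_pig_latin function and the *_filtered alias are hypothetical, and for simplicity the expressions use plain Pig field names; the Pig code that tPigMap really generates may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OutputTable:
    """Hypothetical model of one output table in the Map Editor."""
    name: str    # name of the outgoing row connection, e.g. "out1"
    source: str  # name of the incoming row connection, e.g. "row1"
    # output column -> Pig expression (insertion order is kept)
    expressions: dict = field(default_factory=dict)
    # optional Pig boolean expression used as a filtering condition
    filter_condition: Optional[str] = None


def to_pig_latin(table: OutputTable) -> str:
    """Render the hypothetical output table as Pig Latin statements."""
    statements = []
    relation = table.source
    if table.filter_condition:
        # Keep only the input tuples that satisfy the filtering condition
        filtered = f"{table.name}_filtered"
        statements.append(f"{filtered} = FILTER {relation} BY {table.filter_condition};")
        relation = filtered
    # Each output column is computed by its expression and renamed with AS
    projections = ", ".join(f"{expr} AS {col}" for col, expr in table.expressions.items())
    statements.append(f"{table.name} = FOREACH {relation} GENERATE {projections};")
    return "\n".join(statements)


if __name__ == "__main__":
    out1 = OutputTable(
        name="out1",
        source="row1",
        expressions={"id": "id", "name_upper": "UPPER(name)"},
        filter_condition="id > 100",
    )
    print(to_pig_latin(out1))
    # out1_filtered = FILTER row1 BY id > 100;
    # out1 = FOREACH out1_filtered GENERATE id AS id, UPPER(name) AS name_upper;
```

Applying the filter before the projection mirrors the usual reading of a filtering condition on an output table: rows are selected first, then each output column is computed from the rows that remain.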

This Map Editor is designed and used in the same way as the map editor of a typical Talend mapping component such as tMap. Therefore, to fully understand how a classic mapping component works, we recommend reading, as a reference, the chapter that describes how Talend Studio maps data flows: Mapping data flows.

Talend also provides MapReduce and Spark versions of tMap to map big data flows in Talend MapReduce or Spark Jobs. These versions of tMap have almost the same user interface as the standard version of tMap.

For details about each tMap version, see the Talend Components Reference Guide.