Setting the input flow in the Map Editor - 6.4

Talend Big Data Platform Studio User Guide

The order of the Input tables is essential. The top table reflects the Main flow connection, and for this reason, is given priority for reading and processing through the tMap component.

Because of this priority, you cannot move the Main flow table up or down. This ensures that no Join can be lost.

Although you can use the up and down arrows to change the order of the Lookup tables, be aware that Joins between two Lookup tables may then be lost.

Related topic: How to use Explicit Join.

How to fill in Input tables with a schema

To fill in the input tables, you need to define either the schemas of the input components connected to the tMap component on your design workspace, or the input schemas within the Map Editor.

For more information about setting a component schema, see How to define component properties.

For more information about setting an input schema in the Map Editor, see Setting schemas in the Map Editor.

Main and Lookup table content

The Main Row connection determines the Main flow table content. This input flow is reflected in the first table of the Map Editor's Input panel.

The content of the Lookup connections fills in all the other (secondary or subordinate) tables, which display below the Main flow table. If you have not yet defined the schema of an input component, its input table displays as empty in the Input area.

The key is also retrieved from the schema defined in the input component. This key corresponds to the key defined in the input schema, where relevant. It must be distinguished from the hash key that is used internally in the Map Editor and displays in a different color.

Variables

You can use global or context variables, or reuse the variables defined in the Variables area. Press Ctrl+Space to access the list of variables. This list gathers global, context and mapping variables.

The list of variables changes according to the context and grows as new variables are created. Only the mappable variables that are valid in the context show on the list.

A metadata tip box docked to the variable list displays information about the selected column.

Related topic: Mapping variables

How to use Explicit Join

Warning

For Big Data users only:

In a MapReduce Job, only one expression key is allowed per mapping component. If you need to use multiple expression keys to join different input tables, use multiple tMap components one after another. For more information about MapReduce Jobs, see Talend Big Data Getting Started Guide.

Joins let you select data from one table depending on the data from another table. In the Map Editor, the data of a Main table and of a Lookup table can be bound together on expression keys. This is why the order of the tables matters.

Simply drop column names from one table to a subordinate one, to create a Join relationship between the two tables. This way, you can retrieve and process data from multiple inputs.

The Join displays graphically as a purple link and automatically creates a key that is used as a hash key to speed up the match search.

You can create direct Joins between the Main table and Lookup tables. You can also create indirect Joins from the Main table to a Lookup table, via another Lookup table. This requires a direct Join between one of the Lookup tables and the Main one.

Note

You cannot create a Join from a subordinate table towards a superior table in the Input area.
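
To make the direct and indirect Join relationship more concrete, here is a minimal Java sketch. It is not the code that Talend Studio generates: the class name, the method name and the two maps standing in for Lookup tables are all hypothetical. It only illustrates that the second Lookup can be resolved only after the first one, which is why a Join always goes from a table to a subordinate one.

```java
import java.util.Map;

/** Hypothetical illustration of a direct Join followed by an indirect Join. */
final class IndirectJoinSketch {

    /**
     * customerId comes from the Main row.
     * customerToCountryCode stands in for the first Lookup table,
     * countryCodeToName for the second (subordinate) Lookup table.
     */
    static String countryNameFor(String customerId,
                                 Map<String, String> customerToCountryCode,
                                 Map<String, String> countryCodeToName) {
        // Direct Join: Main table -> first Lookup table.
        String countryCode = customerToCountryCode.get(customerId);
        if (countryCode == null) {
            return null; // no match in the first Lookup, so the chain stops here
        }
        // Indirect Join: first Lookup table -> second Lookup table.
        return countryCodeToName.get(countryCode);
    }
}
```

In this sketch the second lookup key is a column of the first Lookup table, so reversing the order of the two Lookup tables would leave the second Join with nothing to read from.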

The Expression key field which is filled in with the dragged and dropped data is editable in the input schema, whereas the column name can only be changed from the Schema editor panel.

You can insert the dragged data into a new entry, replace the existing entries, or concatenate all the selected data into one cell.

For further information about the possible types of drag and drop, see Mapping the Output setting.

Note

If you have a large number of input tables, you can use the minimize/maximize icon to reduce or restore the table size in the Input area. The Join binding two tables remains visible even when a table is minimized.

Creating a Join automatically assigns a hash key onto the joined field name. The key symbol displays in violet on the input table itself and is removed when the Join between the two tables is removed.

Along with the explicit Join, you can choose whether to filter down to a unique match or to allow several matches to be taken into account. In the latter case, you can choose to consider only the first match, only the last match, or all of them.

To define the match model for an explicit Join:

  1. Click the tMap settings button at the top of the table that the Join links to. The table properties display.

  2. Click in the Value field corresponding to Match Model and then click the three-dot button that appears to open the [Options] dialog box.

  3. In the [Options] dialog box, double-click the match model you want, or select it and click OK to validate the setting and close the dialog box.

Unique Match

This is the default selection when you implement an explicit Join. With this selection, only the last match from the Lookup flow is taken into account and passed on to the output.

The other matches are then ignored.

First Match

This selection implies that several matches can be expected in the lookup. With First Match, only the first match encountered in the lookup is taken into account and passed on to the main output flow.

The other matches are then ignored.

All Matches

This selection implies that several matches can be expected in the lookup flow. In this case, all matches are taken into account and passed on to the main output flow.
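
The difference between the three match models can be sketched in a few lines of Java. This is a simplified illustration under stated assumptions, not the code generated by the tMap component: the class, enum and method names are hypothetical, each lookup row is reduced to a key and a value, and what happens to a Main row without any match is left out of scope.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the three match models described above. */
final class MatchModelSketch {

    enum MatchModel { UNIQUE_MATCH, FIRST_MATCH, ALL_MATCHES }

    /** Each lookup row is {key, value}; returns the values kept for mainKey. */
    static List<String> match(String mainKey, List<String[]> lookup, MatchModel model) {
        // Collect every lookup value whose key equals the Main key, in lookup order.
        List<String> hits = new ArrayList<>();
        for (String[] row : lookup) {
            if (row[0].equals(mainKey)) {
                hits.add(row[1]);
            }
        }
        if (hits.isEmpty() || model == MatchModel.ALL_MATCHES) {
            return hits;
        }
        // Unique Match keeps the last match, First Match keeps the first one.
        int kept = (model == MatchModel.UNIQUE_MATCH) ? hits.size() - 1 : 0;
        return List.of(hits.get(kept));
    }
}
```

For a lookup containing the rows (FR, Paris), (FR, Lyon) and (DE, Berlin), in that order, and the Main key FR, this sketch keeps Lyon with Unique Match, Paris with First Match, and both Paris and Lyon with All Matches.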

How to use Inner Join

Warning

For Big Data users only:

In a MapReduce Job, only one expression key is allowed per mapping component. If you need to use multiple expression keys to join different input tables, use multiple tMap components one after another. For more information about MapReduce Jobs, see Talend Big Data Getting Started Guide.

The Inner Join is a particular type of Join that is distinguished by the way rejection is performed.

This option prevents null values from being passed on to the main output flow. It also lets you pass the rejected data on to a specific table called the Inner Join Reject table.

If the requested data cannot be retrieved through the explicit Join or the filter Join, in other words, if the Inner Join cannot be established for any reason, the data is rejected to the Output table defined as the Inner Join Reject table, if any.

Simply drop column names from one table to a subordinate one to create a Join relationship between the two tables. The Join displays graphically as a purple link and automatically creates a key that is used as a hash key to speed up the match search.

To define the type of an explicit Join:

  1. Click the tMap settings button at the top of the table that the Join links to. The table properties display.

  2. Click in the Value field corresponding to Join Model and then click the three-dot button that appears to open the [Options] dialog box.

  3. In the [Options] dialog box, double-click the Join type you want, or select it and click OK to validate the setting and close the dialog box.

Note

An Inner Join table should always be coupled to an Inner Join Reject table. For how to define an output table as an Inner Join Reject table, see Lookup Inner Join rejection.
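
The rejection behaviour of an Inner Join can be illustrated with a short Java sketch. It rests on assumptions made for this example only: the class and field names are hypothetical, Main rows are reduced to their join key, and the Lookup table is a simple key-to-value map. It is not the code that Talend Studio generates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: matched rows go to the main output, the rest to a reject output. */
final class InnerJoinSketch {

    final List<String> joined = new ArrayList<>();   // stands in for the main output table
    final List<String> rejected = new ArrayList<>(); // stands in for the Inner Join Reject table

    /** mainKeys holds the join key of each Main row; lookup maps a key to its value. */
    static InnerJoinSketch run(List<String> mainKeys, Map<String, String> lookup) {
        InnerJoinSketch result = new InnerJoinSketch();
        for (String key : mainKeys) {
            String match = lookup.get(key);
            if (match != null) {
                // The Join is established: no null value reaches the main output.
                result.joined.add(key + "=" + match);
            } else {
                // The Join cannot be established: the row is routed to the reject output.
                result.rejected.add(key);
            }
        }
        return result;
    }
}
```

Without a reject output, the rows collected in `rejected` would simply be dropped, which is why the note above recommends coupling an Inner Join table with an Inner Join Reject table.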

You can also use the filter button to decrease the number of rows to be searched and improve the performance (in Java).

How to use the All Rows option

By default, when no Join is set up, the All rows match model option is selected in each input table of the Input area of the Map Editor. With this option, all the rows are loaded from the Lookup flow and searched against the Main flow.

The output corresponds to the Cartesian product of both tables (or of more tables, if need be).
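
As an illustration of what a Cartesian product means here, the following Java sketch pairs every Main row with every Lookup row. The class and method names are hypothetical and rows are reduced to plain strings; it is a sketch of the principle, not of the component's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the All rows behaviour as a Cartesian product. */
final class AllRowsSketch {

    /** Pairs each Main row with each Lookup row, in Main order then Lookup order. */
    static List<String> crossJoin(List<String> mainRows, List<String> lookupRows) {
        List<String> out = new ArrayList<>();
        for (String mainRow : mainRows) {
            for (String lookupRow : lookupRows) {
                out.add(mainRow + "|" + lookupRow);
            }
        }
        return out;
    }
}
```

With 2 Main rows and 3 Lookup rows, the sketch produces 2 x 3 = 6 output rows, which shows how quickly the output can grow when no Join is defined.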

Note

If you create an explicit or an inner Join between two tables, the All rows option is no longer available. You then have to select Unique match, First match or All matches. For more information, see How to use Explicit Join and How to use Inner Join.

How to filter an input flow

Click the Filter button next to the tMap settings button to add a Filter field.

In the Filter field, type in the condition to be applied. This reduces the number of rows parsed against the main flow, which improves performance on long and heterogeneous flows.

You can use the auto-completion tool by pressing Ctrl+Space to reuse schema columns in the condition statement.
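
The sketch below shows, in plain Java, what such a condition does to an input flow. It assumes the condition is written as a Java boolean expression; the table name `row2`, the columns `country` and `age`, and the class and method names are all hypothetical and chosen only for this example.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch of an input filter applied before the match search. */
final class InputFilterSketch {

    /** Minimal stand-in for one row of an input table. */
    static final class Row {
        final String country;
        final int age;

        Row(String country, int age) {
            this.country = country;
            this.age = age;
        }
    }

    // Plays the role of a Filter field such as:
    //   row2.age >= 18 && "FR".equals(row2.country)
    // Putting the literal first keeps the test safe when country is null.
    static final Predicate<Row> FILTER =
            row -> row.age >= 18 && "FR".equals(row.country);

    /** Keeps only the rows that satisfy the filter condition. */
    static List<Row> apply(List<Row> rows) {
        return rows.stream().filter(FILTER).collect(Collectors.toList());
    }
}
```

Only the rows that pass the condition remain candidates for the Join, so fewer rows have to be searched.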

How to remove input entries from table

To remove input entries, select them and click the red cross sign in the Schema Editor of the selected table. To select multiple entries for removal, hold down Ctrl or Shift while clicking the fields.

Note

If you remove Input entries from the Map Editor schema, this removal also occurs in your component schema definition.