Solving memory limitation issues in tMap use - 6.3

Talend Real-time Big Data Platform Studio User Guide


Warning

For Big Data users only:

This feature is not available in the MapReduce version of tMap.

When handling large data sources (for example, sources with numerous columns, a large number of rows, or many column types), your system might run short of memory and prevent your Job from completing properly, in particular when you use a tMap component for the transformation.

A feature has been added to the tMap component (in Java only for the time being) to reduce the memory used for lookup loading. Rather than storing the temporary data in system memory, and thus possibly reaching the memory limit, the Store temp data option allows you to store the temporary data in a directory on your disk instead.

This feature is an option that you select on the Lookup table of the input data in the Map Editor.

To enable the Store temp data option:

  1. Double-click the tMap component in your Job to launch the Map Editor.

  2. In the input area, click the Lookup table describing the temporary data that you want to load onto the disk rather than into memory.

  3. Click the tMap settings button to display the table properties.

  4. Click in the Value field corresponding to Store temp data, and then click the [...] button to display the [Options] dialog box.

  5. In the [Options] dialog box, double-click true, or select it and click OK, to enable the option and close the dialog box.

For this option to be fully activated, you also need to specify the directory on the disk where the data will be stored, and the buffer size, that is, the number of rows of data each temporary file will contain. You can set the temporary storage directory and the buffer size either in the Map Editor or in the tMap component property settings.

To set the temporary storage directory and the buffer size in the Map Editor:

  1. Click the Property Settings button at the top of the input area to display the [Property Settings] dialog box.

  2. In the [Property Settings] dialog box, fill the Temp data directory path field with the full path to the directory where the temporary data should be stored.

  3. In the Max buffer size (nr of rows) field, specify the maximum number of rows each temporary file can contain. The default value is 2,000,000.

  4. Click OK to validate the settings and close the [Property Settings] dialog box.
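As a rough illustration of how the buffer size relates to the number of temporary files, the following Java sketch assumes that each file holds at most the configured number of rows and that a partially filled last file still counts as one file. The class and method names are hypothetical, and the number of files Talend actually writes may differ.

```java
/** Hypothetical helper for illustration only; not part of Talend. */
final class TempFileEstimate {
    private TempFileEstimate() {}

    /**
     * Estimates how many temporary files are needed, assuming each file
     * holds at most maxRowsPerFile rows (ceiling division).
     */
    static long estimateFileCount(long lookupRows, long maxRowsPerFile) {
        if (lookupRows < 0 || maxRowsPerFile <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and buffer size must be > 0");
        }
        long fullFiles = lookupRows / maxRowsPerFile;
        long remainder = lookupRows % maxRowsPerFile;
        return remainder == 0 ? fullFiles : fullFiles + 1;
    }

    public static void main(String[] args) {
        long defaultBufferSize = 2_000_000L; // default value stated in this section
        // 5,000,000 lookup rows: two full files plus one partial file
        System.out.println(estimateFileCount(5_000_000L, defaultBufferSize)); // prints 3
    }
}
```

Under this assumption, a smaller buffer size means fewer rows held in memory at once but more files in the temporary directory.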

To set the temporary storage directory in the tMap component property settings without opening the Map Editor:

  1. Click the tMap component to select it on the design workspace, and then select the Component tab to show the Basic settings view.

  2. In the Store on disk area, fill the Temp data directory path field with the full path to the directory where the temporary data should be stored.

    Alternatively, you can use a context variable by pressing Ctrl+Space, provided that you have defined the variable in a Context group in the repository. For more information about contexts, see Using contexts and variables.

At the end of the subJob, the temporary files are cleared.

This way, you limit the memory allocated to each set of reference data, because the data is written to temporary files stored on the disk.
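The general idea of trading memory for disk can be sketched in Java as follows. This is a simplified illustration, not the code that tMap generates: the class DiskSpillBuffer, its methods, and the file naming are all hypothetical. It keeps at most a fixed number of rows in memory, writes each full buffer to a temporary file in the chosen directory, and deletes the files when closed, which mirrors the clean-up at the end of the subJob.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not Talend's generated code. */
final class DiskSpillBuffer implements AutoCloseable {
    private final Path tempDir;          // plays the role of "Temp data directory path"
    private final int maxBufferRows;     // plays the role of "Max buffer size"
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();

    DiskSpillBuffer(Path tempDir, int maxBufferRows) {
        if (maxBufferRows <= 0) {
            throw new IllegalArgumentException("maxBufferRows must be positive");
        }
        this.tempDir = tempDir;
        this.maxBufferRows = maxBufferRows;
    }

    /** Buffers one row; writes the buffer to disk once it is full. */
    void add(String row) throws IOException {
        buffer.add(row);
        if (buffer.size() >= maxBufferRows) {
            flush();
        }
    }

    /** Writes any buffered rows to a new temporary file and empties the buffer. */
    void flush() throws IOException {
        if (buffer.isEmpty()) {
            return;
        }
        Path file = Files.createTempFile(tempDir, "lookup_", ".tmp");
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String row : buffer) {
                writer.write(row);
                writer.newLine();
            }
        }
        spillFiles.add(file);
        buffer.clear();
    }

    /** Returns a copy of the list of temporary files written so far. */
    List<Path> spillFiles() {
        return new ArrayList<>(spillFiles);
    }

    /** Deletes all temporary files written by this buffer. */
    @Override
    public void close() throws IOException {
        for (Path file : spillFiles) {
            Files.deleteIfExists(file);
        }
        spillFiles.clear();
        buffer.clear();
    }
}
```

With a buffer of 2 rows, adding 5 rows and then calling flush() leaves three files in the directory; calling close() removes them again.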

Note

Because writing the main flow onto the disk requires the data to be sorted, the order of the output rows cannot be guaranteed.

On the Advanced settings view, you can also set a buffer size if needed. Fill in the Max buffer size (nb of rows) field so that the data stored on the disk is split into as many files as needed.