Converting and filtering records with Talend Data Mapper - 7.3

author
Talend Documentation Team
EnrichVersion
7.3
EnrichProdName
Talend Big Data Platform
Talend Data Fabric
Talend Data Management Platform
Talend Data Services Platform
Talend MDM Platform
Talend Real-Time Big Data Platform
task
Design and Development
EnrichPlatform
Talend Studio

Converting and filtering records with Talend Data Mapper

This article explains how to convert data types and filter invalid records using a map.

Talend Data Mapper is a powerful mechanism for processing records. A common operation during records processing is to validate data and reject or adjust records with invalid values.

This article uses the attached readings.xml file as an example. It is structured as follows:
<readings>
	<reading>
		<Timestamp>1/5/2019 2:14 AM</Timestamp>
		<AmbientTempF>44.8</AmbientTempF>
		<ProbeTempF>33</ProbeTempF>
		<Humidity>74</Humidity>
	</reading>
	<reading>
		<Timestamp>1/5/2019 2:04 AM</Timestamp>
		<AmbientTempF>45</AmbientTempF>
		<ProbeTempF>N/C</ProbeTempF>
		<Humidity>73</Humidity>
	</reading>
</readings>

In some of the records, the value in the ProbeTempF element is N/C, which is invalid. This article explains how configure a map to split valid and invalid values into different files and create a third file in which the invalid values are replaced by a valid one to avoid errors.

The map created in this example also converts String data types into more appropriate types, such as Date/Time or Double.

The map is then used in a simple data integration Job.

Creating structures

Creating the input structure

Create the XML input structure from an existing file.

About this task

This procedure uses the file readings.xml available in the Downloads tab on this page.

Procedure

  1. Open the Mapping perspective of Talend Studio.
  2. In the Data Mapper tab, expand the Hierarchical Mapper node and right-click Structures.
  3. Click New > Structure.
  4. In the dialog box that opens, select Import a structure definition and click Next.
  5. Select XML Sample Document and click Next.
  6. Select Local file and click Browse to find readings.xml, then click Next.
  7. Enter a name for the structure, readings_input for example, then click Next and Finish.

Results

Your structure is created and opens in the editor.

Creating the output structure

Create the XML output structure using the existing input structure.

About this task

This output structure is split into three elements that each contain the same elements as the input structure. This will allow the map to filter the results and write them to three different output files.

Procedure

  1. In the Data Mapper tab, expand the Hierarchical Mapper node and right-click Structures.
  2. Click New > Structure.
  3. Select Create a new structure where you manually enter elements and click Next.
  4. Enter a name for your structure, readings_output for example.
  5. Select the XML representation and click Finish.
    Your structure is created and opens in the editor.
  6. Right-click the left section of the editor and click New Element, then enter a name for the element, root for example.
  7. Right-click root and click New Element to add an element named valid.
  8. In the properties of the valid element, scroll down to the Inherits From property and click the […] button to the right of the field.
  9. Expand you project and select Structures > readings_input.
    The elements in the input structure are added under the valid element.
  10. Right-click the valid element and click Copy.
  11. Right-click the root element and click Paste. Repeat this step to create a third element under root.
  12. Click valid[2] and valid[3] and edit their names to invalid and cleansed, respectively.
  13. Press Ctrl + S to save your structure.

Results

Your output structure is created and should look like this:

Creating the map

Before you begin

You have created your input and output structures

Procedure

  1. In the Data Mapper tab, right-click Maps and click New > Map.
  2. Select Standard Map and click Next.
  3. Select a folder and enter a name for your map, readings_map for example, then click Finish.
    Your map is created and opens in the editor.
  4. Drag and drop the readings_input structure from the Hierarchical Mapper to the Input section of the editor, and the readings_output structure to the Output section.
  5. Select the reading (1:*) element in the input structure, and drag and drop it on the three reading (1:*) elements in the output structure.
    All elements contained in the reading loops are automatically mapped.

Results

Your elements are mapped, you can now edit the mapping to convert and filter your data.

Converting data types

Convert strings into more meaningful data types.

About this task

To convert the AmbientTempF and ProbeTempF to Double and Humidity to Integer, you simply need to change the data types in the output structure.

Procedure

  1. In the Data Mapper tab, double-click the readings_output structure to open it in the editor.
  2. Click the Read Only drop-down list and select Editable to modify the elements in the structure.
  3. Click AmbientTempF in the valid element and select Double (64) in the Data Type drop-down list. Repeat this step for the other occurrences of AmbientTempF.
  4. Click Humidity in the valid element and select Integer (32) in the Data Type drop-down list. Repeat this step for the other occurrences of Humidity.
  5. Click ProbeTempF in the valid element and select Double (64) in the Data Type drop-down list. Repeat this step for ProbeTempF in the cleansed element.
    The invalid element will contain all rows in which the value of ProbTempF is N/C. The data type should be kept as String for this element.
  6. Press Ctrl + S to save your structure.

Converting a string to a date

Convert a String data type to a Date/Time type with a specific format.

Procedure

  1. In the readings_output structure, click the Timestamp element and select Date/Time in the Data Type field. Do this for every occurrence of the Timestamp element in the structure.
  2. Save the structure and open readings_map.
  3. Select the three Timestamp elements, right-click them and click Remove Expressions (Mappings).
  4. In the Functions tab of the left panel, select the ParseDateTime function and drag and drop it on one of the Timestamp elements in the output structure.
  5. Select the Timestamp element in the input structure and drag and drop it on the Value argument under the ParseDateTime function.
  6. Double-click the ParseDateTime function to open its properties and enter the pattern for your date, M/D/Y h:m a for example, then click OK.

    Example

  7. Right-click the ParseDateTime function and click Copy, then paste it in the Value tab of the other two Timestamp elements in the output structure.
  8. Press Ctrl + S to save the map.

Results

Your map is now configured to convert strings into meaningful data types.

Filtering records

Use filters to separate your data into different output files.

About this task

In the input file, some ProbeTempF elements have N/C as a value. This value is invalid and the records containing it need to be separated from the valid records.

Procedure

  1. In readings_map, click the reading element under invalid.
  2. Drag and drop an Equal function from the Functions tab on the Filter argument under the SimpleLoop function.
  3. Drag the ProbeTempF element from the input structure on the First Value argument.
  4. Drag a Constant function on the Second Value argument.
  5. Double-click the Constant function and enter N/C in the Value field.
    The invalid element is now configured to contain only records in which the ProbeTempF value is N/C.
  6. Click the reading element under valid.
  7. Drag and drop a NotEqual function from the Functions tab on the Filter argument under the SimpleLoop function.
  8. Drag the ProbeTempF element from the input structure on the First Value argument.
  9. Drag a Constant function on the Second Value argument and set its value as N/C.
    The valid element is now configured to filter out records in which the ProbeTempF value is N/C.
  10. Save your map.

Replacing invalid values

Replace invalid values with a placeholder valid value to avoid errors.

Procedure

  1. Right-click ProbeTempF in the cleansed element and click Remove Expressions (Mappings).
  2. Drag and drop an IfThenElse function in its Value tab.