Scenario 2: Using a custom matching algorithm to match entries - 6.1

Talend Components Reference Guide


In this scenario, reuse the previous Job to load and apply a user-defined matching algorithm.

As a prerequisite, follow the steps described in Creating a custom matching algorithm to manually write a custom algorithm and store it in a .jar file (Java archive). The mydistance.jar file is used here to provide the user-defined matching algorithm, MyDistance.class.

You will also need to use the tLibraryLoad component to import the Java library into the Job.
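For orientation, the kind of scoring logic that a class such as MyDistance might contain can be sketched as follows. This is only an illustration under stated assumptions: the class name MyDistanceSketch, the method name weight and the position-by-position comparison are all hypothetical, and the sketch deliberately does not extend any Talend class. The real MyDistance.class must follow the contract described in Creating a custom matching algorithm.

```java
// Hypothetical sketch: a similarity score between 0.0 (no match) and 1.0 (full match).
// Neither the class name nor the method name comes from the Talend API.
class MyDistanceSketch {

    // Compares two strings position by position, ignoring case,
    // and returns the share of matching positions over the longer length.
    static double weight(String a, String b) {
        if (a == null || b == null) {
            return 0.0; // assumption: a missing value never matches
        }
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        int shorter = Math.min(a.length(), b.length());
        int same = 0;
        for (int i = 0; i < shorter; i++) {
            char ca = Character.toLowerCase(a.charAt(i));
            char cb = Character.toLowerCase(b.charAt(i));
            if (ca == cb) {
                same++;
            }
        }
        return (double) same / longer;
    }

    public static void main(String[] args) {
        System.out.println(weight("Smith", "smith"));
        System.out.println(weight("Smith", "Smyth"));
    }
}
```

A score of this shape lets the matching component rank pairs of values rather than only accept or reject them, which is why the sketch returns a ratio instead of a boolean.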

Setting up the Job

  1. In the previous Job, drop the tLibraryLoad component from the Palette onto the design workspace.

  2. Delete the tLogRow components named possible and none.

  3. Connect the tLibraryLoad component to the tMysqlInput (person) component using a Trigger > On Subjob Ok link.

Configuring the components

  1. Double-click tLibraryLoad to open its Component view.

  2. Click the [...] button and browse to the mydistance.jar file.

  3. Click Window > Show view... to open the Modules view.

  4. In the Modules view, click the button for importing an external .jar file and, in the dialog box that opens, browse to the user-defined mydistance.jar file created for this Job.

  5. Click Open.

    The user-defined .jar file is imported and listed in the Modules view.

    You will get an error message if you try to run the Job without installing the external user-defined .jar file.

  6. Double-click tRecordMatching to open its Component view.

  7. In the Key Definition table of this view, click the name row in the Matching Type column and select custom... from the drop-down list.

  8. In the Custom matcher class column of this name row, type the path pointing to MyDistance.class in the mydistance.jar file, that is, the fully qualified class name. In this example, this path is org.talend.mydistance.MyDistance.
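The value typed in step 8 is a fully qualified Java class name, and it can only be resolved if the .jar file that contains it is available to the Job. A minimal sketch of that idea in plain Java follows; the helper name ClasspathCheck.isLoadable is hypothetical, and the sketch shows generic JVM behavior, not the message that the Studio itself displays.

```java
// Hypothetical helper: checks whether a fully qualified class name
// can be resolved on the current classpath.
class ClasspathCheck {

    static boolean isLoadable(String className) {
        try {
            // 'false' means: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class (or something it depends on) is not on the classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        // Resolves only when mydistance.jar is on the classpath.
        System.out.println(isLoadable("org.talend.mydistance.MyDistance"));
        // A JDK class is always resolvable.
        System.out.println(isLoadable("java.lang.String"));
    }
}
```

Running a check of this kind with and without mydistance.jar on the classpath is one way to confirm that the name entered in the table matches the package and class name compiled into the archive.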

Note

When you select a date column on which to apply an algorithm or a matching algorithm, you can decide which part of the date to compare.

For example, if you want to only compare the year in the date, in the component schema set the type of the date column to Date and then enter "yyyy" in the Date Pattern field. The component then converts the date format to a string according to the pattern defined in the schema before starting a string comparison.
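The effect of the "yyyy" pattern can be sketched in plain Java with SimpleDateFormat. The class and method names (DatePatternSketch, toComparable, parseIso) are hypothetical and only illustrate the format-then-compare idea described in this note; they are not the component's internal code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical sketch: format dates with a pattern, then compare the strings.
class DatePatternSketch {

    // Converts a date to the string that would take part in the comparison.
    static String toComparable(Date date, String pattern) {
        return new SimpleDateFormat(pattern, Locale.ROOT).format(date);
    }

    // Small helper to build sample dates from ISO-style text.
    static Date parseIso(String text) {
        try {
            return new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a yyyy-MM-dd date: " + text, e);
        }
    }

    public static void main(String[] args) {
        Date first = parseIso("2015-03-01");
        Date second = parseIso("2015-11-20");
        // With "yyyy", only the year survives, so the two strings are equal.
        System.out.println(toComparable(first, "yyyy").equals(toComparable(second, "yyyy")));
        // With a full pattern, the same two dates produce different strings.
        System.out.println(toComparable(first, "yyyy-MM-dd").equals(toComparable(second, "yyyy-MM-dd")));
    }
}
```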

Executing the Job

  • Press F6 to run this Job.

    In the Run view, the matched entries are identified and listed as follows: