Creating a Job to match data - 6.2

Talend Data Services Platform Studio User Guide

You can generate a Job that matches data in a specific file in the Studio metadata against data in another data source. Using the settings of the components in this automatically generated Job, you can output the exact match and nonmatch values to separate files or to a database. You can output possible matches to a file, to a database or to the stewardship console. To use the stewardship console, you must first configure the Talend Data Stewardship Console web application.

For more information about data resolution, see the Talend Data Stewardship Console User Guide.

The sequence of matching data against a lookup file involves the following steps:

  1. Selecting the file that holds the data you want to match.

  2. Choosing the columns on which to run the match Job.

  3. If required, defining a blocking key to partition the data to be processed. A blocking key is usually needed when there is a lot of data in the file.

  4. Choosing where to write the exact match, possible match and nonmatch records.

  5. Running the generated Job.
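
The sequence above can be illustrated outside the Studio with a minimal Python sketch. It is not the code that the wizard generates: the `classify` and `blocking_key` helpers, the `name` column, the first-letter blocking key and the 0.8 threshold are all assumptions chosen for illustration, and the similarity measure is the `SequenceMatcher` ratio from the Python standard library rather than a Talend matching algorithm.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical threshold for a "possible match"; not a Talend default.
POSSIBLE_THRESHOLD = 0.8


def blocking_key(record, column):
    """Illustrative blocking key: first character of the column, upper-cased."""
    return record[column][:1].upper()


def classify(main_rows, lookup_rows, column="name"):
    """Split main_rows into exact match, possible match and nonmatch records."""
    # Group the lookup rows by blocking key so that each main row is
    # compared only with the lookup rows that fall in the same block.
    blocks = defaultdict(list)
    for row in lookup_rows:
        blocks[blocking_key(row, column)].append(row)

    result = {"match": [], "possible": [], "nonmatch": []}
    for row in main_rows:
        best = 0.0
        for candidate in blocks.get(blocking_key(row, column), []):
            # Case-insensitive similarity between 0.0 and 1.0.
            score = SequenceMatcher(
                None, row[column].lower(), candidate[column].lower()
            ).ratio()
            best = max(best, score)
        if best == 1.0:
            result["match"].append(row)
        elif best >= POSSIBLE_THRESHOLD:
            result["possible"].append(row)
        else:
            result["nonmatch"].append(row)
    return result


main_rows = [{"name": "John Smith"}, {"name": "Jon Smith"}, {"name": "Alice Wong"}]
lookup_rows = [{"name": "john smith"}, {"name": "Bob Lee"}]
groups = classify(main_rows, lookup_rows)
```

In this sketch the blocking key only limits how many comparisons are made; a key that is too coarse saves little work, while one that is too fine can place true matches in different blocks so that they are never compared.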

To generate a Job that identifies and stores exact match, possible match and nonmatch values, do the following:

  1. On the menu bar, select Window > Show View.

  2. The [Show View] dialog box is displayed.

  3. Expand the Help folder and then select Cheat Sheets.

  4. Click OK to close the dialog box.

    The Cheat Sheet panel is displayed in the Studio.

  5. On the cheat sheet icon bar, click the drop-down arrow, and from the contextual menu select Launch Other....

    The [Cheat Sheet Selection] dialog box is displayed.

  6. Expand Talend - Cheat Sheets > Job and select Match Data, and then click OK to close the dialog box.

    The corresponding page is displayed in the Cheat Sheet panel. This page guides you through the steps to create a ready-to-use Job on certain columns in a specific file.

  7. Read the introduction and then click Click to Begin.

    This will expand the first step in the procedure: Select Input File.

  8. Read the instructions and then click Click to perform.

    A wizard is displayed to guide you through the steps of creating the Job.

  9. From the Type list, select the file type on which you want to run the Job, and then click OK to close the first step in the wizard. The next step in the cheat sheet is expanded.

    A dialog box opens showing the database and file connections defined in the Studio.

  10. Select the file to cleanse from the metadata connection and click OK.

    The next step in the cheat sheet is expanded.

  11. Read the instructions on how to choose the lookup data source against which you want to match the data and then click Click to perform to open the next view in the wizard.

  12. Continue following the instructions, switching between the wizard and the steps in the cheat sheet page, until you reach the last step: Review and Run the Generated Job. The wizard configures all the components and the metadata in the repository according to the settings you defined in its different views, and then generates the Job. The Studio switches to the Integration perspective. The resulting Job should look something like the following:

  13. Save the Job and press F6 to run it.

    The exact match, possible match and nonmatch values in the file are identified and stored in the defined output files or database. The generated Job is saved under the Job Designs node in the Repository tree view.