Testing a rule set - 7.1

Talend Real-time Big Data Platform Studio User Guide

This section uses an example to show how to test a set of parser rules.

Before you begin

You must know how to create a set of parser rules and how to access the corresponding test view in the Talend Studio main window. For further information, see Creating a set of parser rules and Accessing the rule test view.

Note: If you need to import sample rules, you can do so by using the tStandardizeRow component in an existing Job, such as the products_parsing Job in the standardization_examples > product directory provided by the Data Quality Demos project in your Talend Studio. For further information, see the tStandardizeRow documentation in the Talend Components Reference Guide.

About this task

In this example, the rules to be tested are as follows:

Name Type Value
"LengthUnit" "Enumeration" " 'm' | '\'' | 'inch' | 'inches' | '\"'"
"by" "Enumeration" "'X' | 'x' | 'by' "
"length" "Format" "(INT | FRACTION | DECIMAL) LengthUnit "
"Size" "Combination" "length by length"
"WeightUnit" "Enumeration" " 'lb' | 'lbs' | 'pounds' | 'Kg' | 'pinds'"
"weight" "Format" "(INT | FRACTION | DECIMAL) WeightUnit "

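To make the meaning of these rules more concrete, the following Python sketch approximates them with regular expressions. It is an illustration under stated assumptions, not Talend's implementation: Talend Studio generates ANTLR parser code from the rules instead of using regular expressions, `NUMBER_TOKEN` is a hypothetical stand-in for the predefined INT, FRACTION and DECIMAL tokens, and the helpers `enumeration` and `find_all` are invented for this example. It runs the rules against the sample `34-9923, Monolithic Membrane 6125; four by eight sheet, 26 lbs 26 lbs`.

```python
import re

# Hypothetical stand-in for Talend's predefined INT, FRACTION and DECIMAL
# tokens; the real definitions come from the generated ANTLR grammar.
NUMBER_TOKEN = r"(?:\d+/\d+|\d+\.\d+|\d+)"


def enumeration(*values):
    """Approximate an Enumeration rule: any one of the listed literal values."""
    # Try longer literals first so that 'inches' is preferred over 'inch'.
    ordered = sorted(values, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(v) for v in ordered) + ")"


def find_all(rule, text):
    """Return every non-overlapping match of a rule not touching a word character."""
    return [m.group(0) for m in re.finditer(rf"(?<!\w){rule}(?!\w)", text)]


# The rule set above, translated rule by rule (hypothetical approximation).
length_unit = enumeration("m", "'", "inch", "inches", '"')
by = enumeration("X", "x", "by")
length = rf"{NUMBER_TOKEN}\s*{length_unit}"   # (INT | FRACTION | DECIMAL) LengthUnit
size = rf"{length}\s+{by}\s+{length}"         # Combination: length by length
weight_unit = enumeration("lb", "lbs", "pounds", "Kg", "pinds")
weight = rf"{NUMBER_TOKEN}\s*{weight_unit}"   # (INT | FRACTION | DECIMAL) WeightUnit

sample = "34-9923, Monolithic Membrane 6125; four by eight sheet, 26 lbs 26 lbs"
print("length:", find_all(length, sample))
print("Size:  ", find_all(size, sample))
print("weight:", find_all(weight, sample))
```

In this approximation, both 26 lbs values are matched by weight, but four by eight sheet matches neither length nor Size, because every length has to start with a numeric value.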

  1. In the rule list in the upper-left corner of the Interpreter test view, click the rule element to test the whole set of rules.
  2. In the data sample box at the top of the test view, type a data sample.
    In this example, it is 34-9923, Monolithic Membrane 6125; four by eight sheet, 26 lbs 26 lbs. This data describes an item of merchandise.
  3. Click the save button in the upper-right corner of the data sample area to save this test case, and type a name in the Save test case dialog box, for example, SKU.
  4. Click OK.
    This test case is displayed in the test-case list in the lower-left corner. The Interpreter test view should look like the following:
  5. Click the play button in the upper-right corner to run this test. When the test completes, the result is displayed in the lower part of the view.
    This result shows where the rules can be improved. The data four by eight sheet represents a size, but it is not matched by the Size rule: four and eight are words rather than INT, FRACTION or DECIMAL values, so the length rule cannot match them. You can either add new rules or modify existing ones; the choice depends on the context, and neither approach is necessarily better than the other. In this example, a Number Enumeration rule is added and the length and LengthUnit rules are modified to make the matching more accurate.
    Name Type Value
    "length" "Format" "(INT | FRACTION | DECIMAL ) LengthUnit | Number LengthUnit?"
    "Number" "Enumeration" "'four' | 'eight' "
    "LengthUnit" "Enumeration" " 'm' | '\'' | 'inch' | 'sheet' | 'inches' | '\"' "
    With the new length rule, four or eight can be matched with or without a length unit. Adding sheet to the LengthUnit rule also allows eight sheet to be matched as a length.
    Note: To update these rules, you need to understand the ANTLR grammar and the ANTLR symbols used to write a rule. For further information, see the Talend Components Reference Guide and the ANTLR website.
  6. Click the save button beneath the rule table to refresh the test view and regenerate the parser code. The data sample area and the test result area are cleared.
  7. In the test-case list in the lower-left corner, select the SKU data sample you saved earlier.
  8. Click the play button in the upper-right corner again to run this test. When the test completes, the new test result is displayed in the corresponding area:
    This result shows that the four by eight sheet data is now matched by the Size rule of the Combination type.

    The test view does not display the names of Combination rules, because this rule type allows rule names to be repeated. For the same reason, in the ANTLR Grammar tab view, the names of Combination rules, which are not always unique, are not generated as code, to avoid duplicate-name errors. The following figure shows the code corresponding to this example: the name Size always appears as a literal value between quotation marks, with no equivalent code element, whereas the Format rules SKU and length have the equivalent code elements sku and length. For further information about ANTLR grammar, see the ANTLR website.

    If required, you can continue improving these rules with more data samples. Rule refinement is open-ended, and this test view lets you iterate until the rules best fit your needs.
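The effect of the rule changes made in step 5 can be approximated with a short Python sketch that translates the modified rules into regular expressions. This is an illustration under stated assumptions, not Talend's implementation: Talend Studio generates ANTLR parser code rather than using regular expressions, `NUMBER_TOKEN` is a hypothetical stand-in for the predefined INT, FRACTION and DECIMAL tokens, and the helpers `enumeration` and `find_all` are invented for this example. With the Number rule added and sheet added to LengthUnit, the Size pattern matches four by eight sheet in the SKU sample.

```python
import re

# Hypothetical stand-in for Talend's predefined INT, FRACTION and DECIMAL tokens.
NUMBER_TOKEN = r"(?:\d+/\d+|\d+\.\d+|\d+)"


def enumeration(*values):
    """Approximate an Enumeration rule: any one of the listed literal values."""
    # Try longer literals first so that 'inches' is preferred over 'inch'.
    ordered = sorted(values, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(v) for v in ordered) + ")"


def find_all(rule, text):
    """Return every non-overlapping match of a rule not touching a word character."""
    return [m.group(0) for m in re.finditer(rf"(?<!\w){rule}(?!\w)", text)]


# Modified rules (hypothetical regex approximation).
number = enumeration("four", "eight")                                # new Number rule
length_unit = enumeration("m", "'", "inch", "sheet", "inches", '"')  # 'sheet' added
by = enumeration("X", "x", "by")
# (INT | FRACTION | DECIMAL ) LengthUnit | Number LengthUnit?
length = rf"(?:{NUMBER_TOKEN}\s*{length_unit}|{number}(?:\s+{length_unit})?)"
size = rf"{length}\s+{by}\s+{length}"  # Combination rule, unchanged: length by length

sample = "34-9923, Monolithic Membrane 6125; four by eight sheet, 26 lbs 26 lbs"
print("length:", find_all(length, sample))
print("Size:  ", find_all(size, sample))
```

In this approximation, four on its own satisfies length because the unit is optional after a Number value, while eight sheet takes sheet as its unit; the Size pattern then joins the two lengths around by.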