tRules - 6.1

Talend Components Reference Guide

Warning

This component is available in the Palette of the Studio only if you have subscribed to Talend Studio.

tRules properties

Component family

Processing

 

Function

tRules allows you to apply one or more business rules on a data flow in order to output only relevant data.

Purpose

tRules allows you to use business rules defined in a Drools file of .xls or .drl format in order to filter data.

Basic settings

Property type

Either Built-in or Repository.

Since version 5.6, both the Built-In mode and the Repository mode are available in any of the Talend solutions.

 

 

Built-in: No property data stored centrally.

 

 

Repository: Select the repository file where properties are stored. The fields that come after are pre-filled in using the fetched data.

 

DRL/XLS FILE

Browse to the Drools or Excel file that holds the business rules you want to use in your Job.

You can use business rules stored in external files or create business rules of DRL format from within Talend Studio. The two example scenarios for this component illustrate these two cases.

 

Outputs

Set the business rules to use on the output data.

Schema: Create/select the output schema.

Rule: Create/select the business rule to use on the corresponding output schema.

Note

The import field in the code of a rule file needs to correspond to the current project name.

For more information on creating business rules, see the Drools documentation on the following site: http://www.jboss.org/drools/documentation.html

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Global Variables

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the component has a Die on error check box and that check box is cleared.

NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.

A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.

To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.

For further information about variables, see Talend Studio User Guide.
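
As an illustration of how such variables are typically read in Java code inside a Job, the sketch below looks them up in a map using a key of the form <component label>_<VARIABLE>. It is a minimal stand-alone sketch: the plain HashMap stands in for the Job's globalMap, and the component label tRules_1 and the stored values are assumptions made for the example, not output from a real Job.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalVariableSketch {
    // Builds the lookup key for a component variable,
    // e.g. "tRules_1" and "NB_LINE" give "tRules_1_NB_LINE".
    static String key(String componentLabel, String variable) {
        return componentLabel + "_" + variable;
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap of a Job (hypothetical values).
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put(key("tRules_1", "NB_LINE"), 42);
        globalMap.put(key("tRules_1", "ERROR_MESSAGE"), null);

        // After variables are read once the component has finished.
        Integer nbLine = (Integer) globalMap.get("tRules_1_NB_LINE");
        String error = (String) globalMap.get("tRules_1_ERROR_MESSAGE");
        System.out.println("rows: " + nbLine + ", error: " + error);
    }
}
```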

Usage

This component is an intermediate component in a data flow. It needs input and output components.

Limitation

n/a

Scenario 1: Extracting client data according to business rules stored in an external file

This scenario is a four-component Job that aims at reading client data and retrieving only the clients that match business rules stored in an external Drools file.

Prerequisites

For this example, you must have a Drools file of .xls or .drl format that holds the business rules you will use in the Job.

In this example, the business rules are defined in an Excel file as follows:

  • The Import field (cell C:2) must respect the following format <projectname>.<lowercase jobname>_0_1.<jobname>.*. For example, dq_project.business_rule_0_1.Business_Rule.* means that the name of the project in the studio is dq_project and the name of the Job is Business_Rule.

    Make sure to define in this field the exact project and Job names you have in your studio.

  • The RuleAGE rule retrieves all clients whose age is between 30 and 39 and writes them to the first output flow.

  • The RuleREGION rule retrieves all clients who live in the EMEA region and writes them to the second output flow.

    In the output schema of the tRules component, make sure to use the exact names of the rules defined in the Excel Drools file.
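
The Import field format described above can be sketched as a small helper that assembles the value from the project and Job names. This is only an illustration of the naming pattern: the helper name importField is hypothetical, and the _0_1 segment is copied as-is from the format given above.

```java
import java.util.Locale;

class ImportFieldSketch {
    // Follows the pattern <projectname>.<lowercase jobname>_0_1.<jobname>.*
    static String importField(String projectName, String jobName) {
        return projectName + "."
                + jobName.toLowerCase(Locale.ROOT) + "_0_1."
                + jobName + ".*";
    }

    public static void main(String[] args) {
        // Project dq_project and Job Business_Rule, as in the example above.
        System.out.println(importField("dq_project", "Business_Rule"));
        // prints dq_project.business_rule_0_1.Business_Rule.*
    }
}
```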

Designing the Job and configuring the input data

  1. Drop the following components from the Palette to the design workspace: tFileInputExcel, tRules and two tLogRow.

  2. Double-click tFileInputExcel to display its Basic settings view and define the component properties.

  3. In the Property type list, select Built-in and fill in the fields that follow.

  4. Click the three-dot button next to the File Name/Stream field and browse to the source file to set its path and name. The source file used in this example is called client and it holds client data.

    If needed, right-click tFileInputExcel and select Data viewer to have a preview of the input data.

  5. Select the All sheets check box to retrieve the data from all sheets of the Excel file.

  6. From the Schema list, select Built-in and click Edit schema to open a dialog box where you can define the schema of the input file.

    In this example, the source file holds four columns: id, name, age and region.

  7. Click OK to validate your changes and close the dialog box.

Defining the business rules

Setting the rule schemas

  1. In the design workspace, double-click tRules to display its Basic settings view and define the component properties.

  2. In the Property type list, select Repository if you have stored the file that holds the business rules in the Metadata > Rules management node of the Repository tree view. If not, select Built-in and browse to the Drools file.

    For more information on how to create and store business rules, see Talend Studio User Guide.

  3. In the Outputs table, click the plus button to add two rows that represent two different output flows, each using one of the two business rules defined in the Drools file.

  4. Click in the first row of the Schema column to display a three-dot button. Click the button to open a dialog box and set a name for the output schema.

  5. Enter the exact name of the first rule as it is written in the Drools file, RuleAGE in this example, and then click OK.

    A dialog box opens.

  6. Define your output schema. In this example, we want to reuse the input schema. Click OK to close the dialog box.

  7. Do the same in the second line of the Schema column.

    Enter the exact name of the second rule as it is written in the Drools file, RuleREGION in this example, to use it as the name of the second output schema, and then reuse the input schema in the dialog box that opens.

    You will get an error message when you try to execute the Job if the names of the output schemas in your Job do not exactly match the names of the rules in the Drools file.
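
The name-matching requirement can be pictured as a simple check run before execution. The sketch below is not part of the component; it is a hypothetical helper that lists output schema names with no identically named rule, and it assumes the comparison is case-sensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SchemaNameCheckSketch {
    // Returns the output schema names that have no rule of exactly the same name.
    static List<String> unmatched(List<String> schemaNames, List<String> ruleNames) {
        List<String> missing = new ArrayList<>();
        for (String schema : schemaNames) {
            if (!ruleNames.contains(schema)) { // case-sensitive comparison (assumption)
                missing.add(schema);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> rules = Arrays.asList("RuleAGE", "RuleREGION");
        // "RuleRegion" differs from "RuleREGION" only by case and is reported.
        System.out.println(unmatched(Arrays.asList("RuleAGE", "RuleRegion"), rules));
        // prints [RuleRegion]
    }
}
```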

Selecting the rules

  1. Click in the first line of the Rule column to display a three-dot button. Click the button to open the [Rule] dialog box.

  2. Select the option check box that corresponds to your needs:

    • View Rules: to open the business rule file in read-only mode, or

    • Select a rule from repository: to select the relevant predefined rule from the business rules file stored in the Repository tree view.

  3. In the Rule list, select the rule you want to apply to the first output schema, RuleAGE, and then click OK.

    The selected rule is displayed in the Rule column.

    In this example, we want to apply RuleAGE to the first output schema and RuleREGION to the second output schema.

  4. Do the same to select the rule for the second output schema, RuleREGION, and then click OK.

  5. In the design workspace, double-click each of the tLogRow components to define its properties. For more information, see tLogRow.

  6. Save your Job and press F6 to execute it.

    The Run console displays the two output flows: the first output flow lists all clients whose age is between 30 and 39, and the second output flow lists all clients who live in the EMEA region.
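
To make the filtering logic of this scenario concrete, the two rules can be expressed as plain Java predicates. This is a sketch of the conditions only, not the Drools file or the code generated by the Studio. The sample clients are invented, the age bounds are assumed to be inclusive, and the sketch assumes that a row satisfying both conditions is written to both flows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClientRulesSketch {
    // Mirrors the four input columns of the scenario: id, name, age, region.
    static class Client {
        final int id;
        final String name;
        final int age;
        final String region;

        Client(int id, String name, int age, String region) {
            this.id = id;
            this.name = name;
            this.age = age;
            this.region = region;
        }
    }

    // RuleAGE: age between 30 and 39 (inclusive bounds are an assumption).
    static boolean ruleAge(Client c) {
        return c.age >= 30 && c.age <= 39;
    }

    // RuleREGION: client lives in the EMEA region.
    static boolean ruleRegion(Client c) {
        return "EMEA".equals(c.region);
    }

    public static void main(String[] args) {
        // Hypothetical sample rows.
        List<Client> clients = Arrays.asList(
                new Client(1, "Ann", 34, "EMEA"),
                new Client(2, "Bob", 52, "EMEA"),
                new Client(3, "Cho", 31, "APAC"));

        List<String> ageFlow = new ArrayList<>();
        List<String> regionFlow = new ArrayList<>();
        for (Client c : clients) {
            if (ruleAge(c)) ageFlow.add(c.name);       // first output flow
            if (ruleRegion(c)) regionFlow.add(c.name); // second output flow
        }
        System.out.println("RuleAGE: " + ageFlow);
        System.out.println("RuleREGION: " + regionFlow);
    }
}
```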

Scenario 2: Extracting zip codes using DRL rules you create from the Studio

This scenario is a four-component Job in which you create business rules of DRL format from the Studio. You can then use these rules to retrieve zip codes for two specific cities you define in the rules.

Creating the DRL rule template

  1. In the Repository tree view, expand Metadata > Rules management.

  2. Right-click Embedded Rules and select Create Rule.

  3. In the open wizard, enter a name for the rule template, fill in its settings as needed and click Next.

  4. Select the Create option and from the Type of rule resource list, select New DRL.

  5. Click Finish.

    A rule template is created and opened in a rule editor in the workspace.

This rule template is embedded in a tRules component. You can define one or several DRL rules in the template from inside the tRules component.

Designing the Job and configuring the input data

  1. Drop a tFixedFlowInput and two tLogRow components from the Palette to the design workspace.

  2. From the Embedded Rules node in the Repository tree view, drop the rule template you created.

    A tRules component with the embedded rule template is displayed on the workspace.

  3. Link tFixedFlowInput to tRules using a Row > Main link.

  4. Double-click tFixedFlowInput to display its Basic settings view and define the component properties.

  5. Click the [...] button next to Edit schema to open the schema editor.

  6. Add two columns using the [+] button, name them zipCode and CityName, and click OK.

    When you define the DRL rules, you will use the zipCode column to match the city zip codes and the CityName column to output the name of the city that matches the zip code.

    Note

    Make sure to start the column name you will use to match the zip code with lower case, otherwise you will get an error when trying to run the Job.

  7. In the Mode area, select Use Inline Content (delimited file).

  8. Set the row and field separators, and in the Content table, type in the delimited data on which to apply the DRL rules.
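
As a rough picture of how inline delimited content maps to the zipCode and CityName columns, the sketch below splits a text block with a row separator and a field separator. The separators ("\n" and ";") and the sample zip codes are assumptions chosen for illustration, and the parsing shown here is not the component's own implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class InlineContentSketch {
    // Splits the content into rows, then each row into fields.
    static List<String[]> parse(String content, String rowSep, String fieldSep) {
        List<String[]> rows = new ArrayList<>();
        for (String line : content.split(Pattern.quote(rowSep))) {
            if (line.isEmpty()) {
                continue; // skip blank rows
            }
            // Limit -1 keeps trailing empty fields, such as an empty CityName.
            rows.add(line.split(Pattern.quote(fieldSep), -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical content: zipCode filled in, CityName left empty
        // so that the DRL rules can set it.
        String content = "75001;\n75012;\n92150;\n69001;";
        for (String[] row : parse(content, "\n", ";")) {
            System.out.println("zipCode=" + row[0] + ", CityName=" + row[1]);
        }
    }
}
```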

Defining the DRL rules

Setting the rule schemas

  1. In the design workspace, double-click tRules to display its Basic settings view and define the component properties.

    The Property Type is automatically set to Repository as you have already stored the rule template in the Studio.

  2. Click the [...] button to open a dialog box that lists the DRL rules stored locally in the repository.

  3. Select the rule template in which you want to define the rule schemas, ZipCodeRuleSet in this example, and then click OK.

  4. Use the [+] button to add two rows to the Outputs table, click in the Schema column and then click the [...] button.

  5. In the open dialog box, set a name for the first output schema, call it Paris, and click OK.

  6. In the open dialog box, define your output schema. Copy zipCode and CityName from the input flow to the output flow and click OK.

  7. Do the same to create a second output schema, call it Suresnes and similarly copy the two input columns to the output flow.

    Each of the two output schemas will use one of the two DRL rules you will define in the rule template.

  8. Right-click tRules and select Row > Paris to link the component to the first tLogRow.

  9. Do the same and link tRules to the second tLogRow using the Row > Suresnes link.

Creating the DRL rules

  1. In the Outputs table, click in the Rule column and then click the [...] button of the Paris schema.

  2. In the open dialog box, select one of the following options:

    • Edit Rules: to open the rule in the rule editor in the workspace.

    • Create a rule with guide: to open a dialog box where you can define a rule in the rule template.

    • Select a rule from repository: to select a predefined rule from the rule template created and stored in the Repository tree view.

    In this example, select the Create a rule with guide option.

  3. In the open dialog box, use the Drools syntax to set the condition of the "Paris" rule as the following: zipCode matches "75\\d{3}", and then click OK.

    The new "Paris" rule is generated and displayed in the Rule column. This rule retrieves from the Paris schema all zip codes that consist of 75 followed by three digits.

  4. Click in the Rule column and then click the three-dot button of the "Paris" rule.

    The rule template is opened in the rule editor in the workspace.

  5. In the "Paris" rule, add the code output.CityName = "Paris" to output Paris as the city name in the first output flow.

  6. Repeat the above steps to create a "Suresnes" rule and set its condition as follows: zipCode == "92150".

    The new rule is displayed in the Rule column. This rule retrieves from the Suresnes schema all zip codes that are equal to 92150.

  7. In the "Suresnes" rule, add the code output.CityName = "Suresnes" to output Suresnes as the city name in the second output flow.

  8. In the design workspace, double-click each of the tLogRow components to define its properties.

    For more information, see tLogRow.

  9. Save your Job and press F6 to execute it.

    The Run console displays two output flows with zip codes and city names.

    In the first output flow, the "Paris" rule retrieves all zip codes that start with 75 and writes the city name as Paris.

    In the second output flow, the "Suresnes" rule retrieves all zip codes that are equal to 92150 and writes the city name as Suresnes.
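
The two conditions used in this scenario can be tried out in plain Java before you enter them in the rule template. The sketch below is not DRL and not the code generated by the Studio. It assumes that the Drools matches operator behaves like a whole-string Java regular expression match, and the sample zip codes are invented.

```java
import java.util.regex.Pattern;

class ZipCodeRulesSketch {
    // Same expression as in the "Paris" rule: 75 followed by three digits.
    private static final Pattern PARIS = Pattern.compile("75\\d{3}");

    // Condition of the "Paris" rule (whole-string match is an assumption).
    static boolean isParis(String zipCode) {
        return PARIS.matcher(zipCode).matches();
    }

    // Condition of the "Suresnes" rule: zipCode == "92150".
    static boolean isSuresnes(String zipCode) {
        return "92150".equals(zipCode);
    }

    public static void main(String[] args) {
        String[] zipCodes = {"75001", "75012", "92150", "69001"}; // hypothetical input
        for (String zipCode : zipCodes) {
            if (isParis(zipCode)) {
                System.out.println(zipCode + "|Paris");     // first output flow
            }
            if (isSuresnes(zipCode)) {
                System.out.println(zipCode + "|Suresnes");  // second output flow
            }
        }
    }
}
```

In this sketch, 69001 satisfies neither condition and is therefore not printed.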