Managing user-defined indicators - 6.2

Talend Real-time Big Data Platform Studio User Guide

User-defined indicators are indicators that you create yourself. You can use these indicators to analyze columns through a simple drag-and-drop operation from the DQ Repository tree view to the columns listed in the editor.

The management options available for user-defined indicators include: create, export and import, edit and duplicate.

How to create SQL user-defined indicators

You can create your own personalized indicators from the Profiling perspective of the studio.

Note

Management processes for user-defined indicators are the same as those for system indicators.

Prerequisite(s): You have selected the Profiling perspective in the studio.

Defining the indicator

  1. In the DQ Repository tree view, expand Libraries > Indicators.

  2. Right-click User Defined Indicators.

  3. Select New Indicator from the contextual menu.

    The [New Indicator] wizard is displayed.

  4. In the Name field, enter a name for the indicator you want to create.

    Note

    Avoid using special characters in item names, including:

    "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "‘", "”", "«", "»", "<", ">".

    These characters are all replaced with "_" in the file system, so you may end up creating duplicate items.

  5. If required, set other metadata (purpose, description and author name) in the corresponding fields and click Finish.

    The indicator editor opens displaying the metadata of the user-defined indicator.
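The duplicate-item risk mentioned in the naming note can be checked before you create items. The following is a minimal sketch, not part of the studio: the class ItemNameCheck and its methods are hypothetical, it covers only the ASCII characters from the list, and it assumes each listed character maps to exactly one "_".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: previews which item names collide once special characters become "_". */
final class ItemNameCheck {

    // ASCII subset of the characters listed in the note (assumption: one "_" per character).
    private static final String SPECIAL = "~!`#^&*\\/?:;\".()'<>";

    /** Replaces every listed special character with an underscore. */
    static String toFileSystemName(String itemName) {
        StringBuilder sb = new StringBuilder(itemName.length());
        for (char c : itemName.toCharArray()) {
            sb.append(SPECIAL.indexOf(c) >= 0 ? '_' : c);
        }
        return sb.toString();
    }

    /** Groups names by their file-system form and keeps only the groups holding several names. */
    static Map<String, List<String>> findCollisions(List<String> itemNames) {
        Map<String, List<String>> byFileName = new LinkedHashMap<>();
        for (String name : itemNames) {
            byFileName.computeIfAbsent(toFileSystemName(name), k -> new ArrayList<>()).add(name);
        }
        byFileName.values().removeIf(names -> names.size() < 2);
        return byFileName;
    }

    public static void main(String[] args) {
        // "max length?" and "max length*" both become "max length_".
        System.out.println(findCollisions(List.of("max length?", "max length*", "avg_length")));
    }
}
```

In this sketch, two indicator names that differ only by a special character end up in the same group, which is the situation the note warns about.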

Setting the indicator definition and category

  1. Click Indicator Category and select from the list a category for the indicator.

    The selected category determines the columns expected in the result set of the analysis that uses the user-defined indicator.

    The table below explains the available categories.

    Indicator category | Description | Expected query results
    -------------------|-------------|-----------------------
    User Defined Match | Evaluates the number of values that match a condition. | The result set should have one row and two columns. The first column contains the number of values that match and the second column the total count.
    User Defined Frequency | Evaluates the frequency of records using user-defined indicators for each distinct record. | The result set should have 0 or more rows and two columns. The first column contains a value and the second the frequency (count) of this value.
    User Defined Real Value | Evaluates a real-valued function of the data. | The result set should have one row and one column that contains a real value.
    User Defined Count (default category) | Analyzes the quantity of records and returns a row count. | The result set should have one row and one column that contains the row count.

  2. Click Indicator Definition and then click the [+] button.

  3. From the Database list, select a database on which to use the indicator.

    If the indicator is simple enough to be used in all databases, select Default in the database list.

  4. Enter the database version in the Version field.

  5. Define the SQL statement for the indicator you want to create:

    • Click the Edit... button next to the SQL Template field.

      The [Edit Expression] dialog box opens.

    • In the Indicator Definition view, enter the SQL expression(s) you want to use in matching and analyzing data. You can drop templates from the templates list to complete the expression.

      For example, set the expression to measure the maximal length of the values in a column as shown in the above capture.

      This view may have several input fields, one for each column expected by the indicator category. For example, if you select the User Defined Count category, you have only a Where Expression field, whereas if you select the User Defined Match category, you have two fields: Matching Expression and Where Expression.

      The SQL expressions are automatically transformed into a complete SQL template in the Full SQL Template view.

      Also, the SQL expressions are automatically transformed into templates to view rows/values. Different tabs are available in the dialog box depending on what indicator category is selected.

      If you edit the SQL expression(s) in the Indicator Definition view, the templates will be updated accordingly in the other tabs.

    • Use the Reset button to revert all templates according to the content of the Indicator Definition tab.

    • Click OK.

      The dialog box is closed and the SQL template is displayed in the indicator editor.

    • Use the [+] button and follow the same steps to add as many indicator definitions as needed.

    Note

    You do not need to define any parameters in the Indicator Parameters view when the user-defined indicator contains only SQL templates. These parameters are used only when indicators have a Java implementation. For further information, see How to define Java user-defined indicators.

  6. Click the save icon on top of the editor.

    The indicator is listed under the User Defined Indicators folder in the DQ Repository tree view. You can use this indicator to analyze columns through a simple drag-and-drop operation from the DQ Repository tree view to the columns listed in the editor.

    If an analysis with a user-defined indicator runs successfully at least one time and later the indicator definition template for the database is deleted, the analysis does not fail. It keeps running successfully because it uses the previously generated SQL query.
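The four indicator categories described in this procedure differ mainly in the shape of the result set they expect. The following is a minimal in-memory sketch of those shapes, assuming a column held as a list of strings. The class CategoryShapes and its methods are hypothetical and do not reflect how the studio generates or runs SQL; they only mirror the rows and columns each category should return.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

/** Hypothetical illustration of the result shape expected for each indicator category. */
final class CategoryShapes {

    /** User Defined Count: one row, one column (the row count). */
    static long count(List<String> column) {
        return column.size();
    }

    /** User Defined Match: one row, two columns (matching count, total count). */
    static long[] match(List<String> column, Predicate<String> condition) {
        long matching = column.stream().filter(Objects::nonNull).filter(condition).count();
        return new long[] { matching, column.size() };
    }

    /** User Defined Frequency: zero or more rows, two columns (value, frequency of that value). */
    static Map<String, Long> frequency(List<String> column) {
        Map<String, Long> rows = new LinkedHashMap<>();
        for (String value : column) {
            rows.merge(value, 1L, Long::sum);
        }
        return rows;
    }

    /** User Defined Real Value: one row, one column (here, the maximal length of the values). */
    static double maxLength(List<String> column) {
        return column.stream().filter(Objects::nonNull).mapToInt(String::length).max().orElse(0);
    }

    public static void main(String[] args) {
        List<String> column = List.of("a@x.org", "bob", "bob", "carol@x.org");
        long[] matchRow = match(column, v -> v.contains("@"));
        System.out.println("count     = " + count(column));
        System.out.println("match     = " + matchRow[0] + " of " + matchRow[1]);
        System.out.println("frequency = " + frequency(column));
        System.out.println("maxLength = " + maxLength(column));
    }
}
```

Checking a query against these shapes before saving the indicator can help confirm that the chosen category matches what the query returns.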

How to define Java user-defined indicators

You can create your own personalized Java indicators from the Profiling perspective of the studio. Management processes for Java user-defined indicators are the same as those for system indicators.

Note

You can also import a ready-to-use Java user-defined indicator from the Exchange folder in the DQ Repository tree view. This Java user-defined indicator connects to the mail server and checks if the email exists. For further information on importing indicators from Talend Exchange, see How to import user-defined indicators from Talend Exchange.

The two sections below detail the procedures to create Java user-defined indicators.

How to create Java user-defined indicators

Prerequisite(s): You have selected the Profiling perspective in the studio.

Defining the indicator

  1. In the DQ Repository tree view, expand Libraries > Indicators.

  2. Right-click User Defined Indicators.

  3. Select New Indicator from the contextual menu.

    The [New Indicator] wizard is displayed.

  4. In the Name field, enter a name for the Java indicator you want to create.

    Note

    Avoid using special characters in item names, including:

    "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "‘", "”", "«", "»", "<", ">".

    These characters are all replaced with "_" in the file system, so you may end up creating duplicate items.

  5. If required, set other metadata (purpose, description and author name) in the corresponding fields and click Finish.

    The indicator editor opens displaying the metadata of the user-defined indicator.

Setting the indicator definition and category

  1. Click Indicator Category and select from the list a category for the Java indicator.

    The selected category determines the columns expected in the result set of the analysis that uses this indicator.

    The table below explains the available categories.

    Indicator category | Description | Expected query results
    -------------------|-------------|-----------------------
    User Defined Match | Evaluates the number of values that match a condition. | The result set should have one row and two columns. The first column contains the number of values that match and the second column the total count.
    User Defined Frequency | Evaluates the frequency of records using user-defined indicators for each distinct record. | The result set should have 0 or more rows and two columns. The first column contains a value and the second the frequency (count) of this value.
    User Defined Real Value | Evaluates a real-valued function of the data. | The result set should have one row and one column that contains a real value.
    User Defined Count (default category) | Analyzes the quantity of records and returns a row count. | The result set should have one row and one column that contains the row count.

  2. Click Indicator Definition and then click the [+] button.

  3. From the Database list, select Java.

  4. Enter the Java class in the Java Class field.

    Note

    Make sure that the class name includes the package path. If this string parameter is not correctly specified, an error message is displayed when you try to save the Java user-defined indicator.

  5. Select the Java archive holding the Java class:

    • Click the Edit... button.

      The [UDI Selector] dialog box opens.

    • In the Select libraries view, select the check box of the archive holding the Java class and then select the class in the bottom panel of the wizard.

    • Click OK.

      The dialog box is closed and the Java archive is displayed in the indicator editor.

      You can add or delete Java archives from the Manage Libraries view of this dialog box.

      For more information on creating a Java archive, see How to create a Java archive for the user-defined indicator.

  6. Click Indicator Parameters to open the view where you can define parameters to retrieve parameter values while coding the Java indicator.

    You can retrieve parameter values with code similar to the following, which retrieves the value of the EMAIL_PARAM parameter:

    // Check prerequisite: the indicator must have parameters
    IndicatorParameters param = this.getParameters();
    if (param == null) {
        log.error("No parameter set in the user defined indicator " + this.getName()); //$NON-NLS-1$
        return false;
    }
    // The Java parameters are stored in the valid domain of the indicator
    Domain indicatorValidDomain = param.getIndicatorValidDomain();
    if (indicatorValidDomain == null) {
        log.error("No parameter set in the user defined indicator " + this.getName()); //$NON-NLS-1$
        return false;
    }

    // else retrieve email from parameter
    EList<JavaUDIIndicatorParameter> javaUDIIndicatorParameter = indicatorValidDomain.getJavaUDIIndicatorParameter();
    for (JavaUDIIndicatorParameter p : javaUDIIndicatorParameter) {
        if (EMAIL_PARAM.equalsIgnoreCase(p.getKey())) {
            // p holds the key/value pair for EMAIL_PARAM; read its value here
        }
    }

    For a more detailed sample of the use of parameters in a Java user-defined indicator, check https://github.com/Talend/tdq-studio-se/tree/master/sample/test.myudi/src/main/java/org/talend/dataquality/indicator/userdefine/email.

  7. Click the [+] button at the bottom of the table and define in the new line the parameter key and value.

    You can edit these default parameters or even add new parameters any time you use the indicator in a column analysis. To do this, click the indicator option icon in the analysis editor to open a dialog box where you can edit the default parameters according to your needs or add new parameters.

  8. Click the save icon on top of the editor.

    The indicator is listed under the User Defined Indicators folder in the DQ Repository tree view. You can use this indicator to analyze columns through a simple drag-and-drop operation from the DQ Repository tree view to the columns listed in the editor.
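The parameter lookup used in this procedure (iterate over the key/value pairs and compare keys without regard to case) can be tried outside the studio. The following is a minimal sketch: UdiParameterLookup and its nested UdiParameter class are hypothetical stand-ins, not the studio's JavaUDIIndicatorParameter type, and the sample key and value are invented for illustration.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of a case-insensitive lookup in a list of key/value parameters. */
final class UdiParameterLookup {

    /** Stand-in for one key/value parameter (not the studio's JavaUDIIndicatorParameter). */
    static final class UdiParameter {
        private final String key;
        private final String value;

        UdiParameter(String key, String value) {
            this.key = key;
            this.value = value;
        }

        String getKey() {
            return key;
        }

        String getValue() {
            return value;
        }
    }

    /** Returns the value of the first parameter whose key equals wantedKey, ignoring case. */
    static Optional<String> find(List<UdiParameter> parameters, String wantedKey) {
        if (parameters == null) {
            return Optional.empty(); // mirrors the "no parameter set" prerequisite check
        }
        for (UdiParameter p : parameters) {
            if (wantedKey.equalsIgnoreCase(p.getKey())) {
                return Optional.ofNullable(p.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<UdiParameter> defaults = List.of(new UdiParameter("email", "sender@example.com"));
        System.out.println(find(defaults, "EMAIL").orElse("<not set>"));
        System.out.println(find(defaults, "timeout").orElse("<not set>"));
    }
}
```

Returning an Optional is a design choice of this sketch: it lets the caller decide whether a missing parameter is an error or falls back to a default.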

How to create a Java archive for the user-defined indicator

Before creating a Java archive for the user-defined indicator, you must define, in Eclipse, the target platform against which the workspace plug-ins will be compiled and tested.

To define the target platform, do the following:

  1. In Eclipse, select Preferences to display the [Preferences] window.

  2. Expand Plug-in Development and select Target Platform then click Add... to open a view where you can create the target definition.

  3. Select the Nothing: Start with an empty target definition option and then click Next to proceed to the next step.

  4. In the Name field, enter a name for the new target definition and then click the Add... button to proceed to the next step.

  5. Select Installation from the Add Content list and then click Next to proceed to the next step.

  6. Use the Browse... button to set the path of the installation directory and then click Next to proceed to the next step.

    The new target definition is displayed in the location list.

  7. Click Finish to close the dialog box.

To create a Java archive for the user-defined indicator, do the following:

  1. In Eclipse, check out the project from Git at https://github.com/Talend/tdq-studio-se/tree/master/sample/test.myudi.

    In this Java project, you can find four Java classes that correspond to the four indicator categories listed in the Indicator Category view in the indicator editor.

    Each one of these Java classes extends the UserDefIndicatorImpl indicator. The code below illustrates an example using the MyAvgLength Java class.

    package test.udi;
    
    import org.talend.dataquality.indicators.sql.impl.UserDefIndicatorImpl;
    
    /**
     * @author mzhao
     * 
     * A very simple example of a java implementation of a user defined indicator. This indicator returns a user defined
     * real value. It implements the minimum number of required methods.
     */
    public class MyAvgLength extends UserDefIndicatorImpl {
    
        private double length = 0;
    
        @Override
        public boolean reset() {
            super.reset();
            length = 0;
            return true;
        }
    
        @Override
        public boolean handle(Object data) {
            super.handle(data);
            // an indicator which adds up the length of text values longer than 2 characters (values with
            // 2 characters or fewer add nothing to the sum); the average is computed in finalizeComputation().
            int dataLength = (data != null) ? data.toString().length() : 0;
            if (dataLength > 2) {
                length += dataLength;
            }
            return true;
        }
    
        /*
         * (non-Javadoc)
         * 
         * @see org.talend.dataquality.indicators.impl.IndicatorImpl#finalizeComputation()
         */
        @Override
        public boolean finalizeComputation() {
            value = String.valueOf(this.length / (this.getCount() - this.getNullCount()));
            return super.finalizeComputation();
        }
    
    }

  2. Modify the code of the methods that follow each @Override according to your needs.

  3. If required, use the following methods in your code to retrieve the indicator parameters:

    • Use Indicator.getParameters(), which returns an IndicatorParameters object.

    • Call IndicatorParameters.getIndicatorValidDomain(), which returns a Domain object.

    • Call Domain.getJavaUDIIndicatorParameter(), which returns a list of JavaUDIIndicatorParameter objects, each storing one key/value pair that defines a parameter.

  4. Save your modifications.

  5. Using Eclipse, export this new Java archive.

The Java archive is now ready to be attached to any Java indicator you want to create from the Profiling perspective of the studio.
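To see how the three overridden methods of MyAvgLength work together, the following self-contained sketch replays its logic outside the studio. Everything here is an assumption made for illustration: SketchIndicator is a hypothetical stand-in for UserDefIndicatorImpl that only counts rows and nulls, the call order (reset, then handle for each row, then finalizeComputation) is assumed, and the guard against an empty or all-null column is an addition that the sample class does not have.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical replay of the MyAvgLength logic with a stand-in base class. */
final class AvgLengthLifecycle {

    /** Stand-in base class (assumption): counts every row and every null row. */
    abstract static class SketchIndicator {
        private long count;
        private long nullCount;

        boolean reset() {
            count = 0;
            nullCount = 0;
            return true;
        }

        boolean handle(Object data) {
            count++;
            if (data == null) {
                nullCount++;
            }
            return true;
        }

        long getCount() {
            return count;
        }

        long getNullCount() {
            return nullCount;
        }
    }

    /** Same summing rule as MyAvgLength, plus a guard for a zero denominator. */
    static final class AvgLength extends SketchIndicator {
        private double length;
        private String value;

        @Override
        boolean reset() {
            super.reset();
            length = 0;
            value = null;
            return true;
        }

        @Override
        boolean handle(Object data) {
            super.handle(data);
            int dataLength = (data != null) ? data.toString().length() : 0;
            if (dataLength > 2) {
                length += dataLength; // values with 2 characters or fewer add nothing
            }
            return true;
        }

        boolean finalizeComputation() {
            long nonNull = getCount() - getNullCount();
            value = (nonNull == 0) ? null : String.valueOf(length / nonNull);
            return true;
        }

        String getValue() {
            return value;
        }
    }

    /** Runs the assumed phases in order: reset, one handle call per row, finalizeComputation. */
    static String run(List<?> column) {
        AvgLength indicator = new AvgLength();
        indicator.reset();
        for (Object row : column) {
            indicator.handle(row);
        }
        indicator.finalizeComputation();
        return indicator.getValue();
    }

    public static void main(String[] args) {
        // Sum of lengths is 3 + 5 = 8; the divisor is the 3 non-null rows, including "de".
        System.out.println(run(Arrays.asList("abc", "de", null, "fghij")));
    }
}
```

Under these assumptions, a short value such as "de" is left out of the sum but still counts in the divisor, which is worth keeping in mind when you adapt the handle and finalizeComputation methods to your needs.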

How to export user-defined indicators

You can export user-defined indicators to archive files or to Talend Exchange to be shared with other users.

How to export user-defined indicators to an archive file

You can export user-defined indicators and store them locally in an archive file using the Export Item option on the studio toolbar. For further information on how to export indicators, see Exporting data profiling items.

How to export user-defined indicators to Talend Exchange

You can export user-defined indicators from your current studio to Talend Exchange where you can share them with other users.

The exported indicators are saved as .xmi files on the exchange server.

Prerequisite(s): At least one user-defined indicator is created in the Profiling perspective of the studio.

To export user-defined indicators to Talend Exchange, do the following:

  1. In the DQ Repository tree view, expand Libraries > Indicators.

  2. Right-click the User Defined Indicators folder and select Export for Talend Exchange.

    The [Export for Talend Exchange] wizard is displayed.