tDataQualityRules properties for Apache Spark Batch - Cloud - 8.0

Validation (Integration)

Version
Cloud
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Data Integration
Talend Data Management Platform
Talend Data Services Platform
Talend ESB
Talend MDM Platform
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-20

These properties are used to configure tDataQualityRules running in the Spark Batch Job framework.

The Spark Batch tDataQualityRules component belongs to the Data Quality family.

Basic Settings

Output schema and Edit schema

A schema is a row description that defines the number of fields (columns) to be processed and passed on to the next component.

Click Sync columns to retrieve the schema from the previous component connected in the Job.

Select the Schema type:
  • Built-In: You create and store the schema locally for this component only.

  • Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion.

The supported types are: Boolean, Date, Double, Float, Integer, Long, Short, and String.

Application

Select the application from which you want to retrieve the data quality rules.

This option is available from Talend Studio 8.0 R2024-01 onwards.

URL
Important: You need the Rules - View permission to retrieve the rules. For more information, see the predefined user roles for the app you are using.
Enter the URL of the app selected from the Application drop-down list. When the URL does not match the selected app, the Job may fail. The following URLs are supported:
  • Talend Cloud Data Stewardship, or the hybrid version of Talend Data Stewardship 8.0 R2022-07 or later:
    https://tds.<env>.cloud.talend.com/rulerepository/api/v1
    https://tds.<env>.cloud.talend.com/rulerepository/api/v1/
    https://tds.<env>.cloud.talend.com/rulerepository
    https://tds.<env>.cloud.talend.com/rulerepository/
    https://tds.<env>.cloud.talend.com (Only for Talend Cloud Data Stewardship)
    https://tds.<env>.cloud.talend.com/ (Only for Talend Cloud Data Stewardship)

    When you use the hybrid version, you can use a URL with the IP address or the hostname:

    https://ip:19999/rulerepository/api/v1
    https://ip:19999/rulerepository/api/v1/
    https://ip:19999/rulerepository
    https://ip:19999/rulerepository/
    https://hostname:19999/rulerepository/api/v1
    https://hostname:19999/rulerepository/api/v1/
    https://hostname:19999/rulerepository
    https://hostname:19999/rulerepository/
  • Talend Cloud Data Inventory, from Talend Studio 8.0 R2023-06:
    https://tdc.<env>.cloud.talend.com/rulerepository/api/v1
    https://tdc.<env>.cloud.talend.com/rulerepository/api/v1/
    https://tdc.<env>.cloud.talend.com/rulerepository
    https://tdc.<env>.cloud.talend.com/rulerepository/
    https://tdc.<env>.cloud.talend.com
    https://tdc.<env>.cloud.talend.com/

In these URLs, <env> is the name of your Cloud region. See Talend Cloud regions and URLs.
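
The URL forms listed above can be checked before running the Job. The following is a minimal Python sketch, not part of Talend Studio: the function name matching_application is hypothetical, the character set allowed for <env> is an assumption, and the hybrid pattern assumes port 19999 as shown in the examples. The region eu is used only as an example value.

```python
import re

# Optional path suffix accepted for the Cloud apps:
# "", "/", "/rulerepository", "/rulerepository/",
# "/rulerepository/api/v1", "/rulerepository/api/v1/".
_SUFFIX = r"(?:/rulerepository(?:/api/v1)?)?/?"

# Assumption: <env> consists of lowercase letters, digits, and hyphens.
_ENV = r"[a-z0-9-]+"

_CLOUD_STEWARDSHIP = re.compile(r"https://tds\." + _ENV + r"\.cloud\.talend\.com" + _SUFFIX)
_CLOUD_INVENTORY = re.compile(r"https://tdc\." + _ENV + r"\.cloud\.talend\.com" + _SUFFIX)

# Hybrid Talend Data Stewardship: IP address or hostname on port 19999.
# Unlike the Cloud forms, the /rulerepository path is always present.
_HYBRID = re.compile(r"https://[A-Za-z0-9.-]+:19999/rulerepository(?:/api/v1)?/?")


def matching_application(url):
    """Return the application whose listed URL forms match, or None."""
    if _CLOUD_STEWARDSHIP.fullmatch(url):
        return "Talend Cloud Data Stewardship"
    if _CLOUD_INVENTORY.fullmatch(url):
        return "Talend Cloud Data Inventory"
    if _HYBRID.fullmatch(url):
        return "Talend Data Stewardship (hybrid)"
    return None


if __name__ == "__main__":
    for candidate in (
        "https://tds.eu.cloud.talend.com/rulerepository/api/v1",
        "https://tdc.eu.cloud.talend.com/",
        "https://192.168.0.10:19999/rulerepository",
        "https://192.168.0.10:19999/",
    ):
        print(candidate, "->", matching_application(candidate))
```

A result of None, or an application different from the one selected in the Application drop-down list, suggests the URL should be corrected before the Job is run.
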
Token

Enter your personal access token. To generate one, see https://help.talend.com/r/en-US/Cloud/management-console-user-guide/cloud-access-token.
DQ rule library timestamp

After you have entered the URL and token, click Refresh.

If the URL and token are correct, the data quality rules are retrieved into Talend Studio and the library timestamp is displayed in the following format: yyyy-MM-dd hh:mm:ss (library_number).

When you update the data quality rules in the Cloud or hybrid application, click Refresh to retrieve the latest version.

Configure DQ rules

Associate the variables of the rule with the input data.
The rules are retrieved from the library:
  • DQ Rule: Select the rule.
  • Rule variable: The variables of the rule are automatically retrieved.
  • Input column: Select the column that contains the values that must replace the variable.
If no rules or input columns are available, verify that:
  • Data quality rules have been retrieved in DQ rule library timestamp.
  • The input schema is correct.
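
The association between rule variables and input columns can be pictured as a simple lookup. The sketch below is illustrative only and does not reflect how Talend Studio stores this configuration: the function check_bindings, the variable names, and the column names are all hypothetical. It reports rule variables that have no input column or that point to a column missing from the input schema.

```python
def check_bindings(rule_variables, bindings, input_columns):
    """Return a list of problems found in a variable-to-column association.

    rule_variables: names of the variables defined by the selected rule
    bindings:       dict mapping a rule variable to an input column name
    input_columns:  column names available in the input schema
    """
    problems = []
    for variable in rule_variables:
        column = bindings.get(variable)
        if column is None:
            problems.append(f"variable '{variable}' has no input column")
        elif column not in input_columns:
            problems.append(
                f"variable '{variable}' is bound to unknown column '{column}'"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical rule with two variables and a two-column input schema.
    issues = check_bindings(
        rule_variables=["email", "age"],
        bindings={"email": "customer_email", "age": "years"},
        input_columns=["customer_email", "customer_age"],
    )
    for issue in issues:
        print(issue)
```
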

You can associate the data types from Talend Studio with some data types from Talend Cloud Data Stewardship or hybrid Talend Data Stewardship. See Associating data types below.

To apply more rules, click the [+] button to add a row.

Associating data types

The following table describes the data types you can associate.
Rule variable from the app* | Input column (from Talend Studio)
Number                      | Double, Float, Integer, Long, Short, and String
Boolean                     | Boolean
Text                        | String
Date                        | Date
* You can enter the URL of:
  • Talend Cloud Data Stewardship.
  • The hybrid version of Talend Data Stewardship 8.0 R2022-07 or later.
  • Talend Cloud Data Inventory, from Talend Studio 8.0 R2023-06.
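
The associations in the table above can be expressed as a lookup, which may help when checking an input schema ahead of time. This is a minimal Python sketch: the names ALLOWED_INPUT_TYPES and can_associate are hypothetical, and the mapping only restates the table; it does not call any Talend API.

```python
# Rule variable type (from the app) -> input column types (from Talend Studio)
# that can be associated with it, as listed in the table above.
ALLOWED_INPUT_TYPES = {
    "Number": {"Double", "Float", "Integer", "Long", "Short", "String"},
    "Boolean": {"Boolean"},
    "Text": {"String"},
    "Date": {"Date"},
}


def can_associate(rule_variable_type, input_column_type):
    """Return True if the table allows this pair of types."""
    allowed = ALLOWED_INPUT_TYPES.get(rule_variable_type, set())
    return input_column_type in allowed


if __name__ == "__main__":
    print(can_associate("Number", "String"))  # allowed by the table
    print(can_associate("Text", "Integer"))   # not listed in the table
```
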

Advanced Settings

tStatCatcher statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.