These properties are used to configure tPatternMasking running in the Standard Job framework.
The Standard tPatternMasking component belongs to the Data Quality family.
Basic settings
Schema and Edit Schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields. Click Sync columns to retrieve the schema from the previous component connected in the Job. Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:
The output schema of this component contains one read-only column, ORIGINAL_MARK. This column identifies by |
Built-In: You create and store the schema locally for this component only. |
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. |
Modifications |
Define in the table what fields to change and how to change them:
Column to mask: Select the column from the input flow for which you want to generate similar data by modifying its values. You can mask data from different columns, but you need to follow the order of the fields you want to mask. Each column is processed sequentially, meaning that data masking operations will be performed on the data from the first column, then the second column, and so on.
Field type: Select the field type the data belongs to.
In the Values, Path, Range and Date Range fields, values must be enclosed in double quotes.
When the input data is invalid, meaning that a value is not included in the defined range, date range or in the enumeration, the generated value is |
Advanced settings
Seed for random generator |
Set a random number if you want to generate the same sample of substitute data in each execution of the Job. This field is set to 12345678 by default. Repeating the execution with a different value for this field will result in a different sample being generated. Keep this field empty if you want to generate a different sample each time you execute the Job. |
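The effect of the seed can be illustrated with a minimal Python sketch. It does not reproduce the component's actual generator; the function name `substitute_digits` and the digit-replacement rule are hypothetical, chosen only to show that a fixed seed yields the same substitute sample on every run, while an empty seed yields a different one.

```python
import random

def substitute_digits(values, seed=None):
    """Replace every character of each value with a random digit (hypothetical rule).

    seed=None mimics leaving the seed field empty: Python then seeds the
    generator from system entropy, so each run produces a different sample.
    """
    rng = random.Random(seed)
    return ["".join(str(rng.randrange(10)) for _ in value) for value in values]

# Same seed, same input: the two runs produce identical substitute data.
first_run = substitute_digits(["1234", "567"], seed=12345678)
second_run = substitute_digits(["1234", "567"], seed=12345678)
print(first_run == second_run)
```

Passing a different seed, or `None`, would generally produce a different sample for the same input.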
Output the original row? |
Select this check box to output original data rows in addition to the substitute data. Having both data rows can be useful in debug or test processes. |
Should Null input return NULL? |
This check box is selected by default. When selected, the component outputs null when input values are null. Otherwise, it returns the default value when the input is null, that is, an empty string for string values, 0 for numeric values and the current date for date values. This parameter has no effect on the Generate Sequence function: if the input is null, this function does not return null, even if the check box is selected. |
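The null-handling rule described above can be sketched in Python as follows. This is an illustration under assumptions, not the component's code: the function name `null_output` and the `field_kind` labels are hypothetical, and the Generate Sequence exception is not modeled.

```python
from datetime import date

def null_output(field_kind, null_returns_null=True):
    """Return what a null input would produce under the rule described above.

    field_kind is a hypothetical label: "string", "numeric" or "date".
    """
    if null_returns_null:
        # Check box selected (the default): null in, null out.
        return None
    # Check box cleared: fall back to the default value for the type.
    defaults = {"string": "", "numeric": 0, "date": date.today()}
    return defaults[field_kind]

print(null_output("numeric"))                           # check box selected
print(null_output("numeric", null_returns_null=False))  # check box cleared
```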
Should EMPTY input return EMPTY? |
When this check box is selected, the component returns the input values if they are empty. Otherwise, the selected functions are applied to the input data. |
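A minimal Python sketch of the empty-input rule, assuming "empty" means an empty string. The names `mask_unless_empty` and `mask` are hypothetical, and `mask` stands in for whichever masking function is selected.

```python
def mask_unless_empty(value, mask, empty_returns_empty=True):
    """Apply `mask` to `value`, optionally passing empty input through unchanged."""
    if value == "" and empty_returns_empty:
        # Check box selected: empty input is returned as is.
        return value
    # Check box cleared, or non-empty input: the masking function is applied.
    return mask(value)

# Hypothetical masking function that replaces every character with "X".
def to_x(value):
    return "X" * len(value)

print(repr(mask_unless_empty("", to_x)))
print(repr(mask_unless_empty("abc", to_x)))
```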
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at the Job level as well as at each component level. |
Usage
Usage rule |
This component is an intermediary step. It requires input and output flows. |