Big Data Platform
Cloud API Services Platform
Cloud Big Data Platform
Cloud Data Fabric
Cloud Data Management Platform
Data Management Platform
Data Services Platform
Real-Time Big Data Platform
About this task
Defining a blocking key is not mandatory but is strongly advisable. Using a blocking key to partition data into blocks reduces the number of records that need to be examined, because comparisons are restricted to record pairs within each block. Blocking columns are especially useful when you are processing a large dataset.
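To make the reduction concrete, here is a minimal sketch that counts candidate record pairs with and without blocking. The function names and the sample keys are illustrative assumptions, not part of the product.

```python
from collections import Counter
from math import comb


def pair_count(record_count):
    """Number of record pairs when every record is compared with every other."""
    return comb(record_count, 2)


def blocked_pair_count(block_keys):
    """Number of record pairs when comparisons stay within each block."""
    block_sizes = Counter(block_keys).values()
    return sum(comb(size, 2) for size in block_sizes)


# Hypothetical blocking keys for six records (one key per record).
sample_keys = ["F", "F", "G", "G", "G", "U"]

all_pairs = pair_count(len(sample_keys))
pairs_with_blocking = blocked_pair_count(sample_keys)
print(f"without blocking: {all_pairs} pairs, with blocking: {pairs_with_blocking} pairs")
```

The unblocked count grows quadratically with the number of records, while the blocked count depends only on the block sizes, which is why the saving matters most on large datasets.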
- In the Data section, click the Select Blocking Key tab.
- Click the name of the column(s) you want to use to partition the processed data into blocks.
Blocking keys that have the exact name of the selected columns are listed in the Blocking Key table.
You can define more than one column in the table, but only one blocking key will be generated and listed in the BLOCK_KEY column in the Data table.
For example, if you use an algorithm on the country and lname columns to process records that have the same first character, records that have the same first letter in the country and in the last name are grouped together in the same block. Comparison is restricted to records within each block.
To remove a column from the Blocking Key table, right-click it and select Delete, or click its name in the Data table.
- Select an algorithm for the blocking key, and set the other parameters in the Blocking Key table as needed.
In this example, only one blocking key is used. The first character of each word in the country column is retrieved and listed in the BLOCK_KEY column.
- Click Chart to compute the generated key, group the sample records in the Data table and display the results in a chart.
This chart allows you to visualize the statistics regarding the number of blocks and to adapt the blocking parameters according to the results you want to get.
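The steps above can be sketched outside the product as follows. This is only an illustration under stated assumptions: the function names, the sample records and the text histogram are made up, and the assumption that the statistics are the number of blocks per block size is ours. It builds a key from the first character of each word in the country column, groups the records by that key, and summarizes the block sizes.

```python
from collections import Counter, defaultdict


def first_char_of_each_word(value):
    """Blocking key: the first character of every whitespace-separated word."""
    return "".join(word[0] for word in value.split())


def build_blocks(records, column):
    """Group records by the key computed from the given column."""
    blocks = defaultdict(list)
    for record in records:
        blocks[first_char_of_each_word(record[column])].append(record)
    return dict(blocks)


def block_size_distribution(blocks):
    """Map each block size to the number of blocks that have that size."""
    sizes = Counter(len(rows) for rows in blocks.values())
    return dict(sorted(sizes.items()))


# Made-up sample records, for illustration only.
records = [
    {"country": "United States", "lname": "Smith"},
    {"country": "United States", "lname": "Jones"},
    {"country": "France", "lname": "Martin"},
    {"country": "Finland", "lname": "Virtanen"},
    {"country": "Germany", "lname": "Schmidt"},
]

blocks = build_blocks(records, "country")
distribution = block_size_distribution(blocks)

# Text stand-in for the chart: one row per block size.
for size, count in distribution.items():
    print(f"block size {size}: {'#' * count} ({count} block(s))")
```

In this sample, France and Finland land in the same block because they share the key F. A distribution dominated by a few very large blocks suggests the key is too coarse, while one made up almost entirely of single-record blocks suggests it is too fine to bring potential matches together.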