The list of suspect duplicate pairs can be very large. You label only a subset of this list to identify the potential groups of duplicates.
You can then use machine learning to predict labels for the whole list. You can also output a sample of this list: you set the sample size manually, and the pairs in the sample are chosen at random.
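As a rough illustration of what a fixed-size random sample means here, the following Python sketch draws a given number of pairs from a list of suspect pairs without replacement. It is not how Talend implements sampling. The function name `sample_suspect_pairs`, the `seed` parameter, and the record identifiers are hypothetical and serve only as an example.

```python
import random


def sample_suspect_pairs(pairs, sample_size, seed=None):
    """Return up to sample_size pairs chosen at random, without replacement.

    Hypothetical helper for illustration only; not part of any Talend API.
    """
    rng = random.Random(seed)  # a seed makes the sample reproducible
    # Never request more pairs than the list contains.
    k = min(sample_size, len(pairs))
    return rng.sample(pairs, k)


# Made-up suspect pairs: each pair holds two record identifiers.
suspect_pairs = [(f"rec_{i}", f"rec_{i + 1}") for i in range(1000)]

# Sample size fixed manually, here 20 pairs to label.
to_label = sample_suspect_pairs(suspect_pairs, sample_size=20, seed=42)
print(len(to_label))
```

Because the sample is drawn at random, the pairs you label are not biased toward any particular part of the list.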
For an example of how to label suspect pairs in a Grouping campaign created in Talend Data Stewardship, see Handling grouping tasks to decide on relationship among pairs of records.