Procedure

Deduplication

author
Talend Documentation Team
EnrichVersion
6.5
EnrichProdName
Talend Data Services Platform
Talend ESB
Talend Open Studio for Big Data
Talend Big Data
Talend Open Studio for ESB
Talend Big Data Platform
Talend Real-Time Big Data Platform
Talend Open Studio for Data Integration
Talend Open Studio for MDM
Talend Data Management Platform
Talend Data Integration
Talend MDM Platform
Talend Data Fabric
task
Data Quality and Preparation > Third-party systems > Data Quality components > Deduplication components
Design and Development > Third-party systems > Data Quality components > Deduplication components
Data Governance > Third-party systems > Data Quality components > Deduplication components
EnrichPlatform
Talend Studio

Procedure

  1. In the Repository tree view of the Integration perspective of Talend Studio, right-click the Job you created in the earlier scenario and select Edit properties from the contextual menu.
    The [Edit properties] dialog box is displayed. The Job must be closed before you can make any changes in this dialog box.
    Note that you can also change the Job name and the other descriptive information about the Job in this dialog box.
  2. From the Job Type list, select Big Data Batch.
  3. From the Framework list, select Spark. A Spark Job with the same name then appears under the Big Data Batch sub-node of the Job Design node.

Results

If you need to create this Spark Job from scratch, right-click the Job Design node or the Big Data Batch sub-node and select Create Big Data Batch Job from the contextual menu. An empty Job then opens in the workspace.