Merging two datasets in HDFS - Cloud - 8.0

Sqoop

Version
Cloud
8.0
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-20

This scenario applies only to Talend products with Big Data.

For more technologies supported by Talend, see Talend components.

This scenario illustrates how to use tSqoopMerge to merge two datasets that were imported to HDFS one after the other from the same MySQL table, with one record modified between the two imports.

The first dataset (the old one before the modifications) to be used in this scenario reads as follows:
id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,3000,2010-05-02 15:34:05

The path to it in HDFS is /user/ychen/target_old.

The second dataset (the new one after the modifications) to be used reads as follows:
id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,4000,2013-10-14 18:00:00

The path to it in HDFS is /user/ychen/target_new.

Both datasets were imported using tSqoopImport. For a scenario that shows how to use tSqoopImport, see Importing a MySQL table to HDFS.

The Job in this scenario merges the two datasets so that, where a record appears in both, the newer version overwrites the older one.
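
The expected result of the merge can be sketched outside Talend in a few lines of Python. This is only an illustration of the overwrite semantics, not how tSqoopMerge is implemented. It assumes that the id column is the merge key, which this section does not state explicitly, and the helper names read_rows and merge_by_key are hypothetical.

```python
import csv
import io

# The two sample datasets from this scenario, as CSV text.
OLD = """id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,3000,2010-05-02 15:34:05
"""

NEW = """id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,4000,2013-10-14 18:00:00
"""


def read_rows(text):
    """Parse CSV text with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_by_key(old_rows, new_rows, key="id"):
    """Merge two record lists; on a key collision the newer row wins.

    Assumption: `key` names the merge-key column (here `id`).
    """
    merged = {row[key]: row for row in old_rows}
    # Rows from the newer dataset replace older rows with the same key.
    merged.update({row[key]: row for row in new_rows})
    return [merged[k] for k in sorted(merged, key=int)]


merged = merge_by_key(read_rows(OLD), read_rows(NEW))
for row in merged:
    print(",".join(row.values()))
```

With the sample data above, records 0 to 2 are unchanged and record 3 takes the newer wage and modification date.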

Before replicating this scenario, ensure that you have the appropriate rights and permissions to access the Hadoop distribution to be used. Then proceed as follows: