Filtering Avro format employee data - 7.3

Avro

Version
7.3
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
Module
Talend Studio
Last publication date
2024-02-21

This scenario applies only to Talend products with Big Data.

For more technologies supported by Talend, see Talend components.

This scenario illustrates how to create a Talend Map/Reduce Job that reads, transforms and writes Avro format data by using Map/Reduce components. The Job generates Map/Reduce code and runs directly in Hadoop. The Map bar in the workspace indicates that this Job uses only a mapper; at runtime, it shows the progress of the Map computation.

Note that the Talend Map/Reduce components are available only to subscription-based Big Data users, and that this scenario can be replicated only with Map/Reduce components.

The sample data used in this scenario is the employee information of a company. The records are stored in Avro format files, so they cannot be read directly; displayed as delimited text, they would read as follows:
1;Lyndon;Fillmore;21-05-2008
2;Ronald;McKinley;15-08-2008
3;Ulysses;Roosevelt;05-10-2008
4;Harry;Harrison;23-11-2007
5;Lyndon;Garfield;19-07-2007
6;James;Quincy;15-07-2008
7;Chester;Jackson;26-02-2008
8;Dwight;McKinley;16-07-2008
9;Jimmy;Johnson;23-12-2007
10;Herbert;Fillmore;03-04-2008
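
To make the layout of these records concrete, the following minimal Python sketch parses the delimited view shown above into typed records and pairs it with an Avro record schema that could describe them. The field names (id, first_name, last_name, record_date), the schema itself and the parse_row helper are assumptions made for illustration only; the scenario does not name the columns at this point, and the schema actually used by the Job may differ. The sketch only assumes that each row holds an integer, two strings and a date in dd-MM-yyyy form.

```python
from datetime import date, datetime

# Hypothetical Avro schema for the sample rows. Field names are assumptions;
# the "date" logical type is standard Avro (an int counting days since epoch).
EMPLOYEE_SCHEMA = {
    "type": "record",
    "name": "Employee",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "first_name", "type": "string"},
        {"name": "last_name", "type": "string"},
        {"name": "record_date", "type": {"type": "int", "logicalType": "date"}},
    ],
}

# The delimited view of the sample data, copied from the scenario.
SAMPLE = """\
1;Lyndon;Fillmore;21-05-2008
2;Ronald;McKinley;15-08-2008
3;Ulysses;Roosevelt;05-10-2008
4;Harry;Harrison;23-11-2007
5;Lyndon;Garfield;19-07-2007
6;James;Quincy;15-07-2008
7;Chester;Jackson;26-02-2008
8;Dwight;McKinley;16-07-2008
9;Jimmy;Johnson;23-12-2007
10;Herbert;Fillmore;03-04-2008
"""


def parse_row(line: str) -> dict:
    """Split one semicolon-delimited row into a typed record (hypothetical helper)."""
    raw_id, first_name, last_name, raw_date = line.strip().split(";")
    return {
        "id": int(raw_id),
        "first_name": first_name,
        "last_name": last_name,
        # Dates in the sample use day-month-year order.
        "record_date": datetime.strptime(raw_date, "%d-%m-%Y").date(),
    }


# Parse every non-empty line of the sample into a record.
employees = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
```

Records shaped like this could then be serialized with any Avro library against a schema of this form. In the scenario itself, the Avro files are read and written by the Map/Reduce components rather than by hand-written code.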

Before you start replicating this scenario, ensure that you have the appropriate rights and permissions to access the Hadoop distribution to be used. Then proceed as follows: