Performing download analysis using a Spark Batch Job - Cloud - 8.0

Version: Cloud 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Data Integration, Talend Data Management Platform, Talend Data Services Platform, Talend ESB, Talend MDM Platform, Talend Real-Time Big Data Platform
Module: Talend Studio
Last publication date: 2024-03-05

This scenario applies only to subscription-based Talend products with Big Data.

For more technologies supported by Talend, see Talend components.

In this scenario, you create a Spark Batch Job to analyze how often a given product is downloaded.

In this Job, you analyze the download preferences of specific customers in your customer base.

The sample data used as the customer base is as follows:
10103|Herbert|Clinton|FR|SILVER|28-06-2011|herbert.clinton@msn.com|6571183
10281|Bill|Ford|BE|PLATINUM|13-04-2014|bill.ford@gmail.com|6360604
10390|George|Garfield|GB|SILVER|12-02-2011|george.garfield@gmail.com|7919508
10566|Abraham|Garfield|CN|SILVER|11-10-2012|abraham.garfield@msn.com|9155569
10691|John|Polk|GB|SILVER|05-11-2012|john.polk@gmail.com|6488579
10884|Herbert|Hayes|GB|SILVER|12-10-2007|herbert.hayes@gmail.com|8728181
11020|Chester|Roosevelt|BE|GOLD|28-06-2008|chester.roosevelt@yahoo.com|4172181
11316|Franklin|Madison|BR|SILVER|08-01-2014|franklin.madison@gmail.com|4711801
11707|James|Tyler|ES|GOLD|25-03-2010|james.tyler@gmail.com|7276942
11764|Theodore|McKinley|GB|GOLD|24-08-2013|theodore.mckinley@gmail.com|3224767
11777|Warren|Madison|BE|N/A|23-12-2008|warren.madison@msn.com|6695520
11857|Ronald|Arthur|SG|PLATINUM|01-04-2009|ronald.arthur@msn.fr|6704785
11936|Theodore|Buchanan|NL|SILVER|14-11-2014|theodore.buchanan@yahoo.fr|2783553
11940|Lyndon|Wilson|BR|PLATINUM|27-07-2010|lyndon.wilson@yahoo.com|1247110
12214|Gerald|Jefferson|SG|N/A|06-06-2007|gerald.jefferson@yahoo.com|5879162
12382|Herbert|Taylor|IT|GOLD|22-04-2012|herbert.taylor@msn.com|3873628
12475|Richard|Kennedy|FR|N/A|29-12-2014|richard.kennedy@yahoo.fr|7287388
12479|Calvin|Eisenhower|ES|N/A|06-11-2008|calvin.eisenhower@yahoo.fr|1792573
12531|Chester|Arthur|JP|PLATINUM|23-01-2009|chester.arthur@msn.fr|8772326
12734|Jimmy|Buchanan|IT|SILVER|09-03-2010|jimmy.buchanan@gmail.com|7007786

Each record contains, in this order, the customer's ID number, first name, last name, country code, support level, registration date, email address, and phone number. The fields are separated by a pipe character (|).
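
To make the record layout concrete, the following is a minimal sketch in plain Python, outside Talend Studio, of how one line of the customer base could be split into named fields. The class name, the field names, and the `parse_customer` helper are hypothetical and chosen for illustration only. The day-month-year date format is an assumption inferred from sample values such as 28-06-2011.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Customer:
    # Field names are illustrative; the sample data has no header row.
    customer_id: int
    first_name: str
    last_name: str
    country: str
    support_level: str
    registered: date
    email: str
    phone: str


def parse_customer(line: str) -> Customer:
    """Split one pipe-delimited record into its eight fields."""
    cid, first, last, country, level, registered, email, phone = line.split("|")
    return Customer(
        customer_id=int(cid),
        first_name=first,
        last_name=last,
        country=country,
        support_level=level,
        # Assumed format: day-month-year, as in 28-06-2011.
        registered=datetime.strptime(registered, "%d-%m-%Y").date(),
        email=email,
        phone=phone,
    )


customer = parse_customer(
    "10103|Herbert|Clinton|FR|SILVER|28-06-2011|herbert.clinton@msn.com|6571183"
)
print(customer.support_level, customer.registered)  # SILVER 2011-06-28
```

The phone number is kept as a string because it is an identifier rather than a quantity.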

The sample web-click log of some of these customers reads as follows:
10103|/download/products/talend-open-studio
10281|/services/technical-support
10390|/services/technical-support
10566|/download/products/data-integration
10691|/services/training
10884|/download/products/integration-cloud
11020|/services/training
11316|/download/products/talend-open-studio
11707|/download/products/talend-open-studio
11764|/customers

Each record contains the ID number of a customer who visited a Talend web page, followed by the path of the page visited.
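
As an illustration of how a download can be told apart from other visits, here is a small plain-Python sketch, not part of the Talend Job itself. It assumes that every product download path starts with `/download/products/`, which holds for the sample log but is not stated as a rule in this scenario. The function names are hypothetical.

```python
from typing import Optional, Tuple

# Assumption based on the sample log: download pages share this prefix.
DOWNLOAD_PREFIX = "/download/products/"


def parse_click(line: str) -> Tuple[int, str]:
    """Split one log record into (customer ID, visited path)."""
    customer_id, path = line.split("|", 1)
    return int(customer_id), path


def downloaded_product(path: str) -> Optional[str]:
    """Return the product name for a download path, or None for other pages."""
    if path.startswith(DOWNLOAD_PREFIX):
        return path[len(DOWNLOAD_PREFIX):]
    return None


customer_id, path = parse_click("10103|/download/products/talend-open-studio")
print(customer_id, downloaded_product(path))  # 10103 talend-open-studio
print(downloaded_product("/services/training"))  # None
```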

The data shows that the visits come from customers at different support levels and serve different purposes. The Job you design matches these visits against the sample customer base to identify who made each visit, and then determines which product the Silver-level customers download most often.
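
The logic of this analysis can be sketched in plain Python, independently of Spark and of the components used in the Job: look up each visitor's support level, keep only the download visits made by Silver-level customers, and count them per product. This is only an illustration of the expected result, not the Job's implementation. The function names and the `/download/products/` prefix rule are assumptions. Only the first ten customer records are included because they are the ones that appear in the log.

```python
from collections import Counter

# First ten records of the sample customer base.
CUSTOMERS = """\
10103|Herbert|Clinton|FR|SILVER|28-06-2011|herbert.clinton@msn.com|6571183
10281|Bill|Ford|BE|PLATINUM|13-04-2014|bill.ford@gmail.com|6360604
10390|George|Garfield|GB|SILVER|12-02-2011|george.garfield@gmail.com|7919508
10566|Abraham|Garfield|CN|SILVER|11-10-2012|abraham.garfield@msn.com|9155569
10691|John|Polk|GB|SILVER|05-11-2012|john.polk@gmail.com|6488579
10884|Herbert|Hayes|GB|SILVER|12-10-2007|herbert.hayes@gmail.com|8728181
11020|Chester|Roosevelt|BE|GOLD|28-06-2008|chester.roosevelt@yahoo.com|4172181
11316|Franklin|Madison|BR|SILVER|08-01-2014|franklin.madison@gmail.com|4711801
11707|James|Tyler|ES|GOLD|25-03-2010|james.tyler@gmail.com|7276942
11764|Theodore|McKinley|GB|GOLD|24-08-2013|theodore.mckinley@gmail.com|3224767
"""

# Sample web-click log.
CLICKS = """\
10103|/download/products/talend-open-studio
10281|/services/technical-support
10390|/services/technical-support
10566|/download/products/data-integration
10691|/services/training
10884|/download/products/integration-cloud
11020|/services/training
11316|/download/products/talend-open-studio
11707|/download/products/talend-open-studio
11764|/customers
"""

# Assumption based on the sample log: download pages share this prefix.
DOWNLOAD_PREFIX = "/download/products/"


def support_levels(customer_lines: str) -> dict:
    """Map each customer ID (first field) to its support level (fifth field)."""
    levels = {}
    for line in customer_lines.splitlines():
        fields = line.split("|")
        levels[fields[0]] = fields[4]
    return levels


def silver_downloads(customer_lines: str, click_lines: str) -> Counter:
    """Count downloads per product for Silver-level customers only."""
    levels = support_levels(customer_lines)
    counts = Counter()
    for line in click_lines.splitlines():
        customer_id, path = line.split("|", 1)
        is_silver = levels.get(customer_id) == "SILVER"
        if is_silver and path.startswith(DOWNLOAD_PREFIX):
            counts[path[len(DOWNLOAD_PREFIX):]] += 1
    return counts


counts = silver_downloads(CUSTOMERS, CLICKS)
print(counts.most_common(1))  # [('talend-open-studio', 2)]
```

With the sample data, talend-open-studio is downloaded twice by Silver-level customers (10103 and 11316). The third download of that product comes from a Gold-level customer (11707) and is therefore excluded.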

Note that the sample data is created for demonstration purposes only.

To replicate this scenario, proceed as follows: