Machine Learning 101 - Decision Trees - Cloud - 8.0

Version: Cloud, 8.0
Language: English
Product: Talend Big Data, Talend Big Data Platform, Talend Data Fabric, Talend Real-Time Big Data Platform
Module: Talend Studio
Last publication date: 2024-02-20
This article explains how to develop a machine learning routine based on decision trees.

Overview

This hands-on tutorial demonstrates the basics of developing a machine learning routine using Talend and Spark. Specifically, it uses decision tree learning to classify real-life bank marketing data. Upon completion, you will have a working knowledge of how machine learning is integrated into a Talend workflow, along with some reusable code snippets.

The source data used in this tutorial is the Bank Marketing dataset, retrieved from the UCI Machine Learning Repository (Irvine, CA: University of California, School of Information and Computer Science). It is available in the public domain and is attributed to: [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. "A Data-Driven Approach to Predict the Success of Bank Telemarketing." Decision Support Systems, Elsevier, 62:22-31, June 2014.
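
To make the idea of decision tree learning concrete before building anything in Talend, the following is a minimal sketch in plain Python of how a single split can be chosen by minimizing Gini impurity. It is an illustration only, not the Talend or Spark implementation: Gini impurity is one common splitting criterion, the function names gini and best_threshold are hypothetical, and the toy rows and the field names duration and subscribed are made up for this example rather than taken from the Bank Marketing dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all labels agree)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def best_threshold(rows, feature, label):
    """Return (threshold, weighted_impurity) for the best binary split
    on one numeric feature, or (None, inf) if no split is possible."""
    best = (None, float("inf"))
    values = sorted({row[feature] for row in rows})
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [r[label] for r in rows if r[feature] <= threshold]
        right = [r[label] for r in rows if r[feature] > threshold]
        # Weight each side's impurity by the share of rows it receives.
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical toy rows, NOT taken from the Bank Marketing dataset.
toy_rows = [
    {"duration": 60, "subscribed": "no"},
    {"duration": 90, "subscribed": "no"},
    {"duration": 120, "subscribed": "no"},
    {"duration": 400, "subscribed": "yes"},
    {"duration": 520, "subscribed": "yes"},
    {"duration": 700, "subscribed": "yes"},
]

threshold, impurity = best_threshold(toy_rows, "duration", "subscribed")
print(threshold, impurity)  # prints: 260.0 0.0
```

A full decision tree repeats this search over every feature and then recurses into each side of the chosen split until a stopping rule is met. In the tutorial itself, that work is delegated to Spark through Talend components rather than written by hand.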

Prerequisites

You have:
  • Hortonworks Data Platform (HDP) 2.4 installed and configured. Alternatively, you can use the Hortonworks Sandbox, a downloadable virtual machine (VM). For more information, see Create HDFS Metadata - Hortonworks.
  • Basic knowledge of:
    • The tools and technologies of the Hadoop ecosystem.
    • Hadoop Distributed File System (HDFS) and Spark.
  • Working knowledge of Talend Studio and Talend Big Data Platform.
  • Talend Big Data Platform installed and configured.