Machine Learning 101 - Decision Trees - 7.3

Machine Learning

Version
7.3
Language
English
Product
Talend Big Data
Talend Big Data Platform
Talend Data Fabric
Talend Real-Time Big Data Platform
Module
Talend Studio
Content
Data Governance > Third-party systems > Machine Learning components
Data Quality and Preparation > Third-party systems > Machine Learning components
Design and Development > Third-party systems > Machine Learning components
This article explains how to develop machine learning and decision trees.

Overview

This hands on tutorial demonstrates the basics of developing a machine learning routine using Talend and Spark. Specifically, decision tree learning will be leveraged for classification of real-life bank marketing data. Upon completion, you will have a working knowledge of how machine learning is integrated into a Talend workflow and some re-usable code snippets.

The source data used in this tutorial was retrieved from the UCI Machine Learning Repository. Irvine, CA: University of California, Schools of Information and Computer Science. It is available in the public domain and is attributed to: [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. "A Data-Driven Approach to Predict the Success of Bank Telemarketing." Decision Support Systems, Elsevier, 62:22-31, June 2014: Bank Marketing Data Set

Prerequisites

  • You have Hortonworks 2.4 (HDP) installed and configured. You can also use Hortonworks sandbox, a downloadable virtual machine (VM). For more information, see Create HDFS Metadata - Hortonworks.
  • You have basic knowledge of Hadoop ecosystem's tools and technologies.
  • You have basic knowledge of Hadoop Distributed File System (HDFS) and Spark.
  • You have working knowledge of Talend Studio and Talend Big Data Platform.
  • You have Talend Big Data Platform installed and configured. Any license model above this platform also work.