Decision tree classification in R
Decision trees are one of the most basic and widely used machine learning
algorithms, and they belong to the family of supervised machine learning
techniques. Decision trees can handle both regression and classification tasks,
which makes them a must-learn for anyone who aspires to be a data scientist.
In this article, we will learn what decision trees are, how they work, and how
to implement them in R. We will also discuss the applications of decision trees
along with their advantages and disadvantages.
What is a decision tree and how does it work?
Decision trees are a non-parametric supervised machine learning algorithm used
for both classification and regression. As the name suggests, a decision tree
works by asking a Boolean (yes/no) question about the data and, based on the
answer, moving down one branch of the tree to the next question. The model keeps
asking questions until a prediction is made. The goal of a decision tree is to
build a model that predicts the value of a target variable by learning simple
decision rules inferred from the data features.
When a decision tree is used to predict class labels, prediction starts at the
root node of the tree. The attribute placed at the root node is the one that
best separates the training records, and it is found by measuring how well each
attribute splits the classes. A common measure is Gini impurity, which is 1
minus the sum of the squared class proportions in a node; the attribute whose
split produces the lowest weighted impurity is chosen. The calculation is simple
for a handful of attributes but becomes more expensive as the number of
attributes grows. Once the root node is determined, the tree grows branches
through further questions, and this branching continues until the leaf nodes
are pure (contain a single class) or another stopping criterion is met.
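To make the root-node choice concrete, here is a minimal base-R sketch that
computes the weighted Gini impurity of each candidate attribute and picks the
one with the lowest value. The `weather` data frame and the helper names `gini`
and `split_gini` are hypothetical, made up for illustration.

```r
# Gini impurity of a set of class labels: 1 - sum of squared class proportions
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Weighted Gini impurity of the child nodes created by splitting on a
# categorical attribute (one child node per attribute value)
split_gini <- function(attribute, labels) {
  groups <- split(labels, attribute, drop = TRUE)
  weights <- sapply(groups, length) / length(labels)
  sum(weights * sapply(groups, gini))
}

# Hypothetical toy data: should we play outside?
weather <- data.frame(
  outlook = c("sunny", "sunny", "overcast", "rain", "rain", "overcast"),
  windy   = c("no", "yes", "no", "no", "yes", "yes"),
  play    = c("no", "no", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

features <- c("outlook", "windy")
scores <- sapply(features, function(f) split_gini(weather[[f]], weather$play))
print(scores)

# The attribute with the lowest weighted impurity becomes the root node
root <- names(which.min(scores))
print(root)
```

Here splitting on `outlook` leaves only one mixed child node, so it scores much
lower than `windy` and is chosen as the root.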
Decision trees work much like a set of nested if-else statements: the tree
grows through Boolean questions and stops only when a prediction is made.
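The if-else analogy can be written out literally. The sketch below hard-codes a
small tree for R's built-in `iris` data; the function name `classify_iris` and
the thresholds 2.45 and 1.75 are illustrative assumptions, not values learned
in this article.

```r
# A hand-written decision tree as nested if-else statements.
# Each `if` is an internal node; each returned string is a leaf.
classify_iris <- function(petal_length, petal_width) {
  if (petal_length < 2.45) {
    "setosa"
  } else if (petal_width < 1.75) {
    "versicolor"
  } else {
    "virginica"
  }
}

# Apply the tree to every row and measure how often it matches the true label
predictions <- mapply(classify_iris, iris$Petal.Length, iris$Petal.Width)
accuracy <- mean(predictions == iris$Species)
print(accuracy)
```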
Various types of decision tree algorithms in R
There are various types of decision tree algorithms, which differ in how they
grow the tree and make predictions. The four most common forms are ID3, C4.5,
C5.0, and CART.
ID3
ID3 (Iterative Dichotomiser 3), introduced by Ross Quinlan in 1986, is one of
the earliest decision tree algorithms. It builds a multiway tree in a greedy
manner: at each node, it selects the categorical feature that yields the
largest information gain for the categorical target. ID3 trees are grown to
their maximum size, and a pruning step is then usually applied to improve the
tree's ability to generalize to unseen data.
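A compact ID3 sketch in base R, assuming the features and the target are
categorical (character columns). It greedily picks the feature with the highest
information gain, makes one branch per value of that feature (a multiway
split), and recurses; it does no pruning. The helper names `entropy`,
`info_gain`, and `id3` and the `weather` data are hypothetical.

```r
# Entropy (log base 2) of a set of class labels
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]  # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

# Information gain from splitting `labels` on a categorical `attribute`
info_gain <- function(attribute, labels) {
  groups <- split(labels, attribute, drop = TRUE)
  weights <- sapply(groups, length) / length(labels)
  entropy(labels) - sum(weights * sapply(groups, entropy))
}

# Minimal ID3: greedily grow a multiway tree until a node is pure or no
# features are left. Internal nodes are lists; leaves are class labels.
id3 <- function(data, target, features) {
  labels <- data[[target]]
  if (length(unique(labels)) == 1 || length(features) == 0) {
    return(names(which.max(table(labels))))  # majority class
  }
  gains <- sapply(features, function(f) info_gain(data[[f]], labels))
  best <- names(which.max(gains))
  subsets <- split(data, data[[best]], drop = TRUE)  # one branch per value
  children <- lapply(subsets, id3, target = target,
                     features = setdiff(features, best))
  list(feature = best, children = children)
}

# Hypothetical toy data
weather <- data.frame(
  outlook = c("sunny", "sunny", "overcast", "rain", "rain", "overcast"),
  windy   = c("no", "yes", "no", "no", "yes", "yes"),
  play    = c("no", "no", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

tree <- id3(weather, target = "play", features = c("outlook", "windy"))
str(tree)
```

The `rain` branch is still mixed after the first split, so the sketch recurses
and splits it again on `windy`.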
C4.5
C4.5 is the successor to ID3 and removes the restriction that features must be
categorical. It does this by dynamically defining a discrete attribute that
partitions the continuous attribute values into a discrete set of intervals.
C4.5 converts the trained tree into a set of if-then rules, and the accuracy of
each rule is evaluated to determine the order in which the rules are applied.
Pruning is performed by removing a rule's precondition whenever the accuracy of
the rule improves without it.
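The idea of turning a continuous attribute into discrete intervals can be
sketched as a search for the best binary threshold. This simplified example
scores the midpoints between sorted distinct values with plain information gain
(real C4.5 uses a different split criterion and more machinery); the helper
`best_threshold` is hypothetical, and the data is R's built-in `iris`.

```r
# Entropy (log base 2) of a set of class labels
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Find the threshold on a continuous attribute x that maximizes information
# gain, turning x into a two-interval discrete attribute (x <= t vs x > t)
best_threshold <- function(x, labels) {
  values <- sort(unique(x))
  candidates <- (head(values, -1) + tail(values, -1)) / 2  # midpoints
  base <- entropy(labels)
  gains <- sapply(candidates, function(t) {
    left <- labels[x <= t]
    right <- labels[x > t]
    base - (length(left) * entropy(left) +
            length(right) * entropy(right)) / length(labels)
  })
  list(threshold = candidates[which.max(gains)], gain = max(gains))
}

result <- best_threshold(iris$Petal.Length, iris$Species)
print(result)
```

On `iris`, the best cut on petal length falls in the gap between the setosa
flowers and the rest, isolating setosa completely.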
C5.0
C5.0 is Quinlan's refinement of C4.5. It uses less memory and builds smaller
rulesets than C4.5 while being more accurate.
CART
The logic of CART (Classification and Regression Trees) is similar to C4.5, but
it differs in that it supports numerical target variables (regression) and does
not compute rule sets. CART constructs binary trees, choosing at each node the
feature and threshold that yield the largest information gain.
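In R, CART-style trees are commonly fitted with the `rpart` package, which
implements recursive partitioning based on the CART approach and usually ships
with R as a recommended package. A minimal sketch, assuming `rpart` is
installed, on the built-in `iris` data:

```r
library(rpart)  # recursive partitioning (CART-style binary trees)

# Fit a classification tree predicting Species from all other columns
fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)  # text view of the binary splits

# Predict class labels and compute the accuracy on the training data
pred <- predict(fit, newdata = iris, type = "class")
accuracy <- mean(pred == iris$Species)
print(accuracy)
```

By default, `rpart` chooses classification splits using Gini impurity, and
training accuracy like this one is an optimistic estimate; a held-out test set
gives a fairer picture.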
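The depth of a decision tree is an important factor: a tree that grows too deep
tends to overfit the training data, while a tree that is too shallow may miss
important patterns. How the tree grows depends on how each split is chosen,
which is governed by the concepts of entropy and information gain.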
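With `rpart`, both ideas can be controlled directly:
`parms = list(split = "information")` switches the split criterion from Gini to
entropy-based information gain, and `rpart.control(maxdepth = ...)` caps the
depth of the tree. The sketch below limits the tree to a single split (a
decision stump); the depth check relies on rpart's node numbering, where node
n has children 2n and 2n + 1. It assumes `rpart` is installed.

```r
library(rpart)

# Entropy-based splits instead of the default Gini, and maxdepth = 1:
# the root node (depth 0) may split once, giving a decision stump
stump <- rpart(
  Species ~ ., data = iris, method = "class",
  parms = list(split = "information"),
  control = rpart.control(maxdepth = 1)
)
print(stump)

# Node n has children 2n and 2n + 1, so a node's depth is floor(log2(n))
node_ids <- as.integer(rownames(stump$frame))
tree_depth <- max(floor(log2(node_ids)))
print(tree_depth)
```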
Entropy
Entropy measures the uncertainty or impurity within a dataset and determines
how a decision tree splits it. Entropy can also be defined as the measure of
disorder within the dataset, and its mathematical formula is:

$$H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

where c is the number of classes.
Entropy is commonly denoted by H.
In the equation above, p_i is the probability of class i in the dataset.
For a binary classification problem, i takes just two values: the positive
class and the negative class.
For a two-class problem, entropy is a value between 0 and 1. A value close to 0
means that the dataset is pure and contains few impurities, whereas a value
close to 1 means that the dataset is impure and has a high level of disorder.
With more than two classes, entropy can exceed 1 (its maximum is log2 of the
number of classes), but for the sake of simplicity we can think of entropy as
ranging between 0 and 1.
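A minimal base-R sketch of the entropy formula, using log base 2 so that a
50/50 two-class node scores exactly 1. The helper name `entropy` is our own:

```r
# Entropy H(S) = -sum(p_i * log2(p_i)) of a set of class labels
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]  # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

print(entropy(c("yes", "yes", "yes", "yes")))  # pure node: 0
print(entropy(c("yes", "yes", "no", "no")))    # 50/50, two classes: 1
print(entropy(c("a", "b", "c", "d")))          # four equal classes: 2
```

The last case shows why entropy can exceed 1 once there are more than two
classes.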
Entropy is calculated to quantify the uncertainty or disorder present in the
dataset, and the goal of each split is to reduce this disorder and therefore the
uncertainty about the target class.
Information Gain
Once the uncertainty present in the dataset has been measured, we need to
measure how much each available feature reduces the uncertainty in the target
class.
Information gain measures the amount of information a feature provides about
the class, and it helps determine the order in which attributes are used to
split the nodes.
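Information gain, denoted IG(S, A), is the effective change in entropy after
the set S is split on a particular attribute A. It is closely related to the
Kullback-Leibler divergence: in decision trees, information gain equals the
mutual information between the attribute and the target, which is the expected
Kullback-Leibler divergence. Hence, information gain measures the relative
change in entropy with respect to the independent variable. The equation for
information gain is given by:

$$IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} H(S_v)$$

where S_v is the subset of S for which attribute A takes the value v.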
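The information gain definition translates almost directly into base R. In
this sketch, `entropy` and `info_gain` are hypothetical helper names, and the
small `play`, `outlook`, and `windy` vectors are made-up data:

```r
# Entropy (log base 2) of a set of class labels
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# IG(S, A) = H(S) - sum over values v of |S_v| / |S| * H(S_v)
info_gain <- function(attribute, labels) {
  groups <- split(labels, attribute, drop = TRUE)  # the subsets S_v
  weights <- sapply(groups, length) / length(labels)
  entropy(labels) - sum(weights * sapply(groups, entropy))
}

# Hypothetical toy data
play    <- c("no", "no", "yes", "yes", "no", "yes")
outlook <- c("sunny", "sunny", "overcast", "rain", "rain", "overcast")
windy   <- c("no", "yes", "no", "no", "yes", "yes")

print(info_gain(outlook, play))
print(info_gain(windy, play))
```

Here `outlook` removes far more uncertainty about `play` than `windy` does, so
a tree built by information gain would split on it first.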
https://www.dataspoof.info/post/decision-tree-classification-in-r/