Decision tree classification in R
Decision trees are one of the most fundamental and widely used machine learning algorithms, falling under supervised learning techniques. Decision trees can handle both regression and classification tasks, so learning them is a must for anyone who aspires to be a data scientist. In this article, we will learn what decision trees are, how they work, and how to implement them in R. We will also discuss the applications of decision trees along with their advantages and disadvantages.
What is a decision tree and how does it work?
Decision trees are a non-parametric form of supervised machine learning algorithm used for both classification and regression. As the name suggests, a decision tree works by asking a Boolean (yes/no) question and, based on the answer, making a decision that branches further in the form of a tree, hence the name. The model keeps asking questions until a prediction is made. The goal of a decision tree is to build a model that can predict the value of a target variable by learning simple decision rules inferred from the data features.
Decision trees are used to predict class labels, and prediction starts from the root node of the tree. The attribute that should represent the root node is the one that best separates the training records. This can be calculated with the Gini impurity formula, a simple measure for comparing attributes, although the calculation becomes more involved as the number of attributes grows. Once the root node is determined, the tree forms branches through the questions and decisions it makes, and this branching continues until the remaining records are classified, that is, until the leaf nodes are pure.
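As a point of reference, the Gini impurity mentioned above can be computed in a few lines of base R; this helper function is purely illustrative, not part of any package:

# Sketch: Gini impurity of a vector of class labels
gini <- function(labels) {
  p <- table(labels) / length(labels)  # class proportions p_i
  1 - sum(p^2)                         # Gini(S) = 1 - sum(p_i^2)
}

gini(c("yes", "no", "no", "no"))  # 0.375: somewhat impure
gini(c("no", "no", "no", "no"))   # 0: a pure node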
Decision trees work like a chain of nested if-else statements: the tree grows through Boolean questions and stops only when a prediction is made.
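To make this concrete, here is a minimal sketch of growing and using a classification tree in R with the rpart package, which implements the CART algorithm discussed below (it splits on Gini impurity by default). The iris dataset and the 70/30 split are just illustrative choices:

# Sketch: fit a classification tree on the iris dataset
library(rpart)

set.seed(42)  # reproducible train/test split
train_idx <- sample(nrow(iris), 0.7 * nrow(iris))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Grow the tree: predict Species from all other features
fit <- rpart(Species ~ ., data = train, method = "class")

# Predict class labels for unseen data and check accuracy
pred <- predict(fit, test, type = "class")
mean(pred == test$Species)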
Various types of decision tree algorithms in R
Decision tree algorithms come in several variants, which differ in how they grow the tree and form their predictions. The four main forms are ID3, C4.5, C5.0, and CART.
ID3
ID3 (Iterative Dichotomiser 3), developed by Ross Quinlan in 1986, is the earliest form of decision tree algorithm. It creates a multiway tree in a greedy manner: each node splits on the categorical feature that yields the largest information gain for the categorical target. Trees in ID3 are grown to their maximum size, and pruning is then applied to improve the trees' ability to generalize to unseen data.
C4.5
C4.5 is the successor to ID3 and removes the restriction that features must be categorical. It does this by dynamically defining a discrete attribute that partitions the continuous attribute values into a set of discrete intervals. The C4.5 model converts trained trees into sets of if-then rules; the accuracy of each rule is evaluated to determine the order in which the rules should be applied. Pruning is performed by removing a rule's precondition and checking whether the rule's accuracy improves without it.
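If you want to experiment with a C4.5-style tree in R, one option is the J48 learner from the RWeka package, an interface to Weka's Java implementation of C4.5 (this sketch assumes RWeka and a Java runtime are installed):

# Sketch: a C4.5-style tree via Weka's J48
library(RWeka)

c45_fit <- J48(Species ~ ., data = iris)
summary(c45_fit)  # prints the tree along with a confusion matrix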
C5.0
C5.0 is a refinement of C4.5 that uses comparatively less memory and builds smaller rulesets while being more accurate than C4.5.
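In R, this algorithm is available through the C50 package; here is a minimal sketch, again using iris purely for illustration:

# Sketch: fitting a C5.0 tree with the C50 package
library(C50)

c50_fit <- C5.0(Species ~ ., data = iris)
summary(c50_fit)              # tree structure and training error
predict(c50_fit, head(iris))  # predicted class labels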
CART
The logic of CART (Classification and Regression Trees) is similar to C4.5, but it differs in that it does not compute rule sets and it supports numerical target variables (regression). CART constructs binary trees, choosing at each node the feature and threshold that yield the largest information gain. The rpart package used in the sketch above is R's standard CART-style implementation.
The depth of a decision tree is an important factor in how the tree is built. Deciding where to split, and how deep to grow, is done through the concepts of entropy and information gain.
Entropy
Entropy measures the uncertainty or impurity within the dataset and determines how the dataset is split by the decision tree. Entropy can also be defined as the measure of disorder within the dataset, and its mathematical formula is:

Entropy(S) = - Σ p_i × log2(p_i)

Entropy is sometimes also denoted by 'H'. In the equation above, the sum runs over the classes, and 'p_i' is the probability of class 'i' in the dataset; in a binary classification problem, class 'i' is either the positive or the negative class.
For a binary problem, entropy is a value between 0 and 1: a value closer to 0 means the dataset is pure and contains few impurities, while a value closer to 1 means the dataset is impure and has a high level of disorder. When there are more than two classes, entropy can exceed 1, which simply indicates an even higher level of disorder; for the sake of simplicity, it is convenient to think of entropy as ranging between 0 and 1.
Entropy is calculated to measure the uncertainty or disorder present in the dataset, and the goal is to choose splits that reduce this uncertainty as much as possible.
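As a quick illustration, here is a small helper that computes the entropy of a vector of class labels in base R (the function name and examples are ours, not from any particular package):

# Sketch: entropy of a vector of class labels, in bits
entropy <- function(labels) {
  p <- table(labels) / length(labels)  # class proportions p_i
  p <- p[p > 0]                        # drop empty classes (0 * log2(0) is taken as 0)
  -sum(p * log2(p))                    # Entropy(S) = -sum(p_i * log2(p_i))
}

entropy(c("yes", "yes", "no", "no"))    # 1: maximum disorder for two classes
entropy(c("yes", "yes", "yes", "yes"))  # 0: a pure set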
Information Gain
Once the uncertainty present in the dataset is measured, we need to measure how much each available feature reduces the uncertainty about the target class. Information gain measures the amount of information a feature provides about the class and helps determine the order in which attributes appear in the tree. Information gain is closely related to the Kullback-Leibler divergence and is denoted IG(S, A): the effective change in entropy of the set S after a decision has been made on a particular attribute A. In other words, information gain measures the relative change in entropy with respect to the independent variable. The equation for information gain is:

IG(S, A) = Entropy(S) - Σ (|S_v| / |S|) × Entropy(S_v)

where the sum runs over the values v of attribute A, and S_v is the subset of S for which attribute A takes the value v.
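Continuing the sketch from the entropy section, information gain can be computed in R using the entropy helper defined earlier (the derived Sepal.Size column is hypothetical, created only for this example):

# Sketch: information gain from splitting data on a given attribute
info_gain <- function(data, attribute, target) {
  subsets <- split(data[[target]], data[[attribute]])  # the subsets S_v
  weighted <- sum(sapply(subsets, function(s) {
    (length(s) / nrow(data)) * entropy(s)              # (|S_v|/|S|) * Entropy(S_v)
  }))
  entropy(data[[target]]) - weighted                   # Entropy(S) minus weighted sum
}

# Example: how much does a coarse sepal-size feature tell us about Species?
iris$Sepal.Size <- ifelse(iris$Sepal.Length > 5.8, "large", "small")
info_gain(iris, "Sepal.Size", "Species")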