
 Decision tree classification in R


Decision trees are among the most basic and widely used supervised machine learning algorithms. They can handle both regression and classification tasks, so learning them is a must for anyone who aspires to be a data scientist.

In this article, we will learn what decision trees are, how they work, and how to implement them in R. We will also discuss the applications of decision trees along with their advantages and disadvantages.

What is a decision tree and how does it work?

Decision trees are a non-parametric supervised machine learning algorithm used for both classification and regression. As the name suggests, a decision tree works by asking a Boolean (yes/no) question and, based on the answer, making a decision that leads to the next question, so the sequence of questions branches out in the form of a tree. The model keeps asking questions until a prediction is made. The goal of a decision tree is to build a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

Decision trees are used to predict class labels, and prediction starts from the root node of the tree. The attribute placed at the root node is the one that best separates the training records. This can be measured with the Gini impurity formula, which scores how mixed the classes are after splitting on an attribute. The calculation is straightforward for a few attributes but becomes more expensive as the number of attributes grows. Once the root node is determined, the tree forms branches through further questions and decisions. Branching continues until the records in each branch belong to a single class or no useful split remains.
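The root-node choice described above can be sketched in a few lines. This is a minimal illustration in Python, not the article's R implementation: the toy `weather` records, the attribute names, and the helper functions `gini_impurity`, `weighted_gini`, and `best_root_attribute` are all hypothetical, and the sketch assumes categorical attributes only.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def weighted_gini(rows, attribute, target):
    """Size-weighted Gini impurity of the groups formed by splitting on `attribute`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    n = len(rows)
    return sum(len(g) / n * gini_impurity(g) for g in groups.values())

def best_root_attribute(rows, attributes, target):
    """Pick the attribute whose split leaves the least impurity."""
    return min(attributes, key=lambda a: weighted_gini(rows, a, target))

# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" does not.
weather = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

root = best_root_attribute(weather, ["outlook", "windy"], "play")
print(root)
```

On this toy data the split on `outlook` leaves two pure groups, so it is selected as the root.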

A decision tree works like a chain of nested if-else statements: the tree grows through Boolean questions and stops only when a prediction is made.
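To make the if-else analogy concrete, here is a small hand-written tree in Python. The function `predict_play` and its rules are invented for illustration; a real tree would learn its questions from data.

```python
def predict_play(outlook, windy):
    """A hand-written two-level tree: each `if` is one Boolean question at a node."""
    if outlook == "sunny":      # root node question
        return "no"             # leaf: prediction made, stop asking
    else:
        if windy:               # second-level question
            return "no"
        else:
            return "yes"

print(predict_play("rainy", False))
```

Each path from the first `if` to a `return` corresponds to one route from the root node to a leaf.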

Various types of Decision tree algorithms in R

Decision tree algorithms come in several types, which differ in how they build the tree and make predictions. The four main decision tree algorithms are ID3, C4.5, C5.0, and CART.

ID3

ID3 (Iterative Dichotomiser 3) was developed by Ross Quinlan in 1986 and is the earliest of these algorithms. It creates a multiway tree and works in a greedy manner: for each node, it picks the categorical feature that yields the largest information gain for categorical targets. Trees in ID3 are grown to their maximum size, and pruning is then applied to improve the ability of the tree to generalize to unseen data.

C4.5

C4.5 is the successor to ID3 and removes the restriction that features must be categorical. It does this by dynamically defining a discrete attribute that partitions the values of a continuous attribute into a discrete set of intervals. C4.5 converts the trained trees into sets of if-then rules. The accuracy of each rule is evaluated to determine the order in which the rules are applied. Pruning is performed by removing a precondition of a rule if the accuracy of the rule improves without it.

C5.0

C5.0 is a refinement of C4.5. It uses less memory and builds smaller rulesets than C4.5 while being more accurate.

CART

The logic of CART (Classification and Regression Trees) is similar to that of C4.5. It differs in that it does not compute rule sets and it supports numerical target variables (regression). CART constructs binary trees, choosing at each node the feature and threshold that yield the largest information gain.

The depth of a decision tree is an important factor when building it, since a deeper tree asks more questions before making a prediction. Which question to ask at each level is decided through the concepts of entropy and information gain.
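A binary threshold split on one numeric feature can be sketched as follows. This Python sketch is an assumption-laden illustration rather than the CART implementation: it scores candidate splits by the decrease in Gini impurity (a common choice for CART-style trees, swapped in here for information gain), tries midpoints between consecutive distinct values as thresholds, and uses made-up `ages` and `bought` data. The helpers `gini` and `best_threshold` are hypothetical names.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Return (threshold, impurity_decrease) for the best split `value <= threshold`."""
    pairs = list(zip(values, labels))
    n = len(pairs)
    parent = gini(list(labels))
    distinct = sorted(set(values))
    best = (None, 0.0)
    # Candidate thresholds: midpoints between consecutive distinct feature values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        child = len(left) / n * gini(left) + len(right) / n * gini(right)
        decrease = parent - child
        if decrease > best[1]:
            best = (t, decrease)
    return best

# Hypothetical data: customers under about 35 did not buy, older ones did.
ages = [22, 25, 30, 41, 47, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]

threshold, decrease = best_threshold(ages, bought)
print(threshold, decrease)
```

A full tree builder would repeat this search over every feature at every node and recurse into the left and right subsets.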

Entropy

Entropy measures the uncertainty or impurity within a dataset and determines how the decision tree splits the data. It is also described as the measure of disorder within the dataset, and its mathematical formula is:

$$H(S) = -\sum_{i} p_i \log_2 p_i$$

Entropy is sometimes denoted by ‘H’, as in the equation above.

In the equation above, ‘p_i’ is the probability (relative frequency) of class ‘i’ in the dataset S.

In a binary classification problem, the class ‘i’ is either the positive class or the negative class.

For a dataset with two classes and a base-2 logarithm, entropy takes a value between 0 and 1. A value close to 0 means the dataset is pure, that is, dominated by a single class. In contrast, a value close to 1 means the dataset is impure and has a high level of disorder. With more than two classes the entropy can exceed 1 (its maximum is the base-2 logarithm of the number of classes), which again indicates a very high level of disorder. For simplicity, the 0-to-1 range of the two-class case is the one to remember.

Entropy is calculated to measure the uncertainty or disorder present in the dataset. The goal when building a tree is to choose splits that reduce this uncertainty.
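The behaviour of entropy at its extremes is easy to check numerically. The following Python sketch assumes the usual Shannon form with a base-2 logarithm, where the class probabilities are the relative frequencies of the labels. The function name `entropy` and the label lists are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    # Each term is -p * log2(p), where p is the relative frequency of one class.
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

pure = ["yes", "yes", "yes", "yes"]      # a single class: no disorder
balanced = ["yes", "no", "yes", "no"]    # two equally likely classes
four_way = ["a", "b", "c", "d"]          # four equally likely classes

print(entropy(pure), entropy(balanced), entropy(four_way))
```

The pure list has zero entropy and the balanced two-class list has entropy 1. The four-class list reaches 2, which shows how the value can exceed 1 when there are more than two classes.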

Information Gain

Once the uncertainty present in the dataset has been measured, we need to measure how much the available features reduce the uncertainty in the target class.

Information gain is used for measuring the amount of information a feature provides about a class and helps in determining the order of attributes.

Information gain, a term also associated with the Kullback-Leibler divergence, is denoted IG(S, A), where S is the dataset and A is a particular attribute. It is the change in entropy after S has been split on A. Hence, information gain measures the reduction in entropy attributable to an independent variable. The equation for information gain is given by:

$$IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} H(S_v)$$

Here H denotes entropy and S_v is the subset of S for which attribute A takes the value v.
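As a worked illustration, information gain can be computed as the entropy of the whole set minus the size-weighted entropy of the subsets produced by splitting on an attribute. The Python sketch below takes that definition as its assumption. The `weather` records and the function names `entropy` and `information_gain` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target before the split minus the weighted entropy after it."""
    parent = entropy([r[target] for r in rows])
    subsets = defaultdict(list)
    for r in rows:
        subsets[r[attribute]].append(r[target])   # one subset per attribute value
    n = len(rows)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return parent - remainder

# Hypothetical toy data: "outlook" fully determines "play", "windy" tells us nothing.
weather = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

gain_outlook = information_gain(weather, "outlook", "play")
gain_windy = information_gain(weather, "windy", "play")
print(gain_outlook, gain_windy)
```

Ranking attributes by this value gives the order in which a tree would prefer to split on them. On this toy data `outlook` removes all the uncertainty while `windy` removes none.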

https://www.dataspoof.info/post/decision-tree-classification-in-r/
