SGN-2556 Pattern Recognition

Computer exercise 4: Tree-based classifiers
25.04.2006

It is natural and intuitive to classify a pattern through a sequence of questions, in which the next question asked depends on the answer to the current question. Such a sequence of questions is displayed in a directed decision tree, or simply a tree, where by convention the first or root node is displayed at the top, connected by successive (directional) links or branches to other nodes. These are similarly connected until we reach the terminal or leaf nodes, which have no further links (see Figure 1 below).

Figure 1: (a) Classification in a basic decision tree proceeds from top to bottom. (b) A two-dimensional, two-category example with decision regions marked R1 and R2; the decision boundaries are perpendicular to the feature axes.

Much of the work in designing trees focuses on deciding which property test or query should be performed at each node. For nonmetric data one might well consider logical combinations of properties, such as using "(size = medium) AND NOT (color = yellow)?" as a query. For numerical data there is a simple way to visualize the decision boundaries produced by decision trees. Suppose that the query at each node compares a single feature against a threshold, i.e. has the form "Is x_i > θ?". This leads to hyperplane decision boundaries that are perpendicular to the coordinate axes, and to decision regions of the form illustrated in Fig. 1(b).

The fundamental principle underlying tree creation is that of simplicity: we seek a property query T at each node N that makes the data reaching the immediate descendent nodes as "pure" as possible. The most popular measure is the entropy impurity

    i(N) = − Σ_j P(ω_j) log₂ P(ω_j),                                (1)

where P(ω_j) is the fraction of patterns at node N that are in category ω_j. By the properties of entropy, the impurity is 0 if all the patterns are of the same category and positive otherwise, with the greatest value occurring when the different classes are equally likely. An obvious heuristic is to choose the query that decreases the impurity as much as possible. The drop in impurity is defined by

    ∆i(N) = i(N) − P_L i(N_L) − (1 − P_L) i(N_R),                   (2)

where N_L and N_R are the left and right descendent nodes, i(N_L) and i(N_R) are their impurities, and P_L is the fraction of patterns at node N that will go to N_L when property query T is used. The best query is then the choice of T that maximizes ∆i(T).
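As a concrete illustration of Eqs. (1) and (2), the following is a minimal MATLAB sketch; the function and variable names (entropy_impurity, impurity_drop, targets, x, theta) are chosen here for illustration and are not part of the exercise files:

    % Entropy impurity, Eq. (1): targets is the vector of class labels of
    % the patterns that reach node N.
    function imp = entropy_impurity(targets)
        classes = unique(targets);
        imp = 0;
        for k = 1:numel(classes)
            p = sum(targets == classes(k)) / numel(targets);
            if p > 0
                imp = imp - p * log2(p);
            end
        end
    end

    % Drop in impurity, Eq. (2), for the query "is x > theta?", where x
    % holds one feature's values for the patterns at node N.
    function delta = impurity_drop(x, targets, theta)
        toRight = (x > theta);               % patterns sent to N_R
        PL = sum(~toRight) / numel(x);       % fraction sent to N_L
        delta = entropy_impurity(targets) ...
              - PL * entropy_impurity(targets(~toRight)) ...
              - (1 - PL) * entropy_impurity(targets(toRight));
    end

Growing a tree then amounts to repeatedly picking, at each node, the feature and threshold that maximize this drop in impurity.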


CART (Classification And Regression Trees) method

1) Download the file CARTdata.mat, which contains a vector of features (2 features, 16 samples taken from two classes) and a vector of targets. Visualize the data.
2) Train a binary CART tree using the entropy impurity. Use the CART.m function for that.
3) Using the test_CART.m function and the obtained decision tree, build a decision surface D over the range of the given data. Plot the decision boundary superimposed on the data points. (Hint: use the contour command to plot the contour of matrix D; set the number of contour lines to 1 to get a single decision boundary. See the first sketch after this list.)
4) Draw the decision tree for the given data as a block diagram (similarly to Fig. 1(a)) using the structured array tree returned by the CART.m function. Include the tree in your report. You can access a field (or a substructure) of a particular structure using the structure name followed by a dot and the field name (e.g. struct1.field1, struct1.substruct1.subsubstruct1).
5) In 2), the tree was grown fully (i.e. until no further splits were possible). This typically leads to overfitting. Simplify the decision tree obtained in 4) by pruning all neighbouring leaf nodes (those linked to a common antecedent node, one level above) whose elimination gives only a very small increase in impurity (for this particular example, take values less than 10⁻²).
6) Are there any other redundancies in the tree that might be simplified?
7) Repeat step 2) using the Gini and misclassification impurity measures (see the second sketch after this list). Compare the three measures in terms of the complexity of the decision tree (number of nodes) and in terms of the misclassification rate on the given dataset.
8) Consider the nonmetric data from the file letters.mat, sampled from three categories and consisting of five features and twenty patterns (see the table below). Train a tree for this data using the entropy impurity. Check its misclassification rate using the test_CART_letters.m function.
9) Train a tree only with the patterns belonging to the first and second classes. Simplify it and convert the information in your tree into a single logical expression that describes the first category. Repeat the same for the second category. (Hint: use the char command to convert integers into ASCII characters; see the short example after the data table.)
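A minimal sketch for step 3, assuming the data is loaded into a 2×16 matrix features with a label vector targets, and that test_CART(tree, x) returns the predicted class for a single feature vector x; these names and that interface are assumptions, so adapt them to the actual contents of CARTdata.mat and test_CART.m:

    % Evaluate the trained tree on a regular grid covering the data range
    % and draw the resulting decision boundary on top of the samples.
    [X1, X2] = meshgrid(linspace(min(features(1,:)), max(features(1,:)), 100), ...
                        linspace(min(features(2,:)), max(features(2,:)), 100));
    D = zeros(size(X1));
    for i = 1:numel(X1)
        D(i) = test_CART(tree, [X1(i); X2(i)]);   % assumed interface
    end
    figure; hold on;
    plot(features(1, targets == 1), features(2, targets == 1), 'ro');
    plot(features(1, targets == 2), features(2, targets == 2), 'b+');
    contour(X1, X2, D, 1, 'k');   % one contour line = the decision boundary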

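For step 7, the Gini and misclassification impurities can be written in the same style as entropy_impurity above (again only a sketch with names chosen here; the provided CART.m presumably selects the impurity measure through its own arguments):

    % Gini impurity: 1 − Σ_j P(ω_j)²
    function imp = gini_impurity(targets)
        classes = unique(targets);
        imp = 1;
        for k = 1:numel(classes)
            imp = imp - (sum(targets == classes(k)) / numel(targets))^2;
        end
    end

    % Misclassification impurity: 1 − max_j P(ω_j)
    function imp = misclassification_impurity(targets)
        classes = unique(targets);
        pmax = 0;
        for k = 1:numel(classes)
            pmax = max(pmax, sum(targets == classes(k)) / numel(targets));
        end
        imp = 1 - pmax;
    end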

The nonmetric data in letters.mat (steps 8 and 9):

Sample   Category   A−D   E−G   H−J   K−L   M−N
   1        ω1       A     E     H     K     M
   2        ω1       B     E     I     L     M
   3        ω1       A     G     I     L     N
   4        ω1       B     G     H     K     M
   5        ω1       A     G     I     L     M
   6        ω2       B     F     I     L     M
   7        ω2       B     F     J     L     N
   8        ω2       B     E     I     L     N
   9        ω2       C     G     J     K     N
  10        ω2       C     G     J     L     M
  11        ω2       D     G     J     K     M
  12        ω2       B     F     I     L     M
  13        ω3       D     E     H     K     N
  14        ω3       A     E     H     K     N
  15        ω3       D     E     H     L     N
  16        ω3       D     F     J     L     N
  17        ω3       A     F     H     K     N
  18        ω3       D     E     J     L     M
  19        ω3       C     F     J     L     M
  20        ω3       D     F     H     L     M
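Regarding the hint in step 9: char maps integer ASCII codes to characters, and double maps characters back to codes. This is convenient if the feature values in letters.mat are stored as integer codes; whether they are is an assumption here, so inspect the file first.

    % char turns ASCII codes into letters; double is the inverse mapping.
    char([65 66 67 68])    % gives 'ABCD'
    double('K')            % gives 75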
