Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Exercise 5.7 (Decision trees, regularization; Section 5.6) Pruning a decision tree: Let S be a labeled sample drawn iid from some distribution D over 0,1),

image text in transcribed

Exercise 5.7 (Decision trees, regularization; Section 5.6) Pruning a decision tree: Let S be a labeled sample drawn iid from some distribution D over 0,1), and suppose we have used S to create some decision tree T. However, the tree T is large, and we are concerned we might be overfitting. Give a polynomial-time algorithm for pruning T that finds the pruning h of T that optimizes the right-hand-side of Corollary 5.8, i.e., that for a given ? > 0 minimizes errs(h) +?l size(h) In(4) + In(2/6) 21S To discuss this, we need to define what we mean by a "pruning" of T and what we mean by the "size" of h. A pruning h of T is a tree in which some internal nodes of T have been turned into leaves, labeled "+" or "depending on whether the majority of eramples in S that reach that node are positive or negative. Let size(h) = L(h) log(n) where L(h) is the number of leaves in h Hint # 1: it is sufficient, for each integer L = 1,2, ,L(T), to find the pruning of T with L leaves of lowest empirical error on S, that is, hL = argminh:L(h)-Lerrs(h). Then you can just plug them all into the displayed formula above and pick the best one Hint #2: use dynamic programming Exercise 5.7 (Decision trees, regularization; Section 5.6) Pruning a decision tree: Let S be a labeled sample drawn iid from some distribution D over 0,1), and suppose we have used S to create some decision tree T. However, the tree T is large, and we are concerned we might be overfitting. Give a polynomial-time algorithm for pruning T that finds the pruning h of T that optimizes the right-hand-side of Corollary 5.8, i.e., that for a given ? > 0 minimizes errs(h) +?l size(h) In(4) + In(2/6) 21S To discuss this, we need to define what we mean by a "pruning" of T and what we mean by the "size" of h. A pruning h of T is a tree in which some internal nodes of T have been turned into leaves, labeled "+" or "depending on whether the majority of eramples in S that reach that node are positive or negative. Let size(h) = L(h) log(n) where L(h) is the number of leaves in h Hint # 1: it is sufficient, for each integer L = 1,2, ,L(T), to find the pruning of T with L leaves of lowest empirical error on S, that is, hL = argminh:L(h)-Lerrs(h). Then you can just plug them all into the displayed formula above and pick the best one Hint #2: use dynamic programming

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Big Data Systems A 360-degree Approach

Authors: Jawwad ShamsiMuhammad Khojaye

1st Edition

0429531575, 9780429531576

More Books

Students also viewed these Databases questions

Question

What are Measures in OLAP Cubes?

Answered: 1 week ago

Question

How do OLAP Databases provide for Drilling Down into data?

Answered: 1 week ago

Question

How are OLAP Cubes different from Production Relational Databases?

Answered: 1 week ago