Consider the decision trees shown in Figures (a) and (b). For each approach described below, you...
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
Consider the decision trees shown in Figures (a) and (b). For each approach described below, you need to compute the generalization errors for both trees and decide which tree is better. The training and validation data sets are shown in Figures (c) and (d), respectively. + 0 B A A 0 C B 1 0 1 0 + + B 0 + (a) Decision Tree A (b) Decision Tree B #123 0 3 OOOD A 0 B00 C Class 0 0 + 11 0 0 + 12 0 1 0 - 13 4567 0 1 1 14 1 0 0 + 15 #12345 AO A 0 BO B 0 00 C Class 0 + 0 0 1 - 0 1 0 - 0 1 1 - 0 1 1 + 1 0 1 + 16 1 0 0 - 1 0 1 - 17 1 0 0 8 1 1 0 - 18 1 0 1 - 90 1 1 0 - 19 1 1 0 - 10 1 1 1 + 20 1 1 1 + (c) Training data (c) Validation data (a) Optimistic approach (assumes generalization error is given by the training error). (b) Reduced error pruning approach (generalization error is computed using the validation set shown in Figure (d)), which means generalization error = validation error. (c) Pessimistic error approach, where generalization error = training error + omega * complexity of the tree. Consider omega = 1.0 and complexity of the tree = number of leaves / total number of training points. (d) minimum description length (MDL) approach. The total description length of a tree is given by: Cost(tree, data)=Cost(tree) + Cost(data tree); Each internal node of the tree is encoded by the ID of the splitting attribute. If there are m attributes, the cost of encoding each attribute is log2m bits. Each leaf node is encoded using the ID of the class it is associated with. If there are k classes, the cost of encoding a class is log2 k bits. Cost(tree) is the cost of encoding all the nodes in the tree. To simplify the computation, you can assume that the total cost of the tree is obtained by adding up the costs of encoding each internal node and each leaf node. Cost(data tree) is encoded using the classification errors the tree commits on the training set. Each error is encoded by log2 n bits, where n is the total number of training examples. Consider the decision trees shown in Figures (a) and (b). For each approach described below, you need to compute the generalization errors for both trees and decide which tree is better. The training and validation data sets are shown in Figures (c) and (d), respectively. + 0 B A A 0 C B 1 0 1 0 + + B 0 + (a) Decision Tree A (b) Decision Tree B #123 0 3 OOOD A 0 B00 C Class 0 0 + 11 0 0 + 12 0 1 0 - 13 4567 0 1 1 14 1 0 0 + 15 #12345 AO A 0 BO B 0 00 C Class 0 + 0 0 1 - 0 1 0 - 0 1 1 - 0 1 1 + 1 0 1 + 16 1 0 0 - 1 0 1 - 17 1 0 0 8 1 1 0 - 18 1 0 1 - 90 1 1 0 - 19 1 1 0 - 10 1 1 1 + 20 1 1 1 + (c) Training data (c) Validation data (a) Optimistic approach (assumes generalization error is given by the training error). (b) Reduced error pruning approach (generalization error is computed using the validation set shown in Figure (d)), which means generalization error = validation error. (c) Pessimistic error approach, where generalization error = training error + omega * complexity of the tree. Consider omega = 1.0 and complexity of the tree = number of leaves / total number of training points. (d) minimum description length (MDL) approach. The total description length of a tree is given by: Cost(tree, data)=Cost(tree) + Cost(data tree); Each internal node of the tree is encoded by the ID of the splitting attribute. If there are m attributes, the cost of encoding each attribute is log2m bits. Each leaf node is encoded using the ID of the class it is associated with. If there are k classes, the cost of encoding a class is log2 k bits. Cost(tree) is the cost of encoding all the nodes in the tree. To simplify the computation, you can assume that the total cost of the tree is obtained by adding up the costs of encoding each internal node and each leaf node. Cost(data tree) is encoded using the classification errors the tree commits on the training set. Each error is encoded by log2 n bits, where n is the total number of training examples.
Expert Answer:
Answer rating: 100% (QA)
To compute the generalization errors for both Decision Tree A and Decision Tree B and decide which tree is better we will apply various approaches as described starting with the optimistic approach an... View the full answer
Related Book For
Introduction to Data Mining
ISBN: 978-0321321367
1st edition
Authors: Pang Ning Tan, Michael Steinbach, Vipin Kumar
Posted Date:
Students also viewed these algorithms questions
-
Ramon has the following state tax payments in 2023. How much can he deduct on his 2023 return? State gift taxes $1,200 State income taxes withheld from w-2 $2,000 Real estate taxes on primary...
-
Consider the decision trees shown in Figure 4.3. Assume they are generated from a data set that contains 16 binary attributes and 3 classes, C1, C2, and C3. Compute the total description length of...
-
Q1. You have identified a market opportunity for home media players that would cater for older members of the population. Many older people have difficulty in understanding the operating principles...
-
State whether or not each of the following events would result in a liability being recognised in the accounts at 30 June. 1. Taxes for the year ended 30 June, which are not payable until October. 2....
-
Bear Tracks, Inc., has current assets of $2,030, net fixed assets of $9,780, current liabilities of $1,640, and long-term debt of $4,490. What is the value of the shareholders' equity account for...
-
Give examples of two acidic oxides. Write equations illustrating the formation of each oxide from its component elements. Write another chemical equation that illustrates the acidic character of each...
-
Asda is a U.K. supermarket chain that is a subsidiary of Wal-Mart Inc., a U.S. company. Wal-Mart's fiscal year ends January 31. On Feb- ruary 1, 2015, Asda reports facilities with original cost of 30...
-
Aunt Mollys Old Fashioned Cookies bakes cookies for retail stores. The companys best-selling cookie is chocolate nut supreme, which is marketed as a gourmet cookie and regularly sells for $8.00 per...
-
The Tyler Oil Company??s capital structure is as follows: Debt35% Preferred stock30 Common equity35The aftertax cost of debt is 10 percent; the cost of preferredstock 2 answers
-
In this global age of information, suggest which threats are posed to the principles of confidentiality and privacy, related to offshore outsourcing of various information systems functions. Provide...
-
Compare the market success (or lack thereof) relative to its competitors. For example, Tesla, Lightyear, Sono Motors,Stellantis. This comparison is interesting at the overall level, as well as some...
-
Write a MATLAB script to visualize the trajectory of a projectile launched from a hill. Consider a projectile launched with an initial velocity V _ 0 ? at an angle \ theta to the horizontal from a...
-
Answer the following Questions : 1) 2) 3) 4) Find the pressure at 40,000 feet altitude in standard atmosphere. Given: Required: h = 40,000 ft. Pressure (P) @ STD Atmosphere
-
Grand Clothing is a manufacturer of designer suits. The cost of each suit is the sum of three variable costs (direct materials costs, direct manufacturing labour costs, and manufacturing overhead...
-
The Reynolds number, pVD/ is a very important parameter in fluid mechanics. Determine its value for ethyl alcohol flowing at a velocity of 3 m/s through a 2-in.-diameter pipe. Use = 1.19 10-3 PVD/ =...
-
Data since 1977 of the monthly return to investors that have invested their money in fund X, labeled variable RX. Size of the fund, that is the amount of capital that investors have jointly put in...
-
What is the ideal number of children to have? This question was asked on the Sullivan Statistics Survey I. Draw a dot plot of the variable Children from theSullivanStatsSurveyI data set at...
-
Distinguish between noise and outliers. Be sure to consider the following questions. (a) Is noise ever interesting or desirable? Outliers? (b) Can noise objects be outliers? (c) Are noise objects...
-
Traditional agglomerative hierarchical clustering routines merge two clusters at each step. Does it seem likely that such an approach accurately captures the (nested) cluster structure of a set of...
-
Consider the one-dimensional data set shown in Table 5.4. (a) Classify the data point x = 5.0 according to its 1-, 3-, 5-, and 9-nearest neighbors (using majority vote). (b) Repeat the previous...
-
Show that the decomposition (10.37) of the nonlinear term is correct. Use direct substitution of (10.36) into the expression for one component of vector \(N\).
-
For the flow in Problem 7, write the boundary conditions for pressure when the flow is incompressible and inviscid and there is no body force.
-
If your course involves exercises with a CFD code, study the manual to determine which of the projection schemes discussed in Section 10.4 (SIMPLE, SIMPLEC, SIMPLER, PISO) are implemented. Are there...
Study smarter with the SolutionInn App