1. An excerpt of the insurance purchase dataset is given (see last page and the accompanying...
Question:
1. An excerpt of the insurance purchase dataset is given (see last page and the accompanying spreadsheet). A few values in BANK_FUND were modified to facilitate calculation in the assignment. There are 40 records. All calculations should show the details. Recall that the gain of a split is defined as follows:

\(\text{Gain} = \text{Impurity}(\text{parent}) - \sum_{i=1}^{n} \frac{N_i}{N}\,\text{Impurity}(i)\)

where child node \(i\) of the \(n\) child nodes holds \(N_i\) of the parent's \(N\) records. The impurity measure can be the Gini index, entropy, or classification error rate. In splitting, maximizing the gain is equivalent to minimizing the weighted average of the impurity measure of the child nodes.

A. Gini index
   i. Calculate the Gini index for the dataset excerpt without any partitioning.
   ii. Calculate the gain in Gini index for CUSTOMER_ID using multi-way split.
   iii. Calculate the gain in Gini index for STATE using multi-way split.
   iv. Calculate the gain in Gini index for REGION using multi-way split.
   v. Calculate the gain in Gini index for MARITAL_STATUS using multi-way split.
   vi. Calculate the gain in Gini index for LTV_BIN using multi-way split.
   vii. Calculate the gain in Gini index for BANK_FUND for every possible split. Which split is best?
   viii. Which attribute is the best for splitting based on the gain in Gini index?
   ix. Compute a 2-level decision tree using the gain in Gini index.

B. Entropy
   i. Calculate the entropy for the dataset excerpt without any partitioning.
   ii. Calculate the information gain for CUSTOMER_ID using multi-way split.
   iii. Calculate the information gain for STATE using multi-way split.
   iv. Calculate the information gain for REGION using multi-way split.
   v. Calculate the information gain for MARITAL_STATUS using multi-way split.
   vi. Calculate the information gain for LTV_BIN using multi-way split.
   vii. Calculate the information gain for BANK_FUND for every possible split. Which split is best?
   viii. Which attribute is the best for splitting based on information gain?
   ix. Compute a 2-level decision tree using information gain.

C. Classification error rate
   i. Calculate the error rate without any partitioning.
   ii. Calculate the gain in error rate for CUSTOMER_ID using multi-way split.
   iii. Calculate the gain in error rate for STATE using multi-way split.
   iv. Calculate the gain in error rate for REGION using multi-way split.
   v. Calculate the gain in error rate for MARITAL_STATUS using multi-way split.
   vi. Calculate the gain in error rate for LTV_BIN using multi-way split.
   vii. Calculate the gain in error rate for BANK_FUND for every possible split. Which split is best?
   viii. Which attribute is the best for splitting based on the gain in error rate?
   ix. Compute a 2-level decision tree using the gain in error rate.

CUSTOMER_ID  STATE  REGION     MARITAL_STATUS  LTV_BIN    BANK_FUNDS  BUY_INSURANCE
CU9823       CA     West       SINGLE          MEDIUM     0           No
CU14284      NY     NorthEast  SINGLE          HIGH       0           No
CU5938       CA     West       SINGLE          MEDIUM     0           No
CU1069       MN     West       SINGLE          MEDIUM     0           No
CU11717      NY     NorthEast  DIVORCED        HIGH       0           No
CU5928       NY     NorthEast  DIVORCED        HIGH       0           No
CU10012      NM     Southwest  MARRIED         HIGH       0           No
CU197        MI     Midwest    SINGLE          HIGH       0           No
CU476        CA     West       MARRIED         HIGH       0           No
CU9110       DC     NorthEast  DIVORCED        HIGH       0           No
CU14921      NY     NorthEast  SINGLE          HIGH       340         Yes
CU12175      MI     Midwest    MARRIED         MEDIUM     500         No
CU12658      CA     West       SINGLE          MEDIUM     500         Yes
CU14620      UT     Southwest  MARRIED         HIGH       500         No
CU15186      NY     NorthEast  MARRIED         HIGH       600         No
CU7924       NY     NorthEast  SINGLE          MEDIUM     650         No
CU7148       OK     Midwest    MARRIED         HIGH       750         No
CU14052      CA     West       MARRIED         HIGH       750         Yes
CU7502       MI     Midwest    SINGLE          LOW        750         Yes
CU14911      CA     West       MARRIED         HIGH       750         Yes
CU15786      WI     Midwest    SINGLE          HIGH       750         Yes
CU8318       NY     NorthEast  DIVORCED        HIGH       1500        Yes
CU12738      NY     NorthEast  DIVORCED        HIGH       1500        Yes
CU13543      WA     West       DIVORCED        VERY HIGH  1500        No
CU5165       MI     Midwest    MARRIED         MEDIUM     2400        No
CU5082       CA     West       MARRIED         HIGH       2400        Yes
CU4679       CA     West       DIVORCED        HIGH       3000        Yes
CU13803      NY     NorthEast  MARRIED         VERY HIGH  3000        Yes
CU8340       NY     NorthEast  DIVORCED        HIGH       4000        Yes
CU3214       NY     NorthEast  DIVORCED        HIGH       4500        Yes
CU2394       NY     NorthEast  DIVORCED        HIGH       4500        Yes
CU1691       NY     NorthEast  DIVORCED        MEDIUM     4500        Yes
CU7291       MI     Midwest    SINGLE          MEDIUM     5000        No
CU3296       CA     West       WIDOWED         MEDIUM     10000       No
CU3654       NY     NorthEast  WIDOWED         HIGH       10000       Yes
CU5675       NY     NorthEast  SINGLE          LOW        10000       Yes
CU4285       NY     NorthEast  MARRIED         MEDIUM     10000       Yes
CU8589       WI     Midwest    WIDOWED         HIGH       16000       No
CU9004       WI     Midwest    MARRIED         MEDIUM     16000       Yes
CU2399       MI     Midwest    MARRIED         HIGH       20000       Yes
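The gain definition in the question can be sketched in Python as follows. This is a minimal illustration, not the assignment's official solution: the function names (`gini`, `entropy`, `error_rate`, `gain`, `best_threshold`) are hypothetical, the size-weighted form of the gain is assumed from the "weighted average" wording, candidate thresholds for BANK_FUND are assumed to be midpoints between adjacent distinct values, and the per-value Yes/No counts were transcribed by hand from the 40-record table.

```python
from math import log2


def gini(counts):
    """Gini index of a node given its class counts, e.g. [yes, no]."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def entropy(counts):
    """Entropy in bits; classes with zero records contribute nothing."""
    n = sum(counts)
    return sum(-(c / n) * log2(c / n) for c in counts if c > 0)


def error_rate(counts):
    """Classification error rate: share of records outside the majority class."""
    return 1.0 - max(counts) / sum(counts)


def gain(parent, children, impurity):
    """Impurity(parent) minus the size-weighted average impurity of the children."""
    n = sum(parent)
    weighted = sum(sum(ch) / n * impurity(ch) for ch in children if sum(ch) > 0)
    return impurity(parent) - weighted


def best_threshold(values, labels, impurity):
    """Scan binary splits 'value <= t' (assumed midpoints) and return (t, gain)."""
    def counts(subset):
        yes = sum(1 for lab in subset if lab == "Yes")
        return [yes, len(subset) - yes]

    parent = counts(labels)
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        g = gain(parent, [counts(left), counts(right)], impurity)
        if best is None or g > best[1]:
            best = (t, g)
    return best


# Whole excerpt: 20 Yes and 20 No (counted from the BUY_INSURANCE column).
parent = [20, 20]

# CUSTOMER_ID multi-way split: every record ends up alone in its own child.
customer_id_children = [[1, 0]] * 20 + [[0, 1]] * 20

# BANK_FUNDS column as (value, number of Yes, number of No), hand-transcribed.
bank_fund_groups = [
    (0, 0, 10), (340, 1, 0), (500, 1, 2), (600, 0, 1), (650, 0, 1),
    (750, 4, 1), (1500, 2, 1), (2400, 1, 1), (3000, 2, 0), (4000, 1, 0),
    (4500, 3, 0), (5000, 0, 1), (10000, 3, 1), (16000, 1, 1), (20000, 1, 0),
]
values, labels = [], []
for value, n_yes, n_no in bank_fund_groups:
    values += [value] * (n_yes + n_no)
    labels += ["Yes"] * n_yes + ["No"] * n_no

for name, measure in [("gini", gini), ("entropy", entropy), ("error", error_rate)]:
    print(name, "parent:", measure(parent))
    print(name, "gain for CUSTOMER_ID:", gain(parent, customer_id_children, measure))
    print(name, "best BANK_FUND split (threshold, gain):",
          best_threshold(values, labels, measure))
```

Under these assumptions the CUSTOMER_ID split yields pure single-record children, so its gain equals the parent impurity, which is why an identifier looks deceptively attractive as a split attribute. For the Gini index, the scan picks the boundary between 650 and 750, with a gain of 0.1875.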
Expert Answer:
Gini Index. Let's proceed with calculating the Gini index and the gain in Gini index for each of the specified attributes. We'll start with part i. i. Calculate the Gini index for the dataset excerpt without an...
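As a hedged illustration of part i (not the expert's own working, which is truncated above): counting the BUY_INSURANCE column of the 40-record table gives 20 Yes and 20 No. The symbols \(p_{\text{Yes}}\) and \(p_{\text{No}}\) are notation introduced here for the class proportions.

```latex
\[
p_{\text{Yes}} = \frac{20}{40} = 0.5, \qquad p_{\text{No}} = \frac{20}{40} = 0.5
\]
\[
\text{Gini}(\text{parent}) = 1 - p_{\text{Yes}}^2 - p_{\text{No}}^2
  = 1 - 0.25 - 0.25 = 0.5
\]
```

A value of 0.5 is the maximum the Gini index can take with two classes, so the unpartitioned excerpt is as impure as a two-class node can be.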