Question

1. An excerpt of the insurance purchase dataset is given (see last page and the accompanying spreadsheet). A few values in BANK_FUND were modified to facilitate calculation in the assignment. There are 40 records. All calculations should show the details.

Recall that the gain of a split is defined as follows:

$$\text{Gain} = \text{Impurity}(\text{parent}) - \sum_{i=1}^{k} \frac{n_i}{n}\,\text{Impurity}(i)$$

where k is the number of child nodes, n_i is the number of records in child node i, and n is the number of records in the parent node. The impurity measure can be the Gini index, entropy, or classification error rate. In splitting, maximizing the gain is equivalent to minimizing the weighted average of the impurity measure of the child nodes.

A. Gini index
i. Calculate the Gini index for the dataset excerpt without any partitioning.
ii. Calculate the gain in Gini index for CUSTOMER_ID using multi-way split.
iii. Calculate the gain in Gini index for STATE using multi-way split.
iv. Calculate the gain in Gini index for REGION using multi-way split.
v. Calculate the gain in Gini index for MARITAL_STATUS using multi-way split.
vi. Calculate the gain in Gini index for LTV_BIN using multi-way split.
vii. Calculate the gain in Gini index for BANK_FUND for every possible split. Which split is best?
viii. Which attribute is the best for splitting based on the gain in Gini index?
ix. Compute a 2-level decision tree using the gain in Gini index.

B. Entropy
i. Calculate the entropy for the dataset excerpt without any partitioning.
ii. Calculate the information gain for CUSTOMER_ID using multi-way split.
iii. Calculate the information gain for STATE using multi-way split.
iv. Calculate the information gain for REGION using multi-way split.
v. Calculate the information gain for MARITAL_STATUS using multi-way split.
vi. Calculate the information gain for LTV_BIN using multi-way split.
vii. Calculate the information gain for BANK_FUND for every possible split. Which split is best?
viii. Which attribute is the best for splitting based on information gain?
ix. Compute a 2-level decision tree using information gain.

C. Classification error rate
i. Calculate the error rate without any partitioning.
ii. Calculate the gain in error rate for CUSTOMER_ID using multi-way split.
iii. Calculate the gain in error rate for STATE using multi-way split.
iv. Calculate the gain in error rate for REGION using multi-way split.
v. Calculate the gain in error rate for MARITAL_STATUS using multi-way split.
vi. Calculate the gain in error rate for LTV_BIN using multi-way split.
vii. Calculate the gain in error rate for BANK_FUND for every possible split. Which split is best?
viii. Which attribute is the best for splitting based on the gain in error rate?
ix. Compute a 2-level decision tree using the gain in error rate.

CUSTOMER_ID  STATE  REGION     MARITAL_STATUS  LTV_BIN    BANK_FUNDS  BUY_INSURANCE
CU9823       CA     West       SINGLE          MEDIUM     0           No
CU14284      NY     NorthEast  SINGLE          HIGH       0           No
CU5938       CA     West       SINGLE          MEDIUM     0           No
CU1069       MN     West       SINGLE          MEDIUM     0           No
CU11717      NY     NorthEast  DIVORCED        HIGH       0           No
CU5928       NY     NorthEast  DIVORCED        HIGH       0           No
CU10012      NM     Southwest  MARRIED         HIGH       0           No
CU197        MI     Midwest    SINGLE          HIGH       0           No
CU476        CA     West       MARRIED         HIGH       0           No
CU9110       DC     NorthEast  DIVORCED        HIGH       0           No
CU14921      NY     NorthEast  SINGLE          HIGH       340         Yes
CU12175      MI     Midwest    MARRIED         MEDIUM     500         No
CU12658      CA     West       SINGLE          MEDIUM     500         Yes
CU14620      UT     Southwest  MARRIED         HIGH       500         No
CU15186      NY     NorthEast  MARRIED         HIGH       600         No
CU7924       NY     NorthEast  SINGLE          MEDIUM     650         No
CU7148       OK     Midwest    MARRIED         HIGH       750         No
CU14052      CA     West       MARRIED         HIGH       750         Yes
CU7502       MI     Midwest    SINGLE          LOW        750         Yes
CU14911      CA     West       MARRIED         HIGH       750         Yes
CU15786      WI     Midwest    SINGLE          HIGH       750         Yes
CU8318       NY     NorthEast  DIVORCED        HIGH       1500        Yes
CU12738      NY     NorthEast  DIVORCED        HIGH       1500        Yes
CU13543      WA     West       DIVORCED        VERY HIGH  1500        No
CU5165       MI     Midwest    MARRIED         MEDIUM     2400        No
CU5082       CA     West       MARRIED         HIGH       2400        Yes
CU4679       CA     West       DIVORCED        HIGH       3000        Yes
CU13803      NY     NorthEast  MARRIED         VERY HIGH  3000        Yes
CU8340       NY     NorthEast  DIVORCED        HIGH       4000        Yes
CU3214       NY     NorthEast  DIVORCED        HIGH       4500        Yes
CU2394       NY     NorthEast  DIVORCED        HIGH       4500        Yes
CU1691       NY     NorthEast  DIVORCED        MEDIUM     4500        Yes
CU7291       MI     Midwest    SINGLE          MEDIUM     5000        No
CU3296       CA     West       WIDOWED         MEDIUM     10000       No
CU3654       NY     NorthEast  WIDOWED         HIGH       10000       Yes
CU5675       NY     NorthEast  SINGLE          LOW        10000       Yes
CU4285       NY     NorthEast  MARRIED         MEDIUM     10000       Yes
CU8589       WI     Midwest    WIDOWED         HIGH       16000       No
CU9004       WI     Midwest    MARRIED         MEDIUM     16000       Yes
CU2399       MI     Midwest    MARRIED         HIGH       20000       Yes
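The three impurity measures and the gain definition in the question can be sketched in a few lines of Python. This is a minimal illustration, not the assignment's expected solution: the function names (`gini`, `entropy`, `error_rate`, `gain`, `multiway_split`, `binary_splits`) are hypothetical, the toy data at the bottom is made up, and the choice of midpoints between consecutive distinct values as candidate thresholds for a numeric attribute such as BANK_FUND is an assumption about what "every possible split" means.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: minus the sum of p * log2(p) over observed classes."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def error_rate(labels):
    """Classification error rate: 1 minus the majority-class proportion."""
    n = len(labels)
    return 1.0 - max(Counter(labels).values()) / n


def gain(parent, children, impurity):
    """Impurity(parent) minus the size-weighted average impurity of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


def multiway_split(values, labels):
    """Group the labels by attribute value: one child node per distinct value."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())


def binary_splits(values, labels):
    """Yield (threshold, [left, right]) for each midpoint between distinct values.

    Assumption: left holds records with value <= threshold, right the rest.
    """
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        yield threshold, [left, right]


# Toy data for illustration only (not taken from the assignment's table).
toy_category = ["a", "a", "b", "b"]
toy_amount = [0, 500, 1000, 1000]
toy_labels = ["No", "No", "Yes", "Yes"]

category_gain = gain(toy_labels, multiway_split(toy_category, toy_labels), gini)
best_threshold, best_children = max(
    binary_splits(toy_amount, toy_labels),
    key=lambda split: gain(toy_labels, split[1], gini),
)
print(category_gain, best_threshold)
```

Passing `entropy` or `error_rate` instead of `gini` to `gain` covers parts B and C with the same code. Note that an attribute with a unique value per record, such as CUSTOMER_ID, produces single-record children that are all pure, so its multi-way split always reaches the largest gain available; this is why identifier columns are usually not treated as meaningful split candidates.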

Step by Step Solution

Step: 1

Gini index. Let's proceed with calculating the Gini index and the gain in Gini index for each of the specified attributes. We'll start with part i. i. Calculate the Gini index for the dataset excerpt without an...
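For part A.i, the Gini index of the unpartitioned data follows directly from the class counts. Counting the BUY_INSURANCE values of the 40 records listed in the question gives 20 Yes and 20 No, so a worked sketch of the calculation (not necessarily the wording of the original solution) is:

```latex
\text{Gini}(\text{parent})
  = 1 - \left(\tfrac{20}{40}\right)^{2} - \left(\tfrac{20}{40}\right)^{2}
  = 1 - 0.25 - 0.25
  = 0.5
```

With two equally frequent classes this is the largest value the Gini index can take for a binary target, so any split that separates the classes at all yields a positive gain.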
