1. An excerpt of the insurance purchase dataset is given (see last page and the accompanying...
Question:
1. An excerpt of the insurance purchase dataset is given (see last page and the accompanying spreadsheet). A few values in BANK_FUND were modified to facilitate calculation in the assignment. There are 40 records. All calculations should show the details. The gain of a split is defined as follows:

\[
\mathrm{Gain} = \mathrm{Impurity}(\mathrm{parent}) - \sum_{i=1}^{n} \frac{N_i}{N}\,\mathrm{Impurity}(i)
\]

Here n is the number of child nodes, and N_i / N is the fraction of the parent's records that fall in child i. The impurity measure can be the Gini index, entropy, or classification error rate. In splitting, maximizing the gain is equivalent to minimizing the weighted average of the impurity measure of the child nodes.

A. Gini index
i. Calculate the Gini index for the dataset excerpt without any partitioning.
ii. Calculate the gain in Gini index for CUSTOMER_ID using multi-way split.
iii. Calculate the gain in Gini index for STATE using multi-way split.
iv. Calculate the gain in Gini index for REGION using multi-way split.
v. Calculate the gain in Gini index for MARITAL_STATUS using multi-way split.
vi. Calculate the gain in Gini index for LTV_BIN using multi-way split.
vii. Calculate the gain in Gini index for BANK_FUND for every possible split. Which split is best?
viii. Which attribute is the best for splitting based on the gain in Gini index?
ix. Compute a 2-level decision tree using the gain in Gini index.

B. Entropy
i. Calculate the entropy for the dataset excerpt without any partitioning.
ii. Calculate the information gain for CUSTOMER_ID using multi-way split.
iii. Calculate the information gain for STATE using multi-way split.
iv. Calculate the information gain for REGION using multi-way split.
v. Calculate the information gain for MARITAL_STATUS using multi-way split.
vi. Calculate the information gain for LTV_BIN using multi-way split.
vii. Calculate the information gain for BANK_FUND for every possible split. Which split is best?
viii. Which attribute is the best for splitting based on information gain?
ix. Compute a 2-level decision tree using information gain.

C. Classification error rate
i. Calculate the error rate without any partitioning.
ii. Calculate the gain in error rate for CUSTOMER_ID using multi-way split.
iii. Calculate the gain in error rate for STATE using multi-way split.
iv. Calculate the gain in error rate for REGION using multi-way split.
v. Calculate the gain in error rate for MARITAL_STATUS using multi-way split.
vi. Calculate the gain in error rate for LTV_BIN using multi-way split.
vii. Calculate the gain in error rate for BANK_FUND for every possible split. Which split is best?
viii. Which attribute is the best for splitting based on the gain in error rate?
ix. Compute a 2-level decision tree using the gain in error rate.

CUSTOMER_ID STATE REGION    MARITAL_STATUS LTV_BIN   BANK_FUNDS BUY_INSURANCE
CU9823      CA    West      SINGLE         MEDIUM    0          No
CU14284     NY    NorthEast SINGLE         HIGH      0          No
CU5938      CA    West      SINGLE         MEDIUM    0          No
CU1069      MN    West      SINGLE         MEDIUM    0          No
CU11717     NY    NorthEast DIVORCED       HIGH      0          No
CU5928      NY    NorthEast DIVORCED       HIGH      0          No
CU10012     NM    Southwest MARRIED        HIGH      0          No
CU197       MI    Midwest   SINGLE         HIGH      0          No
CU476       CA    West      MARRIED        HIGH      0          No
CU9110      DC    NorthEast DIVORCED       HIGH      0          No
CU14921     NY    NorthEast SINGLE         HIGH      340        Yes
CU12175     MI    Midwest   MARRIED        MEDIUM    500        No
CU12658     CA    West      SINGLE         MEDIUM    500        Yes
CU14620     UT    Southwest MARRIED        HIGH      500        No
CU15186     NY    NorthEast MARRIED        HIGH      600        No
CU7924      NY    NorthEast SINGLE         MEDIUM    650        No
CU7148      OK    Midwest   MARRIED        HIGH      750        No
CU14052     CA    West      MARRIED        HIGH      750        Yes
CU7502      MI    Midwest   SINGLE         LOW       750        Yes
CU14911     CA    West      MARRIED        HIGH      750        Yes
CU15786     WI    Midwest   SINGLE         HIGH      750        Yes
CU8318      NY    NorthEast DIVORCED       HIGH      1500       Yes
CU12738     NY    NorthEast DIVORCED       HIGH      1500       Yes
CU13543     WA    West      DIVORCED       VERY HIGH 1500       No
CU5165      MI    Midwest   MARRIED        MEDIUM    2400       No
CU5082      CA    West      MARRIED        HIGH      2400       Yes
CU4679      CA    West      DIVORCED       HIGH      3000       Yes
CU13803     NY    NorthEast MARRIED        VERY HIGH 3000       Yes
CU8340      NY    NorthEast DIVORCED       HIGH      4000       Yes
CU3214      NY    NorthEast DIVORCED       HIGH      4500       Yes
CU2394      NY    NorthEast DIVORCED       HIGH      4500       Yes
CU1691      NY    NorthEast DIVORCED       MEDIUM    4500       Yes
CU7291      MI    Midwest   SINGLE         MEDIUM    5000       No
CU3296      CA    West      WIDOWED        MEDIUM    10000      No
CU3654      NY    NorthEast WIDOWED        HIGH      10000      Yes
CU5675      NY    NorthEast SINGLE         LOW       10000      Yes
CU4285      NY    NorthEast MARRIED        MEDIUM    10000      Yes
CU8589      WI    Midwest   WIDOWED        HIGH      16000      No
CU9004      WI    Midwest   MARRIED        MEDIUM    16000      Yes
CU2399      MI    Midwest   MARRIED        HIGH      20000      Yes
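For reference, the three impurity measures named in the question are commonly defined as follows for a node t, where p(c | t) is the fraction of records at t that belong to class c. The notation is an assumption here, since the page itself does not spell the definitions out.

\[
\mathrm{Gini}(t) = 1 - \sum_{c} p(c \mid t)^{2}, \qquad
\mathrm{Entropy}(t) = -\sum_{c} p(c \mid t)\,\log_{2} p(c \mid t), \qquad
\mathrm{Error}(t) = 1 - \max_{c}\, p(c \mid t)
\]

With two classes, all three are zero for a pure node and largest when the classes are evenly split.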
Expert Answer:
Gini index. Let's proceed with calculating the Gini index and the gain in Gini index for each of the specified attributes. We'll start with part i. i. Calculate the Gini index for the dataset excerpt without an...
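The visible part of the answer stops before any numbers appear, so here is a minimal Python sketch of how such calculations might be set up. It is not the expert's method. All function and variable names are hypothetical, the class counts and BANK_FUNDS values were tallied by hand from the table in the question, and using midpoints between adjacent distinct values as candidate thresholds is an assumed convention for "every possible split".

```python
from math import log2

# Impurity measures take a tuple of class counts, here (yes, no).
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    n = sum(counts)
    # Empty classes contribute nothing (0 * log 0 is treated as 0).
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

def error_rate(counts):
    return 1.0 - max(counts) / sum(counts)

def gain(parent, children, impurity):
    """Parent impurity minus the size-weighted average impurity of the children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted

def best_binary_split(values, labels, impurity):
    """Try a threshold midway between each pair of adjacent distinct values."""
    def counts(rows):
        yes = sum(1 for _, label in rows if label == "Y")
        return (yes, len(rows) - yes)

    rows = list(zip(values, labels))
    parent = counts(rows)
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [r for r in rows if r[0] <= threshold]
        right = [r for r in rows if r[0] > threshold]
        g = gain(parent, [counts(left), counts(right)], impurity)
        if best is None or g > best[1]:
            best = (threshold, g)
    return best

# Counts tallied from the 40-row table (assumed transcribed correctly).
ROOT = (20, 20)  # 20 Yes, 20 No
CUSTOMER_ID_CHILDREN = [(1, 0)] * 20 + [(0, 1)] * 20  # one record per child
MARITAL_CHILDREN = {
    "SINGLE": (5, 7),
    "DIVORCED": (7, 4),
    "MARRIED": (7, 7),
    "WIDOWED": (1, 2),
}
# BANK_FUNDS and BUY_INSURANCE in table order (Y = Yes, N = No).
FUNDS = [0] * 10 + [340, 500, 500, 500, 600, 650, 750, 750, 750, 750, 750,
                    1500, 1500, 1500, 2400, 2400, 3000, 3000, 4000,
                    4500, 4500, 4500, 5000, 10000, 10000, 10000, 10000,
                    16000, 16000, 20000]
LABELS = "NNNNNNNNNN" + "YNYNNNNYYYY" + "YYN" + "NY" + "YY" + "Y" + "YYY" + "N" + "NYYY" + "NY" + "Y"

for name, measure in [("gini", gini), ("entropy", entropy), ("error", error_rate)]:
    print(name,
          "root:", round(measure(ROOT), 4),
          "CUSTOMER_ID gain:", round(gain(ROOT, CUSTOMER_ID_CHILDREN, measure), 4),
          "MARITAL_STATUS gain:", round(gain(ROOT, list(MARITAL_CHILDREN.values()), measure), 4),
          "best BANK_FUNDS split:", best_binary_split(FUNDS, LABELS, measure))
```

Note that CUSTOMER_ID reaches the largest possible gain under every measure only because each child holds a single record, which is why an identifier column is usually excluded when choosing a split attribute.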