
Question


The data can be accessed at https://www3.nd.edu/~busiforc/problems/DataMining/Accidents.xls.


The file Accidents.xls contains information on 42,183 actual automobile accidents in 2001 in the US that involved one of three levels of injury: "no injury", "injury", or "fatality". For each accident, additional information is recorded, such as day of week, weather conditions, and road type. A firm might be interested in developing a system for quickly classifying the severity of an accident, based upon initial reports and associated data in the system (some of which rely on GPS-assisted reporting).

Our goal here is to predict whether a new accident just reported will involve an injury (MAX_SEV_IR = 1 or 2) or not (MAX_SEV_IR = 0). For this purpose, create a new variable called INJURY that takes the value 1, meaning "with injury", if MAX_SEV_IR = 1 or 2, and otherwise the value 0, meaning "no injury". Partition the data into training (60%) and validation (40%) sets.

a) Compute the accuracy rate of each class for the validation set based on the Naive Rule. You can present the accuracy rates using a matrix (called a confusion matrix) as shown in the example below:

                     Predicted
                  Class-1   Class-2
Actual  Class-1      10         3
        Class-2       2        12

Here 10 out of 13 data points in Class-1 are correctly predicted and 12 out of 14 data points in Class-2 are correctly predicted. The overall accuracy rate is 22/27 = 81%. The accuracy rate in Class-1 is 10/13 = 77%, and the accuracy rate in Class-2 is 12/14 = 86%.

b) Assume that no information or initial reports about the accident itself are available at the time of prediction (only location characteristics, weather conditions, road conditions, etc.). Which predictors can we include in the analysis? (Please read the Data_Codes sheet.)

c) Run a Naive Bayes classifier on the complete training set using the relevant predictors (continuing from part b), with INJURY as the response variable. Notice that all predictors are categorical. Show the classification matrix (confusion matrix) for the training and validation data.

d) Is there any percent improvement relative to the Naive Rule?

e) Run a Naive Bayes classifier using all predictors and INJURY as the response variable. Report your error rates again for both the training and validation sets using a confusion matrix.

f) Which analysis, the one in part b or the one in part e, would be appropriate if you were to apply the Naive Bayes model you created to future accidents? Please explain your reasoning.

g) Run a Naive Bayes classifier with the variables from part b and the response variable INJURY after partitioning the data into training (60%) and validation (40%) sets. Is there any effect of a different partitioning on the accuracy results? If you observe a change, please explain the possible reason.

Note: I have posted a guideline for the usage of Naive Bayes in XLMiner. It might be helpful while using XLMiner.
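The INJURY recoding, the naive rule, and the per-class accuracy calculation from part a) can be sketched in a few lines of Python. This is only an illustrative sketch, not the XLMiner workflow the question asks for: the helper names (`make_injury`, `naive_rule`, `confusion_matrix`, `accuracy_rates`) are hypothetical, and the small `train_sev` and `valid_sev` lists are made-up values, not rows from Accidents.xls.

```python
from collections import Counter


def make_injury(max_sev_ir):
    """Return 1 ("with injury") if MAX_SEV_IR is 1 or 2, otherwise 0 ("no injury")."""
    return 1 if max_sev_ir in (1, 2) else 0


def naive_rule(train_labels):
    """Naive rule: always predict the most frequent class in the training labels."""
    return Counter(train_labels).most_common(1)[0][0]


def confusion_matrix(actual, predicted, classes):
    """Build a matrix whose rows are actual classes and columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy_rates(matrix):
    """Return (overall accuracy, list of per-class accuracies) for a square matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    per_class = [row[i] / sum(row) if sum(row) else 0.0
                 for i, row in enumerate(matrix)]
    return correct / total, per_class


# Worked example from the question: rows = actual, columns = predicted.
example = [[10, 3], [2, 12]]
overall, per_class = accuracy_rates(example)
print(f"overall={overall:.1%}", [f"{r:.1%}" for r in per_class])

# Toy, made-up MAX_SEV_IR values (assumption: NOT taken from Accidents.xls).
train_sev = [0, 1, 2, 1, 0, 1]
valid_sev = [0, 1, 0, 2]
train_y = [make_injury(s) for s in train_sev]
valid_y = [make_injury(s) for s in valid_sev]

majority = naive_rule(train_y)       # majority class in the training set
preds = [majority] * len(valid_y)    # the naive rule ignores all predictors
cm = confusion_matrix(valid_y, preds, classes=[0, 1])
toy_overall, toy_per_class = accuracy_rates(cm)
```

Because the naive rule assigns every record to the majority class, one class always has 100% accuracy and the other 0%; the overall accuracy equals the majority class's share of the validation set. That overall figure is the baseline against which part d) compares the Naive Bayes results.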
