

Question


Posted on Sep 10, 2024


Automobile Accidents. The file Accidents.jmp contains information on 42,183 actual automobile accidents in 2001 in the United States that involved one of three levels of injury: NO INJURY, INJURY, or FATALITY. For each accident, additional information is recorded, such as day of week, weather conditions, and road type. A firm might be interested in developing a system for quickly classifying the severity of an accident based on initial reports and associated data in the system (some of which rely on GPS-assisted reporting).

Our goal here is to predict whether an accident just reported will involve an injury (MAX_SEV_IR = 1 or 2) or will not (MAX_SEV_IR = 0). For this purpose, use Recode to create a new variable called INJURY that takes the value yes if MAX_SEV_IR = 1 or 2, and no otherwise. (See Chapter 3 for information on using Recode.)
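
The exercise performs this recoding with JMP's Recode utility. Purely as an illustration of the mapping, here is a minimal Python sketch; the function name `recode_injury` and the sample severity codes are hypothetical and are not taken from Accidents.jmp.

```python
def recode_injury(max_sev_ir: int) -> str:
    """Map MAX_SEV_IR (0, 1, or 2) to the binary INJURY label."""
    if max_sev_ir not in (0, 1, 2):
        raise ValueError(f"unexpected MAX_SEV_IR value: {max_sev_ir!r}")
    # 1 (injury) and 2 (fatality) both count as an injury; 0 does not.
    return "yes" if max_sev_ir in (1, 2) else "no"


# Hypothetical severity codes for illustration only.
severities = [0, 1, 2, 0]
injury = [recode_injury(s) for s in severities]
print(injury)
```

Rejecting values outside 0, 1, and 2 is a design choice of this sketch: it surfaces miscoded rows instead of silently labeling them "no".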

Using the information in this dataset, if an accident has just been reported and no further information is available, what should the prediction be (INJURY = yes or no)? Why?
Create a subset of the first 12 records in the dataset and look only at the response (INJURY) and the two predictors WEATHER_R and TRAF_CON_R.

Create a tabular summary that examines INJURY as a function of the two predictors for these 12 records.
Compute the exact Bayes conditional probabilities of an injury (INJURY = yes) given the six possible combinations of the predictors. First, check to make sure that the variables are coded as Nominal, and that none of the variables has the Value Labels column property (remove this column property if needed).
Classify the 12 accidents using these probabilities and a cutoff of 0.5.
Compute manually the naive Bayes conditional probability of an injury given WEATHER_R = 1 and TRAF_CON_R = 1.
Run a naive Bayes classifier on the 12 records and two predictors using the JMP Naive Bayes add-in. Look at the data table for probabilities and classifications for all 12 records (these display at the far right on the data table). Compare this to the exact Bayes classification. Are the resulting classifications equivalent? Is the ranking (= ordering) of observations equivalent?
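
To make the difference between the two calculations concrete, the following Python sketch computes both on a small toy table. It is only an illustration under stated assumptions: the 12 records below are hypothetical and are NOT the first 12 rows of Accidents.jmp, and the helper names `exact_bayes` and `naive_bayes` are made up for this example. Exact Bayes conditions on the full predictor combination, while naive Bayes multiplies the class prior by the per-predictor conditional proportions and then normalizes across the two classes.

```python
# Hypothetical toy records (WEATHER_R, TRAF_CON_R, INJURY).
# These are NOT the first 12 rows of Accidents.jmp.
records = [
    (1, 0, "yes"), (2, 0, "no"), (2, 1, "no"), (1, 1, "no"),
    (1, 0, "no"), (2, 0, "yes"), (2, 0, "no"), (1, 0, "yes"),
    (2, 0, "no"), (2, 0, "no"), (2, 0, "no"), (2, 2, "no"),
]


def exact_bayes(records, weather, traf):
    """P(INJURY = yes | WEATHER_R, TRAF_CON_R) from records with that exact combination."""
    matches = [inj for w, t, inj in records if (w, t) == (weather, traf)]
    if not matches:
        return None  # combination never observed, so no estimate is available
    return matches.count("yes") / len(matches)


def naive_bayes(records, weather, traf):
    """P(INJURY = yes | WEATHER_R, TRAF_CON_R) assuming predictors are independent within each class."""
    scores = {}
    for label in ("yes", "no"):
        rows = [(w, t) for w, t, inj in records if inj == label]
        if not rows:
            scores[label] = 0.0
            continue
        prior = len(rows) / len(records)
        p_weather = sum(w == weather for w, _ in rows) / len(rows)
        p_traf = sum(t == traf for _, t in rows) / len(rows)
        scores[label] = prior * p_weather * p_traf
    total = scores["yes"] + scores["no"]
    return scores["yes"] / total if total > 0 else None


CUTOFF = 0.5
for w, t, actual in records:
    exact = exact_bayes(records, w, t)
    naive = naive_bayes(records, w, t)
    exact_class = "yes" if exact >= CUTOFF else "no"
    naive_class = "yes" if naive >= CUTOFF else "no"
    print(w, t, actual, round(exact, 3), exact_class, round(naive, 3), naive_class)
```

This sketch applies no smoothing, so a predictor value never seen with INJURY = yes drives the naive Bayes probability to zero. Whether the JMP add-in does the same is not stated in the exercise, so its output may differ from a hand calculation of this kind.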

Now return to the entire dataset.

What proportion of accidents led to injuries?
Assuming that no information or initial reports about the accident itself are available at the time of prediction (only location characteristics, weather conditions, etc.), which predictors can we include in the analysis? (Hold your mouse over the column names in the Columns panel in the data table for descriptions of the predictors.)
Prepare the data. First, check to make sure that the relevant variables are coded as Nominal, and that none of the variables has the Value Labels column property. For continuous predictors (e.g., SPD_LIM), consider using Recode to bin the data into buckets (then apply the Nominal modeling type).
Run a naive Bayes classifier on the dataset with the relevant predictors (and INJURY as the response). This may take a minute or two if you select several predictors.
Using the interactive dialog, explore how the probabilities change for different conditions. Which set of conditions results in the highest probability of an injury?
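
Two of the steps above, binning a continuous predictor and finding the overall injury proportion, can be sketched in Python as follows. This is a hedged illustration rather than the JMP workflow the exercise asks for: the cut points (30 and 50), the assumption that SPD_LIM is in miles per hour, the bucket names, and the helper names `bin_speed_limit` and `majority_class` are all assumptions of this sketch, and the sample values are hypothetical.

```python
from collections import Counter


def bin_speed_limit(spd_lim: int) -> str:
    """Bucket a speed limit into a nominal category (cut points are arbitrary assumptions)."""
    if spd_lim <= 30:
        return "low"
    if spd_lim <= 50:
        return "medium"
    return "high"


def majority_class(labels):
    """Return the most frequent label and its proportion (the no-predictor baseline)."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


# Hypothetical values for illustration only, not taken from Accidents.jmp.
speed_limits = [25, 30, 45, 65]
print([bin_speed_limit(s) for s in speed_limits])

injury_labels = ["yes", "no", "yes"]
print(majority_class(injury_labels))
```

With no predictors available, predicting the majority class is the natural baseline, which is why the overall proportion of injuries also answers the opening question about an accident reported with no further information.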