Question

You are provided with a dataset containing some medical history information for 750 patients that might be at risk of cancer. Each patient in our dataset has been biopsied to obtain a direct ground truth label so we know each patient's actual cancer status (binary variable, 1 means has cancer, 0 means does not). We want to build classifiers to predict whether a patient likely has cancer from easier-to-get information, so we could avoid painful biopsies unless they are necessary.
It is known that older patients with a family history of cancer have a higher probability of harboring cancer. So we can use Age and Fam_history variables in the data as inputs to predict cancer status. A clinical chemist has recently discovered a real-valued biomarker (called Marker in the data file) that she believes can distinguish between patients with and without cancer. We wish to assess whether or not the new marker does indeed identify patients with and without cancer well.
You are tasked to build and assess the performance of the following classification models:
1- Decision Tree with maximum depth of 3
2- Decision Tree with maximum depth of 5
3- Naive Bayes
4- K-Nearest Neighbors with 5 neighbors
5- K-Nearest Neighbors with 10 neighbors
6- Support Vector Machines with polynomial kernel
7- Support Vector Machines with radial basis function kernel
You need to build the above models for each of the following two cases. For each case-model combination, report the model accuracy. Use a test size of 25% and set the random state for data splitting to 777. Report your answers in the table below.
1- Case 1: Predict Cancer status through Age and Family history only.
2- Case 2: Predict Cancer status through Age, Family history and the biomarker.
Model | Model Accuracy, Case 1: Age & Fam_history | Model Accuracy, Case 2: Age, Fam_history & Marker
1- Decision Tree with maximum depth of 3 | |
2- Decision Tree with maximum depth of 5 | |
3- Naive Bayes | |
4- K-Nearest Neighbors with 5 neighbors | |
5- K-Nearest Neighbors with 10 neighbors | |
6- Support Vector Machines with polynomial kernel | |
7- Support Vector Machines with radial basis function kernel | |
Based on your analysis above, do you think the variable Marker is important in predicting the cancer status? Why?
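
One way to fill in the table is sketched below with scikit-learn. This is a minimal sketch under stated assumptions, not the reference solution. The question gives the column names Age, Fam_history and Marker but not the name of the label column, so `Cancer` is a hypothetical name. "Naive Bayes" is taken to mean Gaussian Naive Bayes. All hyperparameters not specified in the question are left at library defaults, and the decision trees are given a fixed random state only for reproducibility. No feature scaling is applied because the question does not ask for it, although KNN and SVM are sensitive to feature scale.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Feature sets for the two cases; column names come from the question text.
CASES = {
    "Case 1: Age & Fam_history": ["Age", "Fam_history"],
    "Case 2: Age, Fam_history & Marker": ["Age", "Fam_history", "Marker"],
}


def make_models():
    """Return fresh, unfitted instances of the seven requested models."""
    return {
        # random_state on the trees is an assumption, added for repeatable results.
        "1- Decision Tree (max depth 3)": DecisionTreeClassifier(max_depth=3, random_state=777),
        "2- Decision Tree (max depth 5)": DecisionTreeClassifier(max_depth=5, random_state=777),
        # Assumption: the Gaussian variant of Naive Bayes.
        "3- Naive Bayes": GaussianNB(),
        "4- KNN (5 neighbors)": KNeighborsClassifier(n_neighbors=5),
        "5- KNN (10 neighbors)": KNeighborsClassifier(n_neighbors=10),
        "6- SVM (polynomial kernel)": SVC(kernel="poly"),
        "7- SVM (RBF kernel)": SVC(kernel="rbf"),
    }


def evaluate_cases(df, target="Cancer"):
    """Fit every model on every case and return test accuracies as a table.

    `target` is a hypothetical label column name (1 = cancer, 0 = no cancer).
    """
    results = {}
    for case_name, features in CASES.items():
        # 25% test split with the random state required by the question.
        X_train, X_test, y_train, y_test = train_test_split(
            df[features], df[target], test_size=0.25, random_state=777
        )
        for model_name, model in make_models().items():
            model.fit(X_train, y_train)
            accuracy = accuracy_score(y_test, model.predict(X_test))
            results.setdefault(model_name, {})[case_name] = accuracy
    # Rows are models, columns are the two cases.
    return pd.DataFrame.from_dict(results, orient="index")
```

With the real data loaded into a DataFrame, for example `df = pd.read_csv("patients.csv")` (the file name is hypothetical), `print(evaluate_cases(df).round(3))` prints one accuracy per model and case. Whether Marker matters can then be judged by comparing the two columns: a consistent increase in accuracy from Case 1 to Case 2 across models would suggest the biomarker carries predictive information beyond Age and Fam_history.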
