Question

Posted on Sep 09, 2024

You are provided with a dataset containing some medical history information for 750 patients that might be at risk of cancer. Each patient in our dataset has been biopsied to obtain a direct ground truth label so we know each patient's actual cancer status (binary variable, 1 means has cancer, 0 means does not).

We want to build classifiers to predict whether a patient likely has cancer from easier-to-get information, so we could avoid painful biopsies unless they are necessary.

It is known that older patients with a family history of cancer have a higher probability of harboring cancer. So we can use the Age and Fam_history variables in the data as inputs to predict cancer status. A clinical chemist has recently discovered a real-valued biomarker (called Marker in the data file) that she believes can distinguish between patients with and without cancer. We wish to assess whether or not the new marker does indeed identify patients with and without cancer well.

You are tasked to build and assess the performance of the following classification models:

1 - Decision Tree with maximum depth of 3
2 - Decision Tree with maximum depth of 5
3 - Naive Bayes
4 - k-Nearest Neighbors with 5 neighbors
5 - k-Nearest Neighbors with 10 neighbors
6 - Support Vector Machines with polynomial kernel
7 - Support Vector Machines with radial basis function kernel
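One way to set up the seven models is sketched below with scikit-learn. This is a minimal sketch under stated assumptions, not the expected solution: the question does not say which Naive Bayes variant to use (Gaussian is assumed here because Age and Marker are numeric), it does not give SVM hyperparameters (library defaults are kept), and fixing `random_state=777` on the trees is an extra assumption made only for reproducibility. The helper name `make_models` is hypothetical.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_models():
    """Return fresh, unfitted estimators for the seven models in the list."""
    return {
        # random_state on the trees is an assumption (tie-breaking reproducibility).
        "1 - Decision Tree, max depth 3": DecisionTreeClassifier(max_depth=3, random_state=777),
        "2 - Decision Tree, max depth 5": DecisionTreeClassifier(max_depth=5, random_state=777),
        # Gaussian Naive Bayes is an assumed variant; the question only says "Naive Bayes".
        "3 - Naive Bayes": GaussianNB(),
        "4 - k-NN, 5 neighbors": KNeighborsClassifier(n_neighbors=5),
        "5 - k-NN, 10 neighbors": KNeighborsClassifier(n_neighbors=10),
        # Default hyperparameters are kept for both SVMs (an assumption).
        "6 - SVM, polynomial kernel": SVC(kernel="poly"),
        "7 - SVM, RBF kernel": SVC(kernel="rbf"),
    }


for label, model in make_models().items():
    print(label, "->", model)
```

Building the estimators inside a function returns new, unfitted objects on each call, so the same definitions can be reused for both cases without one fit leaking into the other.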

You need to build the above models for each of the following two cases. For each case-model combination, report the model accuracy. Use a test size of 25% and set the random state for data splitting to 777.

Report your answers in the table below.

1 - Case 1: Predict Cancer status through Age and Family history only.
2 - Case 2: Predict Cancer status through Age, Family history and the biomarker.
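The split-and-score procedure for the two cases could look like the sketch below. The real data file is not included in the question, so the code runs on a synthetic stand-in; the target column name `Cancer`, the helper `case_accuracy`, and the way the fake labels are generated are all assumptions for illustration, and the numbers it prints say nothing about the actual dataset. Only `Age`, `Fam_history`, `Marker`, the 25% test size, and the random state of 777 come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def case_accuracy(df, features, model, target="Cancer"):
    """Split 75/25 with random_state=777, fit on train, return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[target], test_size=0.25, random_state=777
    )
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Synthetic stand-in for the 750-patient file (NOT the real data).
rng = np.random.default_rng(0)
n = 750
demo = pd.DataFrame({
    "Age": rng.integers(30, 85, size=n),
    "Fam_history": rng.integers(0, 2, size=n),
    "Marker": rng.normal(size=n),
})
# Made-up label rule so the example has something to learn.
score = (demo["Marker"] + 0.02 * (demo["Age"] - 55)
         + 0.5 * demo["Fam_history"] + rng.normal(scale=0.5, size=n))
demo["Cancer"] = (score > 0.5).astype(int)

cases = {
    "Case 1": ["Age", "Fam_history"],
    "Case 2": ["Age", "Fam_history", "Marker"],
}
results = {}
for case_name, features in cases.items():
    # A new estimator per case so the two fits stay independent.
    model = DecisionTreeClassifier(max_depth=3, random_state=777)
    results[case_name] = case_accuracy(demo, features, model)
    print(f"{case_name}: accuracy = {results[case_name]:.3f}")
```

With the real file, `demo` would be replaced by the loaded data and the loop repeated for each of the seven models. No feature scaling is applied here because the question does not mention it, although k-NN and SVM results generally depend on feature scale.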

| Model | Model Accuracy, Case 1: Age & Fam_history | Model Accuracy, Case 2: Age, Fam_history & Marker |
| --- | --- | --- |
| 1 - Decision Tree with maximum depth of 3 | | |
| 2 - Decision Tree with maximum depth of 5 | | |
| 3 - Naive Bayes | | |
| 4 - k-Nearest Neighbors with 5 neighbors | | |
| 5 - k-Nearest Neighbors with 10 neighbors | | |
| 6 - Support Vector Machines with polynomial kernel | | |
| 7 - Support Vector Machines with radial basis function kernel | | |

Based on your analysis above, do you think the variable Marker is important in predicting the cancer status? Why?
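A simple way to ground the answer is to compare, model by model, the Case 2 accuracy against the Case 1 accuracy: if most models improve noticeably once Marker is added, that supports the chemist's claim, while small or mixed changes do not. The sketch below shows only the bookkeeping. The function names `marker_gain` and `summarize` are hypothetical, and the numbers passed in are placeholders chosen to show the call pattern, not results from the dataset.

```python
def marker_gain(acc_case1, acc_case2):
    """Per-model accuracy change from adding Marker (Case 2 minus Case 1)."""
    return {name: acc_case2[name] - acc_case1[name] for name in acc_case1}


def summarize(gains):
    """Return how many models improved and the mean accuracy change."""
    improved = sum(1 for g in gains.values() if g > 0)
    mean_gain = sum(gains.values()) / len(gains)
    return improved, mean_gain


# Placeholder values ONLY, to show the call pattern; NOT results from the dataset.
placeholder_case1 = {"model A": 0.50, "model B": 0.50}
placeholder_case2 = {"model A": 0.75, "model B": 0.50}

gains = marker_gain(placeholder_case1, placeholder_case2)
improved, mean_gain = summarize(gains)
print(f"{improved} of {len(gains)} models improved; mean change = {mean_gain:+.3f}")
```

Because each accuracy comes from a single 25% test split, small differences between the two cases may be noise, so a consistent gain across several model families is more convincing than a gain in one model alone.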
