
Question


Posted on Sep 13, 2024


Background

Cirrhosis results from prolonged liver damage, which leads to extensive scarring, often due to conditions such as hepatitis or chronic alcohol consumption. The data provided is a subset sourced from a Mayo Clinic study on primary biliary cirrhosis (PBC) of the liver, carried out from 1974 to 1984.

This dataset is used to develop and validate machine learning algorithms for predicting the survival status of the collected patients. There are 312 patients in the data set (224 for training and 88 for testing), and each patient has 17 collected features. The aim of this task is to use these 17 clinical features to predict the survival status of patients with liver cirrhosis. The survival states are 0 = death, 1 = censored, and 2 = censored due to liver transplantation.

Specifically, the problem you are going to solve is: can you

- accurately predict the survival status given the labelled data?
- explain your predictions and the associated findings well? For example, identify the key factors that are strongly associated with the response variable, i.e., survival status.
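One model-agnostic way to approach the second goal is permutation importance: shuffle one feature at a time on held-out data and measure how much a fitted model's score drops. The sketch below assumes a scikit-learn classifier that has already been fitted on pre-processed features stored in a pandas DataFrame (so column names are available); the helper name `rank_features` is hypothetical, and macro-averaged F1 is only one reasonable scoring choice for a three-class label.

```python
import pandas as pd
from sklearn.inspection import permutation_importance


def rank_features(model, X_val: pd.DataFrame, y_val,
                  scoring: str = "f1_macro", seed: int = 0) -> pd.Series:
    """Rank features by the mean drop in score when each one is shuffled.

    Assumptions: `model` is an already-fitted scikit-learn estimator and
    `X_val`, `y_val` are held-out data the model was not trained on.
    """
    result = permutation_importance(model, X_val, y_val, scoring=scoring,
                                    n_repeats=10, random_state=seed)
    # Largest mean score drop first: those features matter most to the model.
    return (pd.Series(result.importances_mean, index=X_val.columns)
            .sort_values(ascending=False))
```

Features at the top of the returned Series are the ones the model relies on most. This measures association as learned by the model, not a causal effect, and strongly correlated features can share (and therefore understate) each other's importance.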

Data set

The training data contains 224 rows and the test data contains 88 rows, each with 19 columns (excluding the ID column): the N_Days attribute is the number of days between registration and the earlier of death, transplantation, or study analysis time in July 1986; the Status attribute is the target variable that we will predict; and the remaining 17 columns can be used as input features. Details of the original data set can be found and downloaded from the original UCI repository. The values of the Status column in the test set are left empty to simulate real-world predictions.
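Following the column roles described above, a minimal pandas sketch for separating the inputs from the target could look like this. The name of the ID column is an assumption and is passed in by the caller (the submission format uses `testID`); `N_Days` is dropped because it is not one of the 17 input features, and no target is returned when `Status` is empty, as in the test set. The function name `split_features_target` is hypothetical.

```python
import pandas as pd

TARGET = "Status"           # target column named in the task description
NOT_FEATURES = ["N_Days"]   # follow-up time; not among the 17 input features


def split_features_target(df: pd.DataFrame, id_col: str):
    """Return (X, y): X holds the input features, y the Status labels.

    Assumption: `id_col` is the caller-supplied name of the ID column.
    y is None when Status is absent or entirely empty (as in the test set).
    """
    X = df.drop(columns=[id_col, TARGET, *NOT_FEATURES], errors="ignore")
    has_labels = TARGET in df.columns and df[TARGET].notna().any()
    y = df[TARGET] if has_labels else None
    return X, y
```

For example, after loading the training file with `pd.read_csv`, `train.shape` gives its size and `split_features_target(train, "ID")` yields the feature frame and labels; `"ID"` here is only a placeholder for the real column name.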

Evidence of Learning:

Execute your code in a Jupyter notebook (.ipynb file) and keep the output, write a report (.pdf file) answering the following questions, and submit both your code and your report to OnTrack.

1. Load and explore the training and test datasets, and do the necessary pre-processing.

- Show both the training and test dataset sizes.
- Based on the training and test data, show the feature types, and indicate which features have missing values.
- Use an appropriate method to deal with the missing values in both the training and test sets.
- Do the necessary encoding for the categorical features.
- Show the label distribution of the training data. Is it a balanced training set?
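The pre-processing steps in question 1 can be sketched with pandas and scikit-learn as follows. This is one reasonable choice, not the required method: numeric columns are median-imputed, the remaining columns are treated as categorical, imputed with their most frequent value and one-hot encoded, and all statistics and categories are learned from the training set only and then reused on the test set, so no test information leaks into pre-processing. Treating every non-numeric column as categorical, and the helper names, are assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder


def summarize(X: pd.DataFrame) -> pd.DataFrame:
    """Feature type and number of missing values for every column."""
    return pd.DataFrame({"dtype": X.dtypes.astype(str),
                         "n_missing": X.isna().sum()})


def build_preprocessor(X_train: pd.DataFrame) -> ColumnTransformer:
    """Median-impute numeric columns; mode-impute and one-hot encode the rest.

    Assumption: every non-numeric column is categorical. Fit this on the
    training features only, then call transform on both training and test.
    """
    numeric = X_train.select_dtypes(include="number").columns.tolist()
    categorical = [c for c in X_train.columns if c not in numeric]
    categorical_steps = make_pipeline(
        SimpleImputer(strategy="most_frequent"),
        # categories never seen during fitting are encoded as all zeros
        OneHotEncoder(handle_unknown="ignore"),
    )
    return ColumnTransformer([
        ("num", SimpleImputer(strategy="median"), numeric),
        ("cat", categorical_steps, categorical),
    ])


def label_distribution(y: pd.Series) -> pd.Series:
    """Share of each Status class; very unequal shares indicate imbalance."""
    return y.value_counts(normalize=True).sort_index()
```

With feature frames `X_train` and `X_test` (ID, N_Days and Status removed) and training labels `y_train`, `summarize` lists feature types and missing counts for each set, `pre = build_preprocessor(X_train).fit(X_train)` learns the imputation and encoding, `pre.transform(X_test)` applies the same treatment to the test set, and `label_distribution(y_train)` shows whether the classes are balanced.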

2. Based on the pre-processed training data from question 1, create three supervised machine learning (ML) models for predicting Status.

- Use an appropriate validation method and report the performance score using a suitable metric. Is it possible that the presented result is underfitted or overfitted? Justify.
- Justify the different design decisions for each ML model used to answer this question.
- Have you optimised any hyper-parameters for each ML model? What are they? Why have you done that? Explain.
- What can you do about the label imbalance issue?
- Finally, make a model recommendation based on the reported results and justify it.
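Several bullets of question 2 can be covered together by tuning each candidate model with grid search inside stratified k-fold cross-validation, scored by macro-averaged F1, which weights the three classes equally so a model cannot score well by ignoring a minority class. The three model families, their grids, and the use of `class_weight="balanced"` (one simple response to label imbalance; resampling is another) are illustrative assumptions, not the prescribed models. The sketch assumes `X` is a dense numeric matrix, such as the output of the pre-processing step, and `y` holds the Status labels; the helper names are hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def candidate_models(seed: int = 0) -> dict:
    """Three illustrative model families (an assumption), each with a small grid.

    class_weight="balanced" weights each class inversely to its frequency,
    which is one simple way to counter label imbalance.
    """
    return {
        "logistic_regression": (
            Pipeline([("scale", StandardScaler()),
                      ("clf", LogisticRegression(max_iter=1000,
                                                 class_weight="balanced"))]),
            {"clf__C": [0.1, 1.0, 10.0]},  # inverse regularisation strength
        ),
        "svm_rbf": (
            Pipeline([("scale", StandardScaler()),
                      ("clf", SVC(kernel="rbf", class_weight="balanced"))]),
            {"clf__C": [0.1, 1.0, 10.0], "clf__gamma": ["scale", 0.1]},
        ),
        "random_forest": (
            RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                   random_state=seed),
            {"max_depth": [None, 5], "min_samples_leaf": [1, 5]},
        ),
    }


def compare_models(X, y, n_splits: int = 5, seed: int = 0) -> dict:
    """Grid-search every candidate with stratified k-fold CV on macro-F1.

    Returns, per model, the best hyper-parameters, the mean validation and
    mean training macro-F1 of that setting, and the refitted best estimator.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, (model, grid) in candidate_models(seed).items():
        search = GridSearchCV(model, grid, scoring="f1_macro", cv=cv,
                              return_train_score=True)
        search.fit(X, y)
        best = search.best_index_
        results[name] = {
            "best_params": search.best_params_,
            "cv_f1_macro": float(search.cv_results_["mean_test_score"][best]),
            "train_f1_macro": float(search.cv_results_["mean_train_score"][best]),
            "estimator": search.best_estimator_,  # refitted on all of X, y
        }
    return results
```

After `results = compare_models(X, y)`, comparing `train_f1_macro` with `cv_f1_macro` for each model helps with the under/overfitting question: a large gap points to overfitting, while two low scores point to underfitting. A simple recommendation rule is to pick the model with the highest cross-validated score, `max(results, key=lambda n: results[n]["cv_f1_macro"])`, and then weigh it against interpretability and the spread of scores across folds.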

3. Use the best model from question 2 to make predictions on the pre-processed test set. Save your predictions (the file should contain two columns only: testID and Status), submit them to the specific Kaggle in-class platform, take a screenshot of your model's performance, and report it.
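Writing the submission file in the required two-column format takes only a few lines of pandas. The sketch assumes the test IDs and the predicted labels are already available, for example from the chosen model's `predict` method applied to the pre-processed test features; the function name `make_submission` and the default output path are hypothetical.

```python
import pandas as pd


def make_submission(test_ids, predictions,
                    path: str = "submission.csv") -> pd.DataFrame:
    """Save predictions as a CSV with exactly two columns: testID and Status.

    Assumption: `path` is a placeholder file name, not one given by the task.
    """
    submission = pd.DataFrame({"testID": list(test_ids),
                               "Status": [int(p) for p in predictions]})
    submission.to_csv(path, index=False)  # no pandas index column in the file
    return submission
```

Reading the written file back with `pd.read_csv` and confirming the header is exactly `testID,Status` is a quick check before uploading.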

Please answer all of question 1; I'll then use it for questions 2 and 3.

