Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Jul 27, 2024

Missing feature values need to be addressed prior to the model development phase of the CRISP - DM methodology to avoid training on incomplete data.

Missing feature values need to be addressed prior to the model development phase of the CRISP

-

DM methodology to avoid training on incomplete data. This task assesses your ability to navigate the complexities of constructing a data pipeline to transform partially erroneous raw data to knowledge and evaluate the effectiveness of the proposed solution. The task will require you to identify a suitable publicly available dataset

/

,

for which there is previous research that addresses the missing feature value problem. Propose an approach to use the Na

ve Bayes classifier to address the missing feature value problem in the context of categorical feature values. Implement this approach on a publicly available dataset

/

s and report its performance in the context of a classification problem. Compare the effectiveness of this approach against a baseline imputation approach that uses the mode value, on two different machine learning models. Justify and discuss your findings using appropriate metrics and relate your findings to previous research on data imputation using the same dataset

/

.

Present your findings in a written report and a video presentation. State and motivate any assumptions or scope adopted during the task. Based on this question....Please write a two page report using these this literature. Use in text citation and also pull references. Chronic kidney disease

(

CKD

)

is a major burden on the healthcare system because of its increasing prevalence, high risk of progression to end

-

stage renal disease, and poor morbidity and mortality prognosis. It is rapidly becoming a global health crisis. Unhealthy dietary habits and insufficient water consumption are significant contributors to this disease. Without kidneys, a person can only live for

18

days on average, requiring kidney transplantation and dialysis. It is critical to have reliable techniques at predicting CKD in its early stages. Machine learning

(

)

techniques are excellent in predicting CKD

.

The current study offers a methodology for predicting CKD status using clinical data, which incorporates data preprocessing, a technique for managing missing values, data aggregation, and feature extraction. A number of physiological variables, as well as ML techniques such as logistic regression

(

),

decision tree

(

)

classification, and K

-

nearest neighbor

(

KNN

),

were used in this work to train three distinct models for reliable prediction. The LR classification method was found to be the most accurate in this role, with an accuracy of about

97

percent in this study. The dataset that was used in the creation of the technique was the CKD dataset, which was made available to the public. Compared to prior research, the accuracy rate of the models employed in this study is considerably greater, implying that they are more trustworthy than the models used in previous studies as well. A large number of model comparisons have shown their resilience, and the scheme may be inferred from the study's results.

Go to:

1 .

Introduction

Chronic kidney disease

(

CKD

)

is a major public health concern around the world, with negative outcomes such as renal failure, cardiovascular disease, and early death

[1] .

According to a

2010

study by the Global Burden of Disease Study

(

GBDS

),

chronic kidney disease

(

CKD

)

was listed as the

18

th leading cause of mortality worldwide, up from

27

th in

1990 [2] .

Chronic kidney disease affects over

500

million people worldwide

[3, 4],

with a disproportionately high burden in developing countries, particularly South Asia and sub

-

Saharan Africa

[5] .

According to a

2015

study, there were

110

million people with CKD in high

-

income nations

(

men

48.3

million, women

61.7

million

),

but

387.5

million in low

-

and middle

-

income countries

[6] .

Bangladesh is a densely populated developing country in Southeast Asia where chronic kidney disease is on the rise year after year. The overall population of CKD is estimated to be

14

percent in a global study of six areas, including Bangladesh

[7] .

Another study discovered a

26 %

prevalence of chronic kidney disease among urban Dhaka residents over

30

years old

[8],

while another researcher discovered a

13 %

prevalence of chronic kidney disease among urban Dhaka residents over

15

years old

[9] .

2013,

a community

-

based prevalence study in Bangladesh revealed that one

-

third of rural residents were at risk of developing CKD

,

which was generally misdiagnosed at the time

[10] .

The observed variations in CKD prevalence between Bangladeshi groups, on the other hand, could be explained by a number of factors, including the cross

-

sectional research design with a small sample size, the study period, and the geographic distribution of urban and rural areas. According to one study, the prevalence of CKD varies by age group, gender, socioeconomic status, and geographic region