Question
Hi, could you help me identify and fix my mistakes? I am trying to perform cross-validation but have no idea how to answer the questions correctly.
1. Consider the following model: diabetic status (*Diabetes*; 0=non-diabetic, 1=diabetic) as a function of age (*age*; years), weight (*weight*; kg), hypertension status (*HTN*; 0=normotensive, 1=hypertensive), health status as indicated by high density lipoproteins (*hdl3cat*; 0=poor health, 1=intermediate health, 2=ideal health), the interaction between weight and hypertension status, the interaction between weight and age, and the interaction between weight and health status as indicated by high density lipoproteins.
# Load necessary libraries
library(dplyr)        # %>% and select()
library(tibble)       # as_tibble()
library(fastDummies)  # dummy_cols()
library(boot)         # cv.glm()
# Create the analysis data set (complete cases only)
data2 <- data %>%
  dplyr::select(Diabetes, age, weight, HTN, hdl3cat) %>%
  na.omit() %>%
  as_tibble()
# Create dummy variables (only HTN_1 is used in the model below)
data2 <- dummy_cols(data2, select_columns = "Diabetes")
data2 <- dummy_cols(data2, select_columns = "HTN")
data2 <- dummy_cols(data2, select_columns = "hdl3cat")
colnames(data2)
# Fit the logistic regression model for diabetes
m3 <- glm(Diabetes ~ age + weight + as.factor(HTN) + as.factor(hdl3cat) +
            HTN_1:weight + as.factor(hdl3cat):weight + weight:age,
          data = data2, family = "binomial")
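The dummy columns are not strictly needed, because `glm()` expands factor variables on its own. The sketch below checks this on simulated stand-in data (the `sim` data frame and its coefficients are made up for illustration and are not the data set from the question): the formula from the question and a factor-only formula should give the same fit.

```r
# Simulated stand-in data (hypothetical; NOT the data set from the question)
set.seed(1)
n <- 500
sim <- data.frame(
  age     = runif(n, 30, 80),
  weight  = rnorm(n, 85, 15),
  HTN     = rbinom(n, 1, 0.4),
  hdl3cat = sample(0:2, n, replace = TRUE)
)
sim$Diabetes <- rbinom(n, 1, plogis(-4 + 0.03 * sim$age + 0.02 * sim$weight))
sim$HTN_1 <- as.integer(sim$HTN == 1)  # same coding as the HTN_1 dummy column

# Formula as written in the question (mixes as.factor() and a dummy column)
m_dummy <- glm(Diabetes ~ age + weight + as.factor(HTN) + as.factor(hdl3cat) +
                 HTN_1:weight + as.factor(hdl3cat):weight + weight:age,
               data = sim, family = "binomial")

# Same model with the categorical predictors stored as factors
sim_f <- transform(sim, HTN = factor(HTN), hdl3cat = factor(hdl3cat))
m_factor <- glm(Diabetes ~ age + weight + HTN + hdl3cat +
                  weight:HTN + weight:hdl3cat + weight:age,
                data = sim_f, family = binomial)

# Both parameterizations should have the same deviance and number of coefficients
print(all.equal(deviance(m_dummy), deviance(m_factor)))
print(length(coef(m_dummy)) == length(coef(m_factor)))
```

Only the coefficient names differ between the two fits; the factor-only version avoids having to keep the dummy columns in sync with the original variables.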
2. Perform 2-fold cross-validation.
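# Perform k-fold cross-validation (2-fold)
# Folds are assigned at random, so call set.seed() first if the result must be reproducible
cv_error <- cv.glm(data2, m3, K = 2)
# delta[1] is the raw CV estimate of prediction error; delta[2] is the bias-adjusted estimate
cv_error$delta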
Result: 0.1303832 0.1290684
Perform 5-fold cross-validation.
Result: 0.1274221 0.1273104
Perform 10-fold cross-validation.
Result: 0.1277733 0.1276991
Perform 25-fold cross-validation.
Result: 0.1276396 0.1276131
Perform 50-fold cross-validation.
Result: 0.1276981 0.1276844
Perform 100-fold cross-validation.
Result: 0.1277033 0.1276967
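Rather than editing `K` by hand for each run, the six cross-validations can be collected in one loop. This is a minimal sketch on simulated stand-in data (the `sim` data frame, the seeds, and the object names are assumptions for illustration, so the numbers it prints will not match the results above).

```r
library(boot)  # cv.glm()

# Simulated stand-in data (hypothetical; NOT the data set from the question)
set.seed(1)
n <- 500
sim <- data.frame(
  age     = runif(n, 30, 80),
  weight  = rnorm(n, 85, 15),
  HTN     = factor(rbinom(n, 1, 0.4)),
  hdl3cat = factor(sample(0:2, n, replace = TRUE))
)
sim$Diabetes <- rbinom(n, 1, plogis(-4 + 0.03 * sim$age + 0.02 * sim$weight))

fit <- glm(Diabetes ~ age + weight + HTN + hdl3cat +
             weight:HTN + weight:hdl3cat + weight:age,
           data = sim, family = binomial)

k_values <- c(2, 5, 10, 25, 50, 100)

# Fix the seed so the random fold assignment is reproducible
set.seed(2024)
cv_results <- t(sapply(k_values, function(k) cv.glm(sim, fit, K = k)$delta))
dimnames(cv_results) <- list(paste0("K=", k_values), c("raw", "adjusted"))

print(round(cv_results, 4))
```

Each row holds the two components of `delta` for one value of `K`, which makes the values easy to compare side by side.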
3. What did you observe, if anything, about the CV values?
The CV(k) estimates for our model are almost the same for k = 2, 5, 10, 25, 50, and 100. The raw estimate is slightly higher for k = 2 (0.1304) than for the larger values of k (about 0.1274 to 0.1277), and the differences are negligible.
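When interpreting these numbers, it may help to know that `cv.glm()` uses average squared error as its default cost, so for a 0/1 outcome the values are mean squared differences between the observed status and the predicted probability, not misclassification rates. The sketch below, on simulated stand-in data (the `sim` data frame and the 0.5 cutoff are assumptions for illustration), shows how a misclassification cost could be supplied through the `cost` argument instead.

```r
library(boot)  # cv.glm()

# Simulated stand-in data (hypothetical; NOT the data set from the question)
set.seed(1)
n <- 500
sim <- data.frame(
  age     = runif(n, 30, 80),
  weight  = rnorm(n, 85, 15),
  HTN     = factor(rbinom(n, 1, 0.4)),
  hdl3cat = factor(sample(0:2, n, replace = TRUE))
)
sim$Diabetes <- rbinom(n, 1, plogis(-4 + 0.03 * sim$age + 0.02 * sim$weight))

fit <- glm(Diabetes ~ age + weight + HTN + hdl3cat +
             weight:HTN + weight:hdl3cat + weight:age,
           data = sim, family = binomial)

# Misclassification cost: share of observations whose predicted probability
# falls on the wrong side of 0.5 (the cutoff is an assumption)
misclass_cost <- function(y, p_hat) mean(abs(y - p_hat) > 0.5)

set.seed(2024)
cv_mse   <- cv.glm(sim, fit, K = 10)$delta                        # default cost
set.seed(2024)
cv_class <- cv.glm(sim, fit, cost = misclass_cost, K = 10)$delta  # error rate

print(cv_mse)
print(cv_class)
```

Resetting the seed before each call keeps the fold assignment the same, so the two cost functions are compared on identical splits.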
4. What did you observe, if anything, about the processing time?
The processing was pretty fast.
What else can I say?
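A statement about processing time is easier to support with measurements. K-fold cross-validation refits the model K times, so the run time should grow roughly in proportion to K. The sketch below times each run with `system.time()` on simulated stand-in data (the `sim` data frame and the object names are assumptions for illustration; actual timings depend on the machine and the data).

```r
library(boot)  # cv.glm()

# Simulated stand-in data (hypothetical; NOT the data set from the question)
set.seed(1)
n <- 500
sim <- data.frame(
  age     = runif(n, 30, 80),
  weight  = rnorm(n, 85, 15),
  HTN     = factor(rbinom(n, 1, 0.4)),
  hdl3cat = factor(sample(0:2, n, replace = TRUE))
)
sim$Diabetes <- rbinom(n, 1, plogis(-4 + 0.03 * sim$age + 0.02 * sim$weight))

fit <- glm(Diabetes ~ age + weight + HTN + hdl3cat +
             weight:HTN + weight:hdl3cat + weight:age,
           data = sim, family = binomial)

k_values <- c(2, 5, 10, 25, 50, 100)

# Elapsed wall-clock time (in seconds) for each value of K
timings <- sapply(k_values, function(k) {
  system.time(cv.glm(sim, fit, K = k))[["elapsed"]]
})
names(timings) <- paste0("K=", k_values)

print(timings)
```

Reporting the elapsed seconds for each K gives a concrete basis for the answer to the processing-time question.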
[Output of data2: a tibble with 2,455 rows and 12 columns, first 10 rows shown. Visible columns: Diabetes, age, weight, HTN, hdl3cat, Diabetes_0, Diabetes_1, HTN_0, HTN_1, hdl3cat_0. The row values did not survive extraction.]
Step by Step Solution
Step: 1
It seems that you are trying to perform cross-validation for a logistic regression model to assess it...