Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

The following code uses cross-validation in order to estimate predictive accuracy for a linear model of days-to-remission as a function of gene expression in ALL

The following code uses cross-validation in order to estimate predictive accuracy for a linear model of days-to-remission as a function of gene expression in ALL dataset. It runs to completion without errors but produces a number of warnings (shown below) about "differing numbers of rows" and "mismatches in object lengths."

Please explain the source of those warnings and how they can be cleaned up. Please also explain how whatever caused those warnings affects the output (if at all), and how and why (and if) the output changes upon fixing the code. (Hint: in order to observe these warnings you do not have to go through all 12K genes at each step of cross-validation - one percent of that amount is plenty - and it will save you a lot of time you would otherwise waste watching it run; remember also that R is an interpreter, so you can run commands one at a time if you need to and examine their outputs).

library(ALL) data(ALL) 
set.seed(1234) 
# calculate days-to-remission: 
ALL.pdat <- pData(ALL) date.cr.chr <- as.character(ALL.pdat$date.cr) diag.chr <- as.character(ALL.pdat$diagnosis) date.cr.t <- strptime(date.cr.chr,"%m/%d/%Y") diag.t <- strptime(diag.chr,"%m/%d/%Y") ALL.pdat$D2R <- as.numeric(date.cr.t - diag.t) 
# prepare the data structures: 
ALL.exprs <- exprs(ALL)[,!is.na(ALL.pdat$D2R)] ALL.pdat <- ALL.pdat[!is.na(ALL.pdat$D2R),] n.xval <- 5 s2.xval <- numeric() 
xval.grps <- sample(1:dim(ALL.pdat)[1]%%n.xval+1) 
# run over each cross-validation: 
for ( i.xval in 1:n.xval ) { min.pval <- 1.0 
 min.id <- NA train.exprs <- ALL.exprs[,xval.grps!=i.xval] train.d2r <- ALL.pdat[xval.grps!=i.xval,"D2R"]  # evaluate each gene in the training dataset to find the one # most associated with the outcome:  for( i in 1:dim(train.exprs)[1]) {  ###for( i in 1:100 ) { 
 p.val <- anova(lm(train.d2r~train.exprs[i,],))[1,"Pr(>F)"] if ( p.val < min.pval ) { 
 min.pval <- p.val 

min.id <- i }

}

 # print the gene found: 
 cat(rownames(train.exprs)[min.id],min.pval,fill=T) 
 # refit the model for best gene found on training dataset: 
 best.lm.xval <- lm(train.d2r~train.exprs[min.id,]) 
 # calculate predictions on test dataset: 
 test.exprs <- ALL.exprs[,xval.grps==i.xval] test.d2r <- ALL.pdat[xval.grps==i.xval,"D2R"] test.pred <- predict( 
 best.lm.xval,data.frame(t(test.exprs),test.d2r) ) 
 # accumulate squared errors of prediction: 
 s2.xval <- c(s2.xval,(test.pred-test.d2r)^2) } 
40176_at 1.433363e-05 35296_at 8.721938e-07 1213_at 3.760985e-06 34852_g_at 2.161217e-06 33901_at 1.399374e-06 Warning messages: 
1: 'newdata' had 19 rows but variables found have 77 rows 2: In test.pred - test.d2r : 
 longer object length is not a multiple of shorter object length ... 
# print average squared error in cross-validation: 
 mean(s2.xval) 

[1] 332.7707

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Intelligent Information And Database Systems 6th Asian Conference Aciids 2014 Bangkok Thailand April 7 9 2014 Proceedings Part I 9 2014 Proceedings Part 1 Lnai 8397

Authors: Ngoc-Thanh Nguyen ,Boonwat Attachoo ,Bogdan Trawinski ,Kulwadee Somboonviwat

2014th Edition

3319054759, 978-3319054759

More Books

Students also viewed these Databases questions

Question

How did you feel about taking piano lessons as a child? (general)

Answered: 1 week ago

Question

5. What information would the team members need?

Answered: 1 week ago

Question

Where those not participating, encouraged to participate?

Answered: 1 week ago