Question:

Evaluation of an imputation method for missing data. When analyzing big data (large data sets with many variables), business researchers often encounter the problem of missing data (e.g., non-response). Typically, an imputation method will be used to substitute in reasonable values (e.g., the mean of the variable) for the missing data. An imputation method that uses “nearest neighbors” as substitutes for the missing data was evaluated in Data & Knowledge Engineering (March 2013). Two quantitative assessment measures of the imputation algorithm are normalized root mean square error (NRMSE) and classification bias. The researchers applied the imputation method to a sample of 3,600 data sets with missing values and determined the NRMSE and classification bias for each data set. The correlation coefficient between the two variables was reported as r = .2838.

a. Conduct a test to determine if the true population correlation coefficient relating NRMSE and bias is positive. Interpret this result practically.

b. A scatterplot for the data (extracted from the journal article) is shown in the next column. Based on the graph, would you recommend using NRMSE as a linear predictor of bias? Explain why your answer does not contradict the result in part a.

Step by Step Answer:
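
For part a, a minimal sketch of the calculation, assuming the standard t-test for a population correlation coefficient (H0: rho = 0 against Ha: rho > 0, test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom). The question does not state a significance level, so alpha = 0.05 is an assumption here, and the function and variable names are illustrative. With 3,598 degrees of freedom the t distribution is very close to the standard normal, so the sketch uses the normal critical value and a normal-tail approximation for the p-value rather than exact t quantiles.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.2838      # sample correlation reported in the question
n = 3600        # number of data sets in the sample
alpha = 0.05    # ASSUMPTION: significance level is not given in the question

t_stat = correlation_t_statistic(r, n)

# Upper-tail critical value for alpha = 0.05. With df = 3598 the t
# distribution is nearly standard normal, so z = 1.645 is used.
z_crit = 1.645
reject_h0 = t_stat > z_crit

# Approximate one-sided p-value from the standard normal upper tail.
p_value_approx = 0.5 * math.erfc(t_stat / math.sqrt(2))

# Coefficient of determination: share of the variation in bias that a
# straight-line relationship with NRMSE would account for.
r_squared = r ** 2

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
print(f"approximate p-value = {p_value_approx:.3g}")
print(f"r^2 = {r_squared:.4f}")
```

The test statistic is about 17.75, far above the critical value, so under these assumptions there is strong evidence that the population correlation between NRMSE and bias is positive. For part b, note that r^2 is only about 0.08: a straight-line relationship with NRMSE would account for roughly 8% of the sample variation in bias. With n = 3,600, even a weak correlation is statistically significant, so a significant positive correlation does not by itself make NRMSE a useful linear predictor of bias. The scatterplot itself is not reproduced here, so any judgment about its shape has to be made from the original figure.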

Related Book:

Statistics For Business And Economics

ISBN: 9781292413396

14th Global Edition

Authors: James McClave, P. Benson, Terry Sincich