Question


Posted on Sep 25, 2024


For the project, there are two parts: A and B. You should use a total of three files. Two of the files are for part A, and one file is for part B. Please download the three files here: https://1drv.ms/f/s!An6yRjtqJUs5cGl4YmWoZ1MUDss

For part A, each file contains a column for subject ID and a column for either the dependent variable value or the independent variable value. First, you are expected to sort the two files by subject ID and merge them. Do not simply cut and paste to merge your data. Second, you are expected to deal with missing data. Your report should state the number of subject IDs that had at least one independent variable value or dependent variable value. It should also state the number of subject IDs that had an independent variable value. There are a number of procedures for handling missing data, and a statistical package often has imputation algorithms built in. For example, R has 5 different algorithms available. You may choose any algorithm except listwise deletion. Specify your choice in your report. Often, the choice of imputation method has little effect on the results if the fraction of missing data is 30% or less. Finally, use the statistical package of your choice to find the fitted linear model.
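The part A steps (merge by subject ID, count, impute, fit) can be sketched in Python as follows. This is only an illustration under stated assumptions: the two in-memory "files" and their ID,IV / ID,DV column layout are made up, blank cells are taken to mean missing, and mean imputation is used only because it is the simplest method that is not listwise deletion. The helper names (read_values, merge_by_id, mean_impute, fit_line) are hypothetical, not part of any package.

```python
import csv
import io


def read_values(handle):
    """Read 'id,value' rows (after one header row) into {id: float or None}."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    values = {}
    for row in reader:
        if not row:
            continue
        subject_id = row[0].strip()
        raw = row[1].strip() if len(row) > 1 else ""
        values[subject_id] = float(raw) if raw else None  # blank cell = missing
    return values


def merge_by_id(iv, dv):
    """Outer-join the two dictionaries on subject ID, sorted by ID.

    IDs are compared as strings here; convert them with int() first if the
    real IDs are numeric, so that 10 sorts after 2.
    """
    ids = sorted(set(iv) | set(dv))
    return [(i, iv.get(i), dv.get(i)) for i in ids]


def mean_impute(rows):
    """Replace each missing x or y with the mean of the observed values in that column."""
    xs = [x for _, x, _ in rows if x is not None]
    ys = [y for _, _, y in rows if y is not None]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    return [(i, x_bar if x is None else x, y_bar if y is None else y)
            for i, x, y in rows]


def fit_line(rows):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(rows)
    x_bar = sum(x for _, x, _ in rows) / n
    y_bar = sum(y for _, _, y in rows) / n
    sxx = sum((x - x_bar) ** 2 for _, x, _ in rows)
    sxy = sum((x - x_bar) * (y - y_bar) for _, x, y in rows)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Made-up data standing in for the two downloaded files (an assumption).
iv_file = io.StringIO("ID,IV\n3,3.0\n1,1.0\n2,2.0\n5,\n")
dv_file = io.StringIO("ID,DV\n2,5.0\n1,3.0\n4,9.0\n3,7.0\n")

merged = merge_by_id(read_values(iv_file), read_values(dv_file))
n_any = sum(1 for _, x, y in merged if x is not None or y is not None)
n_iv = sum(1 for _, x, _ in merged if x is not None)
b0, b1 = fit_line(mean_impute(merged))
print(f"IDs with any value: {n_any}, IDs with an IV: {n_iv}")
print(f"fitted line: y = {b0:.3f} + {b1:.3f} x")
```

With real files, io.StringIO(...) would be replaced by open(path, newline=""). Mean imputation shrinks the variance of the imputed column, so a statistical package's own imputation routines may be a better choice for the actual report.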

The data file for part B will contain one line for each subject ID. The line will contain the subject ID, the value of the independent variable (IV), and the value of the dependent variable (DV). A transformation of the IV, the DV, or both may be required. You should read the text for suggestions on fitting a model. A lack of fit (LOF) test should be applied if there are repeated values in the data set. It is your group's responsibility to find repeated (or near-repeated) independent variable values. That is, your group should bin near-repeated data into one level. For example, suppose that and . While there are no exactly repeated x values, your group could bin these points into one group of nearly repeated points. That is, choose the average x-value as the value of x after binning. Then your binned data would be and . Then perform a LOF test on the data set after binning all near-repeated values.
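A minimal sketch of the binning step and the LOF F statistic for a straight-line fit, in plain Python. The data points and the tolerance of 0.05 are invented for illustration and are not the values from the assignment's example; the function names are hypothetical. The sketch returns the F statistic and its degrees of freedom, which would then be compared with an F table or a package's F distribution.

```python
def bin_near_repeats(points, tol):
    """Group points whose x is within tol of the previous sorted x, then
    replace every x in a group by the group's average x.

    Note: comparing with the previous point can chain many close points
    into one wide bin; choose tol with the spacing of the data in mind.
    """
    points = sorted(points)
    groups = [[points[0]]]
    for p in points[1:]:
        if p[0] - groups[-1][-1][0] <= tol:
            groups[-1].append(p)
        else:
            groups.append([p])
    binned = []
    for g in groups:
        x_mean = sum(x for x, _ in g) / len(g)
        binned.extend((x_mean, y) for _, y in g)
    return binned


def lack_of_fit(points):
    """Return (F, df_lof, df_pe) for the line y = b0 + b1 * x.

    SSE from the fitted line is split into pure error (variation of y
    within each repeated x level) and lack of fit (the remainder).
    """
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in points)

    levels = {}
    for x, y in points:
        levels.setdefault(x, []).append(y)
    ss_pe = 0.0
    for ys in levels.values():
        level_mean = sum(ys) / len(ys)
        ss_pe += sum((y - level_mean) ** 2 for y in ys)

    c = len(levels)                 # number of distinct x levels
    df_lof, df_pe = c - 2, n - c    # needs c > 2 and n > c
    ss_lof = sse - ss_pe
    f_stat = (ss_lof / df_lof) / (ss_pe / df_pe)
    return f_stat, df_lof, df_pe


# Made-up (x, y) data with near-repeated x values (an assumption).
raw = [(1.00, 2.1), (1.02, 1.9), (2.00, 4.0), (2.01, 4.2),
       (3.00, 5.9), (3.02, 6.1), (4.00, 8.0), (4.01, 7.8)]
binned = bin_near_repeats(raw, tol=0.05)
f_stat, df_lof, df_pe = lack_of_fit(binned)
print(f"F = {f_stat:.3f} on ({df_lof}, {df_pe}) degrees of freedom")
```

A large F relative to the critical value for (df_lof, df_pe) suggests the straight line is inadequate, which is one signal that a transformation of the IV or DV may be needed.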

You must submit a one-page report on Problem A and a one-page report on Problem B. Each report should have four sections. The introduction should contain a statement of the problem and the objective of the paper. This part is easy: your problem is to recover the function that was used to generate the dependent variable value based on the value of the independent variable. The data you receive will be generated by a simulation program. The second section should describe your methodology. Specifically, state how the files were merged, which program was used to perform the statistical analysis, whether you used linear regression and additional procedures such as a lack of fit test, how much missing data was present, and the procedure used for dealing with missing data. The third section should contain your results: the fraction of the variation of the dependent variable that was explained, the analysis of variance table, the fitted function, a confidence interval for the slope, and a test of the null hypothesis that the slope is zero. The fourth section should be conclusions and discussion. This section should focus on big-picture issues. Was there an association between the variables? How important was it? That is, what was the r-squared value? What is your fitted function? You may submit a longer appendix of computer work and programs.
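The quantities requested for the results section can be computed from a simple linear fit as in the sketch below. It is an illustration only: the data are made up, the function name slope_summary is hypothetical, and the t critical value is passed in by the caller (3.182 is the two-sided 95% value for 3 degrees of freedom) because the Python standard library has no t distribution.

```python
import math


def slope_summary(xs, ys, t_crit):
    """Fit y = b0 + b1 * x and return the pieces of a regression report.

    t_crit is the two-sided t critical value for n - 2 degrees of freedom,
    looked up by the caller from a table or a statistics package.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    sst = sum((y - y_bar) ** 2 for y in ys)                        # total
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))    # error
    ssr = sst - sse                                                # regression
    mse = sse / (n - 2)
    se_b1 = math.sqrt(mse / sxx)   # standard error of the slope

    return {
        "b0": b0,
        "b1": b1,
        "r_squared": ssr / sst,
        # ANOVA rows as (degrees of freedom, sum of squares)
        "anova": {"regression": (1, ssr), "error": (n - 2, sse), "total": (n - 1, sst)},
        "f": ssr / mse,            # F statistic for H0: slope = 0
        "t": b1 / se_b1,           # t statistic for H0: slope = 0 (t^2 equals F)
        "ci": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
    }


# Made-up data (an assumption), n = 5, so n - 2 = 3 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
summary = slope_summary(xs, ys, t_crit=3.182)
print(f"fitted function: y = {summary['b0']:.3f} + {summary['b1']:.3f} x")
print(f"r-squared: {summary['r_squared']:.4f}")
print(f"95% CI for slope: ({summary['ci'][0]:.3f}, {summary['ci'][1]:.3f})")
```

If the confidence interval for the slope excludes zero, the null hypothesis of zero slope is rejected at the matching significance level. In the report itself, these numbers belong in a formatted table and prose, not as raw program output.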

Important note:

Simply submitting your computer output is not acceptable and will receive a grade of 0. You must submit a formal report to begin to get non-zero credit.