Question

R CODE MUST BE ATTACHED FOR CREDIT:

Instructions for submission

In this assignment we consider again the data that was presented as part of the Assignment for Unit 2. In the current assignment we apply some of the tools that were introduced in Units 5 and 6 in order to analyze the data. The data was collected from the donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan. About every three months, the center sends its blood transfusion service bus to a university in Hsin-Chu City to collect donated blood. The current assignment involves data collected on a random sample of 748 donors. The data was obtained from the UCI Machine Learning Repository and was assembled by Prof. I-Cheng Yeh.

The file "transfusion.csv" contains the data. A link to the file is given at the end of this question. The file contains 5 variables:

  • recency = the number of months since the last donation (numeric).
  • frequency = the total number of donations (numeric).
  • monetary = the total volume of blood donated, in c.c. (numeric).
  • time = the number of months since the first donation (numeric).
  • march2007 = an indicator of whether the donor donated blood in March 2007 (factor).
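
A minimal R sketch for reading the data, assuming "transfusion.csv" is in the working directory and that its column names match the variable list above. The helper name `load_transfusion` is hypothetical. Converting march2007 to a factor lets formulas such as `frequency ~ march2007` treat it as a grouping variable with two levels.

```r
# Read the assignment data. Assumes the columns are named recency, frequency,
# monetary, time and march2007, as in the variable list.
load_transfusion <- function(path = "transfusion.csv") {
  dat <- read.csv(path)
  # march2007 is presumably coded 0/1 in the file; make it a factor so that
  # a formula such as frequency ~ march2007 splits the donors into two groups
  dat$march2007 <- factor(dat$march2007)
  dat
}

# Usage (assumes the file has been downloaded):
# dat <- load_transfusion("transfusion.csv")
# str(dat)
```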

In this assignment we consider only the last four variables.

Comparing Two Samples

Consider "frequency" as the response and "march2007" as the explanatory variable. Plot the relation between the two variables, and test both the equality of the expectations in the two sub-samples and the equality of the variances. Repeat the same analysis with the response "frequency" replaced by the log-transformed response "log(frequency)". In Tasks 1-3 you are asked to describe the results of the analysis.

Linear Regression

In Tasks 4-7 you are asked to conduct an analysis similar to that of Tasks 1-3. The difference is that the numerical variable "time" is used as the explanatory variable. The linear regression model assumes that the expectation of the response is a linear function of the explanatory variable. Another assumption of the model is that the variance of the response is the same for every value of the explanatory variable. In practice, however, the variance often increases for larger values of the explanatory variable. Replacing the response by the log-transformed response is a commonly used method to overcome this difficulty. The analysis of the log-transformed response is carried out by replacing the response "frequency" in the formula with "log(frequency)".

The Relation Between Two Variables

The final Task 8 involves the investigation of the relation between the response "frequency" and the variable "monetary".

Submitting the Assignment

For the assignment you should complete the following 8 tasks. Tasks 1-3 refer to the problem of comparing two samples and Tasks 4-7 refer to regression analysis. In Task 8 the relation between two variables is investigated. Your answers should be short and clear. We recommend that you copy and paste the tasks below into the form titled "Submit your Assignment using this Form". You can then write your answers to the tasks in the designated positions that are marked in the text:

Tasks

Comparing Two Samples:

1. Apply the function "plot" to the formula that relates the response "frequency" to the explanatory variable "march2007" in order to produce the two box-plots of the response. Redo the plotting with "frequency" replaced by "log(frequency)". The distribution of the variable "log(frequency)" is:

__ More symmetric, __ Less symmetric compared to the distribution of the variable "frequency".

Mark the most appropriate option and attach the R code that produces the two plots:
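
A minimal R sketch for Task 1, assuming the data frame `dat` has been read from "transfusion.csv" with march2007 stored as a factor. The function name `plot_task1` is hypothetical. With a factor on the right-hand side of the formula, `plot()` draws one box-plot per level.

```r
# Draw the box-plots of frequency and of log(frequency) in the two
# sub-samples defined by march2007, side by side.
# Assumes dat$march2007 is a factor and frequency > 0 (needed for the log).
plot_task1 <- function(dat) {
  op <- par(mfrow = c(1, 2))  # two panels in one row
  on.exit(par(op))            # restore the previous layout on exit
  plot(frequency ~ march2007, data = dat, main = "frequency")
  plot(log(frequency) ~ march2007, data = dat, main = "log(frequency)")
  invisible(NULL)
}

# Usage: plot_task1(dat)
```

Comparing the lengths of the two whiskers and the position of the median inside each box is one way to judge which distribution is more symmetric.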

2. Mark the null hypotheses that you reject with a significance level of 5% and those that you do not reject:

(Reject/Don't Reject) H0: The expectation of "frequency" is the same in the two subsets,

(Reject/Don't Reject) H0: The expectation of "log(frequency)" is the same in the two subsets.

Explain your answer:
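
A sketch of the tests for Task 2, using R's `t.test()` with its default Welch two-sample test (which does not assume equal variances; `var.equal = TRUE` gives the pooled version, if that is what the course uses). The helper `test_task2` is hypothetical and only reports the p-values and the reject/don't-reject decision at the 5% level; the explanation is still yours to write.

```r
# Two-sample t-tests of H0: equal expectations in the two sub-samples,
# for the raw and the log-transformed response. R's t.test() defaults to
# the Welch test (unequal variances allowed).
test_task2 <- function(dat, alpha = 0.05) {
  raw <- t.test(frequency ~ march2007, data = dat)
  logged <- t.test(log(frequency) ~ march2007, data = dat)
  p <- c(frequency = raw$p.value, log_frequency = logged$p.value)
  # H0 is rejected at level alpha when the p-value is smaller than alpha
  data.frame(p.value = p, reject = p < alpha)
}

# Usage: test_task2(dat)
```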

3. Mark the null hypotheses that you reject with a significance level of 5% and those that you do not reject:

(Reject/Don't Reject) H0: The variance of "frequency" is the same in the two subsets,

(Reject/Don't Reject) H0: The variance of "log(frequency)" is the same in the two subsets.

Explain your answer:
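
A sketch for Task 3 using R's `var.test()`, the F test for comparing two variances. Note that this test assumes the response is normally distributed within each sub-sample. The helper name `test_task3` is hypothetical.

```r
# F tests of H0: equal variances in the two sub-samples, for the raw and
# the log-transformed response. var.test() assumes normal data in each group.
test_task3 <- function(dat, alpha = 0.05) {
  raw <- var.test(frequency ~ march2007, data = dat)
  logged <- var.test(log(frequency) ~ march2007, data = dat)
  p <- c(frequency = raw$p.value, log_frequency = logged$p.value)
  # H0 is rejected at level alpha when the p-value is smaller than alpha
  data.frame(p.value = p, reject = p < alpha)
}

# Usage: test_task3(dat)
```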

Linear Regression:

4. Apply the function "plot" to the formula that relates the response "frequency" to the explanatory variable "time" in order to produce the scatter plot. Add the regression line to the plot. The variability of the variable "frequency", for larger values of the explanatory variable, is:

__ Smaller, __ Larger, __ Constant.

Mark the most appropriate option and attach the R code that produces the two plots:
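
A minimal R sketch for Task 4, assuming `dat` holds the data as read from "transfusion.csv". The function name `plot_task4` is hypothetical. `abline()` accepts a fitted `lm` object and draws its line.

```r
# Scatter plot of frequency against time with the least-squares line.
plot_task4 <- function(dat) {
  plot(frequency ~ time, data = dat)      # scatter plot
  fit <- lm(frequency ~ time, data = dat) # simple linear regression
  abline(fit)                             # add the fitted line
  invisible(fit)
}

# Usage: fit <- plot_task4(dat)
```

The same two calls with `log(frequency)` in place of `frequency` in both formulas give the corresponding plot on the log scale.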

5. Mark the null hypotheses that you reject with a significance level of 5% and those that you do not reject:

(Reject/Don't Reject) H0: The slope of "time" in the regression line of the response "frequency" is equal to zero,

(Reject/Don't Reject) H0: The slope of "time" in the regression line of the response "log(frequency)" is equal to zero.

Explain your answer:
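
A sketch for Task 5: the test of H0 that the slope of "time" equals zero is the t-test reported in the coefficient table of `summary()` for an `lm` fit. The helper name `test_task5` is hypothetical.

```r
# t-tests of H0: slope of time = 0, for the raw and the log response.
# The p-value is read from the coefficient table of summary.lm().
test_task5 <- function(dat, alpha = 0.05) {
  fits <- list(frequency = lm(frequency ~ time, data = dat),
               log_frequency = lm(log(frequency) ~ time, data = dat))
  p <- sapply(fits, function(f) summary(f)$coefficients["time", "Pr(>|t|)"])
  data.frame(p.value = p, reject = p < alpha)
}

# Usage: test_task5(dat)
```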

6. The 95% confidence interval for the slope of "time" in the regression line of the response "log(frequency)" is:

Lower end = ____, Upper end = ____.

Attach the R code that produces the confidence interval:
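
A sketch for Task 6 using `confint()`, which returns a matrix with one row per requested coefficient and columns for the lower and upper ends. The helper name `ci_task6` is hypothetical.

```r
# 95% confidence interval for the slope of time when log(frequency)
# is the response.
ci_task6 <- function(dat, level = 0.95) {
  fit <- lm(log(frequency) ~ time, data = dat)
  ci <- confint(fit, "time", level = level)  # 1 x 2 matrix
  c(lower = ci[1, 1], upper = ci[1, 2])
}

# Usage: round(ci_task6(dat), 4)
```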

7. The regression line between "time" as an explanatory variable and "log(frequency)" as a response is:

__ Increasing, __ Decreasing, __ Constant.

Mark the most appropriate option and explain your answer:
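
One possible way to read off the direction in Task 7, sketched in R: report "Constant" when the slope is not significantly different from zero at the 5% level, and otherwise use the sign of the estimated slope. This decision rule is an assumption for illustration, not a rule stated in the assignment, and the helper name `slope_direction` is hypothetical.

```r
# Assumed decision rule: "Constant" if the slope is not significant at
# level alpha, otherwise the sign of the estimated slope decides.
slope_direction <- function(dat, alpha = 0.05) {
  fit <- lm(log(frequency) ~ time, data = dat)
  est <- coef(summary(fit))["time", ]  # Estimate, Std. Error, t value, Pr(>|t|)
  if (est[["Pr(>|t|)"]] >= alpha) return("Constant")
  if (est[["Estimate"]] > 0) "Increasing" else "Decreasing"
}

# Usage: slope_direction(dat)
```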

The Relation Between Two Variables:

8. Apply the function "plot" to the formula that relates the response "frequency" to the explanatory variable "monetary" in order to produce the scatter plot. Add the regression line to the plot. The points in the scatter plot are:

__ All on the same line, __ Show a linear trend but are not on the same line, __ Don't show a linear trend.

Mark the most appropriate option and attach the R code that produces the plot:
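
A sketch for Task 8, assuming `dat` holds the data as read from "transfusion.csv". The helper name `plot_task8` is hypothetical. Besides drawing the plot, it returns TRUE when every point lies on the fitted line up to a small numerical tolerance (`tol`, an assumed value). This may help separate the first option from the second, but it does not test whether a linear trend is absent.

```r
# Scatter plot of frequency against monetary with the least-squares line.
# Returns TRUE when every point lies on the fitted line up to `tol`
# (an assumed tolerance for floating-point rounding).
plot_task8 <- function(dat, tol = 1e-8) {
  plot(frequency ~ monetary, data = dat)
  fit <- lm(frequency ~ monetary, data = dat)
  abline(fit)
  all(abs(residuals(fit)) < tol)
}

# Usage: plot_task8(dat)
```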

DATA FILE: Transfusion.csv

https://drive.google.com/file/d/1ZPJscZF53IEL7iqj7EsFxKBRUKHGhtpk/view?usp=sharing
