Question
In this assignment, we consider again the data that was presented as part of the Assignment for Unit 2. In the current assignment, we apply some of the tools that were introduced in Units 5 and 6 in order to analyze the data. The data was collected from the donor database of the Blood Transfusion Service Center in Hsin-Chu City in Taiwan. About every three months, the center sends its blood transfusion service bus to a university in Hsin-Chu City to collect donated blood. The current assignment involves data collected from a random sample of 748 donors. The data was obtained from the UCI Machine Learning Repository and was assembled by Prof. I-Cheng Yeh.
The file "transfusion.csv" contains the data. The file contains 5 variables:
recency = The number of months since the last donation. (numeric)
frequency = The total number of donations. (numeric)
monetary = Total blood donated (in c.c.). (numeric)
time = The number of months since the first donation. (numeric)
march2007 = An indicator of whether the donor gave blood in March 2007. (factor)
In the assignment, we consider the last four variables.
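As a starting point, the four variables can be read into plain Python lists. This is a minimal sketch that assumes the CSV header uses the names "frequency", "monetary", "time" and "march2007"; the helper name read_transfusion and the two sample rows are hypothetical and are not taken from the actual file.

```python
import csv
import io


def read_transfusion(stream):
    """Parse CSV rows into a dict of column lists (numeric columns as floats)."""
    numeric = ("frequency", "monetary", "time")
    columns = {name: [] for name in numeric}
    columns["march2007"] = []
    for row in csv.DictReader(stream):
        for name in numeric:
            columns[name].append(float(row[name]))
        # Keep the indicator as a string label, mirroring a factor.
        columns["march2007"].append(row["march2007"].strip())
    return columns


# Made-up rows for illustration only; these are not values from transfusion.csv.
sample = io.StringIO(
    "recency,frequency,monetary,time,march2007\n"
    "2,4,1000,20,1\n"
    "5,1,250,5,0\n"
)
data = read_transfusion(sample)
```

With the real file, the same function could be called as read_transfusion(open("transfusion.csv", newline="")), provided the header matches the assumed names.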
Comparing Two Samples
Consider "frequency" as the response and "march2007" as the explanatory variable. Plot the relation between the two variables, then test the equality of the expectations and the equality of the variances in the two sub-samples. Repeat the same analysis with the response "frequency" replaced by the log-transformed response "log(frequency)". In Tasks 1-3 you are asked to describe the results of the analysis.
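The two test statistics involved can be sketched as follows. This is only an illustration under stated assumptions: it uses the Welch form of the t statistic (one possible choice, not necessarily the one intended by the course), the function names welch_t and variance_ratio are hypothetical, and the numbers are made up rather than taken from transfusion.csv. Turning the statistics into p-values requires the t and F distributions, which this sketch does not compute.

```python
import math
import statistics


def welch_t(x, y):
    """Welch t statistic for the difference of means (unequal variances allowed)."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    se = math.sqrt(vx / len(x) + vy / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


def variance_ratio(x, y):
    """F statistic: ratio of the two sample variances."""
    return statistics.variance(x) / statistics.variance(y)


# Made-up donation counts split by the indicator; not values from transfusion.csv.
donated = [4, 6, 8, 10]
not_donated = [1, 2, 3, 4, 5]

t_raw = welch_t(donated, not_donated)
f_raw = variance_ratio(donated, not_donated)

# The same statistics on the log scale (requires frequency > 0).
log_donated = [math.log(v) for v in donated]
log_not_donated = [math.log(v) for v in not_donated]
t_log = welch_t(log_donated, log_not_donated)
f_log = variance_ratio(log_donated, log_not_donated)
```

Comparing f_raw with f_log shows how the log transformation changes the spread in the two groups, which is the point of repeating the analysis on the transformed response.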
Linear Regression
In Tasks 4-7 you are asked to conduct an analysis similar to the analysis of Tasks 1-3. The difference is that the numerical variable "time" is used as the explanatory variable. The model of linear regression assumes that the expectation of the response is a linear function of the explanatory variable. Another assumption of the model is that the variance of the response is constant for each value of the explanatory variable. Frequently, however, one may observe an increase in the variance for larger values of the explanatory variable. Replacing the response by the log-transformed response is a commonly used method to overcome this difficulty. The analysis that involves the log of the response can be carried out via the replacement of the response "frequency" in the formula by the transformed response "log(frequency)".
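The effect of replacing the response by its logarithm can be illustrated with a small least-squares fit. This is a sketch under assumptions: fit_line is a hypothetical helper implementing ordinary least squares for a single explanatory variable, and the values are made up so that the response grows multiplicatively with time; they are not values from transfusion.csv.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Made-up values for illustration; not taken from transfusion.csv.
time_values = [10, 20, 30, 40]
frequency = [1, 2, 4, 8]

# Fit on the original scale: frequency against time.
a_raw, b_raw = fit_line(time_values, frequency)

# Fit on the log scale: log(frequency) against time.
a_log, b_log = fit_line(time_values, [math.log(f) for f in frequency])
```

On the log scale the slope has a multiplicative reading: each additional month multiplies the fitted frequency by exp(b_log).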
The Relation Between Two Variables
The final Task 8 involves the investigation of the relation between the response "frequency" and the variable "monetary".
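One way to begin such an investigation is to compute the correlation between the two variables and to check whether their ratio is constant. The sketch below is illustrative only: pearson_r is a hypothetical helper, and the data assumes a fixed volume of 250 c.c. per donation, which is an assumption made for the example and should be checked against the actual file.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up values; the 250 c.c. per donation is an assumption for illustration.
frequency = [1, 2, 4, 8]
monetary = [250 * f for f in frequency]

r = pearson_r(frequency, monetary)

# If the ratio monetary / frequency takes a single value, one variable is an
# exact multiple of the other and carries no additional information.
ratios = {m / f for f, m in zip(frequency, monetary)}
```

A correlation equal to 1 together with a single ratio would indicate an exact linear relation rather than a statistical one.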