Question
A sample of colon tissues was collected, 40 of which were tumor tissues and 22 non-tumor tissues. Tissues were analyzed using an Affymetrix oligonucleotide array and the expression of a particular gene was measured. A file with these data called "gene.txt" is posted on Canvas. To get the dataset into a data frame called gene in R's workspace, we ensure that the file "gene.txt" is in R's current working directory and then type

    gene=read.table("gene.txt")

Researchers are interested in studying the association between this gene and tumor status.

(a) (5 points) We wish to analyze these data with the two independent-samples model, where our goal is to make inference for whether or not the distribution of the expression of this gene is associated with tumor status, which has levels {tumor, healthy}. To do this, what must we assume about these measurements? If these assumptions are true, then what is the probability distribution of the random variable for which the first healthy tissue's gene expression measurement of 202.90000 is assumed to be a realization? Specify as much information about this distribution as possible.

(b) (3 points) One of the assumptions that the two independent-samples t-test requires is that the distribution of the response has the same unknown standard deviation for both levels of the categorical explanatory variable. Is this a reasonable assumption for these data? In your response, please include either graphical or numerical support.

(c) (4 points) Suppose that the assumptions stated in your solution to part 1a are true. Is there statistical evidence at the 1% significance level that the distribution of the expression of this gene is associated with the tissue type? Perform a two independent-samples t-test.

(d) (3 points) Repeat part 1c with the response changed to the natural logarithm of the expression of this gene. After this transformation, is it reasonable to assume that the distribution of the response has the same standard deviation for both levels of the categorical explanatory variable? Explain.
Step by Step Solution
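One way parts (b) through (d) might be approached in R is sketched below. It is not a posted solution. The layout of "gene.txt" is not shown in the question, so the column names `expression` and `status` and the helper `analyze_gene` are hypothetical, and the sketch assumes the usual model behind the pooled t-test: independent, normally distributed measurements with a common standard deviation in both groups.

```r
# Sketch only: `expression`, `status`, and `analyze_gene` are assumed names,
# not taken from gene.txt, whose layout the question does not show.
analyze_gene <- function(gene, alpha = 0.01) {
  # (b) Numerical support: sample standard deviation within each tissue type.
  sds <- tapply(gene$expression, gene$status, sd)

  # (c) Two independent-samples t-test with a pooled (equal) standard deviation.
  raw <- t.test(expression ~ status, data = gene, var.equal = TRUE)

  # (d) The same comparison on the natural-log scale.
  gene$log_expression <- log(gene$expression)
  log_sds <- tapply(gene$log_expression, gene$status, sd)
  logged <- t.test(log_expression ~ status, data = gene, var.equal = TRUE)

  list(
    sds = sds,
    raw_p = raw$p.value,
    raw_reject = raw$p.value < alpha,    # TRUE means reject at level alpha
    log_sds = log_sds,
    log_p = logged$p.value,
    log_reject = logged$p.value < alpha
  )
}

# Possible usage, once the real column names are known:
# gene <- read.table("gene.txt", header = TRUE)  # header = TRUE is an assumption
# names(gene) <- c("expression", "status")       # hypothetical column names
# analyze_gene(gene)
# boxplot(expression ~ status, data = gene)      # graphical support for (b)
```

Comparing the two entries of `sds` with those of `log_sds` gives a quick numerical check of whether the equal-standard-deviation assumption looks more plausible before or after the log transformation. Side-by-side boxplots give the graphical version of the same check.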