Question

Source of data: Golub et al. (1999). Molecular classification of cancer: class discovery and class prediction by
gene expression monitoring. Science, 286:531-537.
The data set golub consists of the expression levels of 3051 genes for 38 tumor mRNA samples. Each tumor
mRNA sample comes from one patient (i.e., 38 patients in total), and 27 of these tumor samples correspond to
acute lymphoblastic leukemia (ALL) and the remaining 11 to acute myeloid leukemia (AML).
You will need to discover how many genes can be used to differentiate the tumor types (meaning that their
expression level differs between the two tumor types) using
(i) the uncorrected p-values,
(ii) the Holm-Bonferroni correction, and
(iii) the Benjamini-Hochberg correction.
Feel free to use libraries for multiple hypothesis testing in R or python.
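To make the difference between the three counting rules concrete, here is a minimal standard-library sketch of the Holm-Bonferroni (step-down) and Benjamini-Hochberg (step-up) procedures. The function names `holm_reject` and `bh_reject` and the p-values are hypothetical, chosen only for illustration; in practice a library routine such as `statsmodels.stats.multitest.multipletests` (with `method='holm'` or `method='fdr_bh'`) would normally be used instead.

```python
def holm_reject(pvals, alpha=0.05):
    """Holm-Bonferroni step-down: return a list of booleans (True = reject)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the (rank + 1)-th smallest p-value with alpha / (m - rank).
        if pvals[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first failure; larger p-values are not rejected
    return reject


def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up: return a list of booleans (True = reject)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values, for illustration only (not from the golub data).
pvals = [0.001, 0.012, 0.02, 0.035, 0.045, 0.30]
n_uncorrected = sum(p <= 0.05 for p in pvals)
n_holm = sum(holm_reject(pvals))
n_bh = sum(bh_reject(pvals))
print(n_uncorrected, n_holm, n_bh)  # prints: 5 1 3
```

On this toy input the uncorrected rule is the most permissive, Holm-Bonferroni the strictest, and Benjamini-Hochberg falls in between. Holm-Bonferroni controls the family-wise error rate, while Benjamini-Hochberg controls the false discovery rate.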
If you are using Python, you can use the following code to load the data:
import zipfile
import numpy as np
with zipfile.ZipFile("statsreview_release1.zip") as zip_file:
    golub_data, golub_classnames = (
        np.genfromtxt(zip_file.open('data_and_materials/golub_data/{}'.format(fname)),
                      delimiter=',', names=True,
                      converters={0: lambda s: int(s.strip(b'"'))})
        for fname in ['golub.csv', 'golub_cl.csv'])
Part (a)
0.0/2.0 points (graded)
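Let $\bar{x}_{\mathrm{ALL},i}$ be the mean of the expression levels for gene $i$ across the ALL mRNA samples. Similarly, let
$\bar{x}_{\mathrm{AML},i}$ be the same but for the AML mRNA samples instead.
$N_{\mathrm{ALL}}$ and $N_{\mathrm{AML}}$ are the numbers of mRNA samples for the ALL tumors and AML tumors
respectively.
If $s^2_{\mathrm{ALL},i}$ is the sample variance for gene $i$ across the ALL mRNA observations, then the corresponding
variance for $\bar{x}_{\mathrm{ALL},i}$ is
$$s^2_{\bar{x}_{\mathrm{ALL},i}} = \frac{s^2_{\mathrm{ALL},i}}{N_{\mathrm{ALL}}}$$
and similarly, for the AML samples,
$$s^2_{\bar{x}_{\mathrm{AML},i}} = \frac{s^2_{\mathrm{AML},i}}{N_{\mathrm{AML}}}$$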
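We can use $\bar{x}_i = \bar{x}_{\mathrm{ALL},i} - \bar{x}_{\mathrm{AML},i}$ as a metric for the difference in expression levels for gene $i$. The
variance of this metric is
$$s^2_{\bar{x}_i} = s^2_{\bar{x}_{\mathrm{ALL},i}} + s^2_{\bar{x}_{\mathrm{AML},i}}$$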
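This allows us to use the following test statistic:
$$t_{\mathrm{Welch},i} = \frac{\bar{x}_{\mathrm{ALL},i} - \bar{x}_{\mathrm{AML},i}}{\sqrt{\dfrac{s^2_{\mathrm{ALL},i}}{N_{\mathrm{ALL}}} + \dfrac{s^2_{\mathrm{AML},i}}{N_{\mathrm{AML}}}}}$$
which you can recognize as similar to the t-test statistic, and is itself known as the Welch unequal variances
t-test.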
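As a small illustration of the statistic for a single gene, the sketch below computes it with Python's standard library. The helper name `welch_t` and the expression values are hypothetical and are not taken from the golub data.

```python
import math
import statistics


def welch_t(all_values, aml_values):
    """Welch t statistic for one gene from its two groups of expression levels."""
    mean_diff = statistics.mean(all_values) - statistics.mean(aml_values)
    # Variance of each group mean: sample variance (n - 1 denominator) over n.
    var_mean_all = statistics.variance(all_values) / len(all_values)
    var_mean_aml = statistics.variance(aml_values) / len(aml_values)
    return mean_diff / math.sqrt(var_mean_all + var_mean_aml)


# Hypothetical expression levels for a single gene (not from the golub data).
all_values = [1.0, 2.0, 3.0, 4.0]
aml_values = [2.0, 4.0, 6.0]
t = welch_t(all_values, aml_values)
print(round(t, 4))  # prints: -1.1339
```

Here the group means are 2.5 and 4.0, the variances of the means are 5/12 and 4/3, and the statistic is -1.5 / sqrt(1.75). For the full data set the same computation would be repeated for each of the 3051 genes.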
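The distribution for the Welch test statistic can be approximated by a t-distribution, but with a modified
number of degrees of freedom. The number of degrees of freedom is approximately
$$\nu_i \approx \frac{\left(\dfrac{s^2_{\mathrm{ALL},i}}{N_{\mathrm{ALL}}} + \dfrac{s^2_{\mathrm{AML},i}}{N_{\mathrm{AML}}}\right)^2}{\dfrac{1}{\nu_{\mathrm{ALL}}}\left(\dfrac{s^2_{\mathrm{ALL},i}}{N_{\mathrm{ALL}}}\right)^2 + \dfrac{1}{\nu_{\mathrm{AML}}}\left(\dfrac{s^2_{\mathrm{AML},i}}{N_{\mathrm{AML}}}\right)^2}$$
where $\nu_{\mathrm{ALL}} = N_{\mathrm{ALL}} - 1$ and $\nu_{\mathrm{AML}} = N_{\mathrm{AML}} - 1$.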
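The approximate degrees of freedom can be computed directly from the two sample variances and sample sizes using the standard Welch-Satterthwaite approximation. The following is a minimal sketch; the helper name `welch_df` and the input values are hypothetical.

```python
import statistics


def welch_df(all_values, aml_values):
    """Welch-Satterthwaite approximate degrees of freedom for one gene."""
    n_all, n_aml = len(all_values), len(aml_values)
    # Variances of the two group means.
    a = statistics.variance(all_values) / n_all
    b = statistics.variance(aml_values) / n_aml
    return (a + b) ** 2 / (a ** 2 / (n_all - 1) + b ** 2 / (n_aml - 1))


# Hypothetical expression levels for a single gene (not from the golub data).
df = welch_df([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0])
print(round(df, 2))  # prints: 3.23
```

The result always lies between the smaller of the two per-group degrees of freedom and their sum, and it is generally not an integer. When both groups have the same size and the same sample variance, it reduces to the pooled value, e.g. 4 for two groups of three observations.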
Use the Welch t-test to find the number of significantly associated genes (at significance level 0.05) using
uncorrected p-values.
How many genes are significant? (Please enter the value with a precision of at least two significant figures;
your answer will be graded with a 10% tolerance.)
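Once a Welch statistic and its degrees of freedom are available for each gene, the uncorrected count is the number of two-sided p-values at or below 0.05. The sketch below shows that step with the standard library only, integrating the t density numerically with Simpson's rule. The helper names and the (statistic, degrees of freedom) pairs are hypothetical, and the numerical integration is a stand-in for a library routine. With SciPy available, `scipy.stats.ttest_ind(..., equal_var=False)` performs the Welch test and returns p-values directly, and is more accurate for very small p-values.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, n_steps=2000):
    """Two-sided p-value P(|T| >= |t|), using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / n_steps  # n_steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_mass = total * h / 3  # integral of the density over [0, |t|]
    return max(0.0, 1.0 - 2.0 * central_mass)


# Hypothetical (t statistic, degrees of freedom) pairs, one per gene.
results = [(-1.13, 3.2), (4.8, 30.5), (0.4, 18.0), (-2.9, 12.7)]
p_values = [two_sided_p(t, df) for t, df in results]
n_significant = sum(p <= 0.05 for p in p_values)
print(n_significant)  # prints: 2
```

In this toy example only the second and fourth genes fall below the 0.05 threshold. No correction for multiple testing is applied at this stage.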