BIOST 2094 - Statistical Computing in R, Spring 2011
Midterm Project, Due March 1, 2011

The midterm project consists of writing an R function that performs the nonparametric Gehan test, applying the function to two datasets, and creating a useful graphical description of each dataset. The method for the Gehan test is given below along with an example. These are followed by guidelines for writing the function and performing the analysis.

Gehan Test

The Gehan test is a nonparametric procedure for comparing the medians of two independent samples that may contain data that is left truncated at different values. We must assume that the truncation mechanism is the same for both populations. However, we do not have to make any assumptions about the variances of the population distributions.

This procedure is frequently used with environmental data to determine whether levels of a chemical at a site differ from levels that occur naturally in nearby areas. For example, arsenic, a known carcinogen, occurs naturally in soil and is also a byproduct of mining activity. The Gehan test can be used to compare soil samples from a mining site with samples from nearby areas that are not affected by mining. Furthermore, environmental data is often left truncated. Laboratory machines have a detection limit. If the concentration of a chemical is lower than this limit, the concentration cannot be detected, in which case the observation is left truncated at a known detection limit.

Procedure

The following procedure is taken from the 2002 Naval Facilities Engineering Command Guidance for Environmental Background Analysis, Volume 1: Soil, available at the Argonne National Laboratory website, http://www.ead.anl.gov/. Suppose m background samples and n site samples are collected. Site refers to a potentially hazardous site that is being investigated, and background refers to nearby areas that reflect naturally occurring levels.
If an observation is a non-detect, that is, the laboratory machine is unable to detect the chemical concentration, then the detection limit of the machine is given along with a less-than sign to denote the truncated observation (for example, <4). The following procedure is used to test the hypotheses

$H_0$: Median of Site = Median of Background
$H_a$: Median of Site > Median of Background

1. List the combined m background and n site measurements, including non-detect values, from smallest to largest. The total number of combined samples is $N = m + n$. Use the given detection limit for non-detect data.
2. Determine the N ranks, $R_1, R_2, \ldots, R_N$, for the N ordered data values using the method described in the example below.
3. Compute the N scores, $a(R_1), a(R_2), \ldots, a(R_N)$, where $a(R_i) = 2R_i - N - 1$ for $i = 1, 2, \ldots, N$.
4. Compute the Gehan statistic, G,
$$G = \frac{\sum_{i=1}^{N} h_i\, a(R_i)}{\left[\dfrac{mn \sum_{i=1}^{N} [a(R_i)]^2}{N(N-1)}\right]^{1/2}}$$
where $h_i$ is an indicator: $h_i = 1$ if the ith observation is from the site population and $h_i = 0$ if the ith observation is from the background population.
5. Calculate the p-value. When $m \ge 10$ and $n \ge 10$, calculate the p-value using a large-sample approximation. Otherwise, for small m and n, calculate the p-value using a permutation test.

For large samples, the distribution of G is approximately standard normal. Therefore, reject the null hypothesis if $G \ge Z_{1-\alpha}$, where $Z_{1-\alpha}$ is the $100(1-\alpha)$th percentile of the standard normal distribution.

To perform a permutation test for small samples:
(a) Take a random sample of size n from the pooled data without replacement. These n values represent the site data, and the other m observations are the background data.
(b) Calculate G for this resample.
(c) Repeat steps (a) and (b) several thousand times.
(d) The distribution of the test statistics calculated in step (c) approximates the sampling distribution under the null hypothesis. The permutation p-value is the proportion of resamples that give a result at least as great as the observed G.
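Steps 3 to 5 can be sketched in Python as follows. This is a minimal illustration, not the R function the project asks for. The function names (`gehan_statistic`, `large_sample_p_value`, `permutation_p_value`) are hypothetical, and the sketch assumes the scores $a(R_i)$ have already been computed. The ranks depend only on the pooled data. Relabeling n pooled observations as site data is therefore the same as drawing n of the pooled scores, and that is what the permutation sketch does.

```python
import math
import random
from typing import Optional, Sequence


def gehan_statistic(scores: Sequence[float], h: Sequence[int]) -> float:
    """G from step 4: scores a(R_i) and site indicators h_i (1 = site, 0 = background)."""
    N = len(scores)
    n = sum(h)          # number of site observations
    m = N - n           # number of background observations
    numerator = sum(hi * ai for hi, ai in zip(h, scores))
    denominator = math.sqrt(m * n * sum(a * a for a in scores) / (N * (N - 1)))
    return numerator / denominator


def large_sample_p_value(g: float) -> float:
    """Upper-tail p-value P(Z >= g) for Z ~ N(0, 1).

    Rejecting when this p-value is <= alpha is the same as rejecting when G >= Z_{1-alpha}.
    """
    return 0.5 * math.erfc(g / math.sqrt(2.0))


def permutation_p_value(scores: Sequence[float], h: Sequence[int],
                        n_resamples: int = 5000, seed: Optional[int] = None) -> float:
    """Steps (a)-(d): relabel n pooled observations as 'site' at random and count
    how often the resampled statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    n = sum(h)
    # The denominator of G is the same for every relabeling, so comparing
    # numerators is equivalent to comparing G. Gehan scores are whole numbers,
    # so these sums (and the comparison below) are exact.
    observed = sum(hi * ai for hi, ai in zip(h, scores))
    hits = 0
    for _ in range(n_resamples):
        resampled = sum(rng.sample(list(scores), n))  # (a) n scores, no replacement
        if resampled >= observed:                    # (b) compare with observed
            hits += 1
    return hits / n_resamples                        # (d) proportion at least as great
```

Comparing numerators instead of full G values avoids recomputing the fixed denominator for every resample.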
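Example

Below are 10 samples from each of the site and background areas. The < denotes a non-detect observation, that is, data that is left truncated at the detection limit.

Background: 1, <4, 5, 7, <12, 15, 18, <21, <25, 27
Site: 2, <4, 8, 17, 20, 25, 34, <35, 40, 43

The following steps are used to create the table below, which is then used to calculate G.

1. List the combined m background and n site measurements in the first column, from smallest to largest. Use the given detection limit for non-detect data.
2. Place a 0 or 1 in the second column, $h_i$, using the rule: $h_i = 1$ if the ith measurement is from the site and $h_i = 0$ if it is from the background.
3. Number the rows $i = 1, 2, \ldots, 20$ in the third column.
4. Determine the values of $d_i$ and $e_i$ using these rules. If the first value is a detect, set $d_1 = 1$ and $e_1 = 0$. If the first value is a non-detect, set $d_1 = 0$ and $e_1 = 1$. For each successive row, $i = 2, \ldots, 20$, increase d by 1 when the ith value is a detect and increase e by 1 when it is a non-detect. Otherwise, carry the previous value forward.
5. Let T denote the total number of non-detects in the pooled dataset. In this dataset there are $T = 6$ non-detects. Compute the rank of each observation by $R_i = d_i + (T + e_i)/2$ if the ith observation is a detect and $R_i = (T + d_i + 1)/2$ if it is a non-detect.
6. Compute the scores $a(R_1), a(R_2), \ldots, a(R_{20})$, where $a(R_i) = 2R_i - 21$.

Data   h_i   i    d_i   e_i   R_i    a(R_i)
1      0     1    1     0     4      -13
2      1     2    2     0     5      -11
<4     0     3    2     1     4.5    -12
<4     1     4    2     2     4.5    -12
5      0     5    3     2     7      -7
7      0     6    4     2     8      -5
8      1     7    5     2     9      -3
<12    0     8    5     3     6      -9
15     0     9    6     3     10.5   0
17     1     10   7     3     11.5   2
18     0     11   8     3     12.5   4
20     1     12   9     3     13.5   6
<21    0     13   9     4     8      -5
<25    0     14   9     5     8      -5
25     1     15   10    5     15.5   10
27     0     16   11    5     16.5   12
34     1     17   12    5     17.5   14
<35    1     18   12    6     9.5    -2
40     1     19   13    6     19     17
43     1     20   14    6     20     19

Using the $h_i$ and $a(R_i)$ columns, $\sum h_i a(R_i) = 40$ and $\sum [a(R_i)]^2 = 1942$, so the Gehan statistic is
$$G = \frac{40}{\left[\dfrac{(10)(10)(1942)}{(20)(19)}\right]^{1/2}} \approx 1.77.$$
Since $G \ge 1.645 = Z_{1-0.05}$, we reject the null hypothesis at the 0.05 level.

Project Guidelines

1. The goal of the project is to develop one or more functions that will be useful to other statisticians who are interested in using the Gehan test for large and small samples. You will need to decide on the structure and organization of your function(s). For example, you could write one function that performs both the large-sample approximation and the permutation test. This function could include an argument that indicates which method to use, with a default based on the sample size. Alternatively, you could write two separate functions, one for each approach. You will also need to carefully consider the format and type of arguments that would be most appropriate for general use of the Gehan test. Do whatever you think is best, but keep in mind that you are writing a general program for others to use.

Requirements

Write one or more R functions that perform the Gehan test using both the large-sample approximation and the permutation test.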
Include code that checks whether the arguments are valid and returns an error message if there is an invalid argument. Return a warning message if the large-sample test is used when m < 10 or n < 10.

Your function(s) should return an object that contains the test statistic, the p-value, and the method. Create an S3 class for the object your function returns and write an S3 print method. The print method should output the name of the test (Gehan), the method used, the test statistic, and the p-value.

Comment your code, including a description of the function(s), the type and format of the arguments, and a description of the values returned. Your comments should act like a manual for how to use the function. Also include comments in the body of the function that describe what is going on.

2. Apply your function(s) to the following two datasets. Use the large-sample approximation for the first dataset and the permutation test for the second dataset. A < denotes non-detect data, in which case the detection limit is given.

Dataset 1
Background: 4 <18 13
Site: 49 10 <17 27 39 11 <23 <28 50 30 32 <6 <3 20 <26 9 29 <19 36 34 37 48 45

Dataset 2
Background: 18 <10 27 22 <3
Site: 30 44 23 <16 13

3. Create at least one graph for each dataset that will be useful for comparing site and background data. For non-detect data, use the detection limit.

Please turn in a hard copy of your R code along with a technical report of your results (do not submit raw output). Also submit a copy of your R code via e-mail (njc23@pitt.edu).
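The following is a Python sketch, for illustration only, of one possible shape for the requested function. It is not the R code the project requires. It shows a single hypothetical `gehan_test()` with a `method` argument that defaults according to sample size. Invalid arguments raise errors, and a warning is issued when the large-sample method is used with a small group. A small result class stands in for the S3 object, and its `__str__` plays the role of the S3 print method.

The rank rule is our reading of the worked example, and it and every name here should be treated as assumptions:

- Keep running counts d and e of detects and non-detects.
- A detect gets $R = d + (T + e)/2$.
- A non-detect gets $R = (T + d + 1)/2$.
- A non-detect is listed before a detect of equal value.

```python
import math
import random
import warnings
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class GehanResult:
    """Stand-in for the S3 object: test statistic, p-value and method used."""
    statistic: float
    p_value: float
    method: str

    def __str__(self) -> str:
        # Analogue of an S3 print method: test name, method, statistic, p-value.
        return (f"Gehan test\nMethod: {self.method}\n"
                f"G = {self.statistic:.4f}, p-value = {self.p_value:.4g}")


def parse_measurement(value) -> Tuple[float, bool]:
    """'<4' -> (4.0, False) for a non-detect at limit 4; '7' or 7 -> (7.0, True)."""
    text = str(value).strip()
    detect = not text.startswith("<")
    try:
        number = float(text if detect else text[1:])
    except ValueError:
        raise ValueError(f"invalid measurement: {value!r}") from None
    if not math.isfinite(number):
        raise ValueError(f"invalid measurement: {value!r}")
    return number, detect


def gehan_scores(obs: Sequence[Tuple[float, bool]]) -> List[float]:
    """Scores a(R_i) = 2R_i - N - 1, using running counts d (detects) and e (non-detects)."""
    N = len(obs)
    T = sum(1 for _, detect in obs if not detect)  # total number of non-detects
    # Sort by value; at equal values a non-detect comes first (False < True).
    order = sorted(range(N), key=lambda i: obs[i])
    scores = [0.0] * N
    d = e = 0
    for i in order:
        if obs[i][1]:               # detect
            d += 1
            rank = d + (T + e) / 2
        else:                       # non-detect
            e += 1
            rank = (T + d + 1) / 2
        scores[i] = 2 * rank - N - 1
    return scores


def gehan_test(background: Sequence, site: Sequence, method: Optional[str] = None,
               n_resamples: int = 10000, seed: Optional[int] = None) -> GehanResult:
    """One-sided Gehan test of H0: median(site) = median(background) vs. Ha: site > background.

    background, site: numbers or strings; '<x' marks a non-detect at detection limit x.
    method: 'large-sample', 'permutation', or None (large-sample only if m >= 10 and n >= 10).
    """
    bg = [parse_measurement(v) for v in background]
    st = [parse_measurement(v) for v in site]
    m, n = len(bg), len(st)
    if m < 1 or n < 1:
        raise ValueError("background and site must each contain at least one measurement")
    if method is None:
        method = "large-sample" if (m >= 10 and n >= 10) else "permutation"
    if method not in ("large-sample", "permutation"):
        raise ValueError("method must be 'large-sample' or 'permutation'")
    if not isinstance(n_resamples, int) or n_resamples < 1:
        raise ValueError("n_resamples must be a positive integer")
    if method == "large-sample" and (m < 10 or n < 10):
        warnings.warn("large-sample approximation used with fewer than 10 observations in a group")

    scores = gehan_scores(bg + st)          # background rows first, then site rows
    N = m + n
    observed = sum(scores[m:])              # sum of h_i * a(R_i): site rows only
    variance = m * n * sum(a * a for a in scores) / (N * (N - 1))
    if variance == 0:
        raise ValueError("all Gehan scores are zero; G is undefined")
    g = observed / math.sqrt(variance)

    if method == "large-sample":
        p = 0.5 * math.erfc(g / math.sqrt(2.0))  # P(Z >= G), Z ~ N(0, 1)
    else:
        rng = random.Random(seed)
        # Scores are whole numbers, so comparing sums of n resampled scores is exact.
        hits = sum(sum(rng.sample(scores, n)) >= observed for _ in range(n_resamples))
        p = hits / n_resamples
    return GehanResult(g, p, method)


if __name__ == "__main__":
    # Dataset 2 from the project description; permutation test chosen by default.
    print(gehan_test(["18", "<10", "27", "22", "<3"], ["30", "44", "23", "<16", "13"], seed=1))
```

On the 10 + 10 worked example, this sketch picks the large-sample method and gives G ≈ 1.77. On Dataset 2 it picks the permutation test, and the p-value depends on `seed` and `n_resamples`. The warning fires when either group has fewer than 10 observations, which is the complement of the $m \ge 10$ and $n \ge 10$ condition in step 5.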
