
Question

ST332 & ST409 Medical Statistics 2014-15: Exercises 2

1) Clinical Trial Design (based on an old exam question)

A new drug against AIDS has been developed that may be useful in conjunction with the standard treatment. It is proposed to mount a 'double blind placebo-controlled randomised phase III clinical trial' to test the new drug. Explain what is meant by the expression in quotes, and describe the most important types of bias that the investigators should be aware of in conducting and reporting the trial.

A second group of investigators argues that it is unethical to leave a patient unsure of which treatment he or she is being given. They propose that the patients should first be split into two groups, A and B. Patients in group A will be offered a choice of the new combined treatment or the standard treatment alone. Patients in group B will not be informed about the trial, but their survival will be compared with that of group A patients. Discuss the advantages and disadvantages of this alternative trial procedure.

2) Discussion of a Medical Article (based on an old assessment)

Find the article: 'Cannabis use and mental health in young people: cohort study.' G. C. Patton et al. British Medical Journal (2002) 325: 1195-8 doi: 10.1136/bmj.325.7374.1195
[Note: the paper is now on the ST332 Resources webpage]

(a) Define prevalence of a disease, case-control and cohort studies.
(b) What aspects of the association between cannabis use, depression and anxiety (DA), and social factors can be investigated in this cohort study which would not be considered in cross-sectional studies? Explain your answer.
(c) Give two advantages of self-administered questionnaires.
(d) Are the results in Table 1 consistent with previous cross-sectional studies? Explain, quoting the reference numbers of the articles.
(e) The last paragraph of 'Sample', page 1196, gives the response rates. Explain why it is important to try to have complete follow-up.
Let D = 1 if a young adult has depression and anxiety, and D = 0 otherwise. Denote male and female by M and F respectively. Define a response variable R by R = 1 if a young adult completed wave 7 (responded), and R = 0 otherwise. In order to assess what health care services are needed, an estimate of the overall number of young adults with DA is needed. Assume that the proportions of boys and girls at the start of this study accurately represent the population proportions, and that the overall sample is representative. The following is a summary of the response rates by sex:

           Completed wave 7   Did not complete wave 7   Total
Women            866                   165               1031
Men              735                   266               1001
Total           1601                   431               2032

(f) The results (page 1196) state that 71 men and 188 women reported DA in wave 7. Using the above notation, state which probability, say p_e, is unbiasedly estimated by (71 + 188)/1601. Give a formula for the relationship between this probability and the prevalence. What numerical impact could the non-response have on the estimates of prevalence of depression and anxiety in young adults? What constraints on response patterns ensure that p_e is an unbiased estimate of the prevalence? Give both a formula and a verbal interpretation.

3) Biases (based on an old exam question)

Explain what is meant by 'Berkson's bias' and by 'Neyman bias'. Comment critically on the following summaries of medical investigations. Explain what biases each study may suffer from, what additional information you would need in each case to come to the stated conclusion, and suggest other ways to improve each investigation.

(a) In a national study of stress-related disease, the rate of breast cancer for widows was found to be three times the rate of breast cancer for divorced women, four times that for married women, and five times the rate for single women. Hence 'stressful events are a partial cause of breast cancer, and the death of a spouse is more stressful than divorce, but even marriage is stressful'.
(b) In a hospital study, a 2 × 2 table of men recovering from heart attacks (cases) and men in hospital for some other reason (controls), both groups classified into smokers and non-smokers, produced an odds ratio nearly 1, and a corresponding 'P > .5'. Hence 'there is no evidence that smokers are at greater risk of heart attacks than non-smokers'.
(c) A large number of people with AIDS volunteered to try a new treatment in 1990, and the proportion surviving at least six months was significantly higher than for AIDS sufferers in 1989. Hence 'the new treatment has been shown to be effective'.

4) Determining Public Policy

The following 2 × 2 table shows some data from a case-control study conducted in your county to determine risk factors for sudden infant death syndrome (SIDS) among children less than one year of age. The table counts the numbers of cases and controls according to whether or not the infants were prone sleepers (i.e. infants who slept on their stomachs).

               Prone   Not prone
SIDS cases      120        80
Controls         67       133

A public health professional in your county consults you regarding a study she recently completed that estimated the incidence of SIDS in the county to be 42 per 10,000 liveborn infants. The total number of liveborn infants per year in the county is 13,500. She would like to implement a SIDS prevention programme, and would like to know in advance how many actual cases of SIDS in your county could be eliminated each year if parents could be taught to avoid putting their infants to sleep in the prone position. Please provide her an estimate so that she can write a proposal to justify the expense of the education programme.
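One possible route to the estimate asked for in exercise 4 is the population attributable fraction. The sketch below is not a model answer: it assumes that SIDS is rare enough for the odds ratio to stand in for the relative risk, that the controls reflect how common prone sleeping is among all infants, that the association is causal, and that the programme would eliminate prone sleeping entirely. Levin's formula is the technique chosen here, and all variable names are illustrative.

```python
# Counts from the 2 x 2 table in exercise 4.
cases_prone, cases_not_prone = 120, 80
controls_prone, controls_not_prone = 67, 133

# Figures quoted in the exercise.
incidence = 42 / 10_000        # SIDS cases per liveborn infant
births_per_year = 13_500

# Odds ratio from the case-control table (cross-product ratio).
odds_ratio = (cases_prone * controls_not_prone) / (cases_not_prone * controls_prone)

# Assumption: exposure prevalence in the population is estimated from controls.
p_exposed = controls_prone / (controls_prone + controls_not_prone)

# Levin's population attributable fraction, with the odds ratio used
# in place of the relative risk (rare-disease assumption).
excess = p_exposed * (odds_ratio - 1)
paf = excess / (1 + excess)

expected_cases = incidence * births_per_year
preventable_cases = paf * expected_cases

print(f"odds ratio        = {odds_ratio:.2f}")
print(f"attributable frac = {paf:.3f}")
print(f"expected cases    = {expected_cases:.1f}")
print(f"preventable cases = {preventable_cases:.1f}")
```

Under these assumptions the odds ratio is close to 3, about 40% of cases are attributed to prone sleeping, and roughly 23 of the 56.7 expected cases per year would be avoided. A real programme would not change every parent's behaviour, so this is best read as an upper bound.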
5) Interpreting Diagnostic Tests

When a diagnostic test for some medical condition is given to a group of n individuals, they can each be classified into one of the cells of the following contingency table, where a + b + c + d = n:

                              True status
                          Positive   Negative
Test result   Positive       a          b
              Negative       c          d

For example, the proportion of individuals with the condition being tested for is (a + c)/(a + b + c + d), and the proportion of test results that were positive is (a + b)/(a + b + c + d). The 'true status' is typically determined by a 'gold standard' test that is regarded as definitive.

(a) The following summary measures have all been suggested.

TPR = a/(a + c),   TNR = d/(b + d),   PPV = a/(a + b),   NPV = d/(c + d),
FPR = b/(b + d),   FNR = c/(a + c),   FDR = b/(a + b),   FOR = c/(c + d),
ACC = (a + d)/(a + b + c + d),   TMR = (b + c)/(a + b + c + d),
PLR = [a/(a + c)] / [b/(b + d)],   NLR = [c/(a + c)] / [b/(b + d)].

For each of these twelve summaries, give a one-sentence interpretation. For example, TPR might be described by 'Out of those individuals with the condition, TPR is the proportion whose status is correctly given by the diagnostic test'. [Note: TPR stands for 'True Positive Rate' and is another name for 'sensitivity'].

(b) Non-Invasive Prenatal Screening (NIPS) was introduced in 2013 for pregnant women in the USA; it involves testing a sample of the mother's blood to identify chromosomal abnormalities in the foetus. A particular test, costing about $1,000, claims to have 97.4% sensitivity and 99.6% specificity when used as a test for Edwards' syndrome.

Edwards' syndrome is a very rare but severe condition. It occurs in perhaps one in 5,000 live births; fewer than 10% of such infants survive to their first birthday. The syndrome occurs in perhaps one in 2,000 conceptions, but in most cases the foetus does not survive until full term. The major risk factor is maternal age, but paternal age, genetic susceptibility, and environmental factors such as exposure to heavy metals are also relevant.
The chance of conceiving a foetus with Edwards' syndrome has been reported to be fairly constant at about 0.03% in women aged below 30, but it then rises to an average of about 0.6% in women over 40.

Given the above information:
   i. Estimate the counts a, b, c and d for a population of 2,000,000 newly pregnant women aged under 30 (this is roughly the annual number of live births to US mothers under 30).
   ii. Calculate the numerical values of each of the twelve summary measures from part (a), and say, with reasons, which of these measures (you may choose more than one) you think would be most informative to a 25-year-old pregnant woman in the US who is contemplating paying for a test for Edwards' syndrome.

(c) Every year, over 100,000 women in the US aged 40 or over give birth. Write a paragraph summarising the available data (for example, based on your chosen measures from part (b)) to provide statistical information to these women about the risk of Edwards' syndrome.

6) Variance of U-Statistics + Examples (ST409 ONLY; based on an old exam question)

(a) Define what is meant by
   i. A parameter $\theta$ being estimable of degree $r$ for the family of distributions $\mathcal{F}$.
   ii. A symmetric kernel $\psi(x_1, \dots, x_r)$ of $\theta$.

(b) Let $U(X_1, \dots, X_n)$ denote the U-statistic estimator of $\theta$, corresponding to the symmetric kernel $\psi$, from the sample $(X_1, \dots, X_n)$. Let $\zeta_c$ denote $\mathrm{Cov}[\psi(S), \psi(S')]$ where $S$ and $S'$ are subsamples of size $r$ from $(X_1, \dots, X_n)$, with $c$ elements in common. Show that

$$\mathrm{Var}[U(X_1, \dots, X_n)] = \frac{1}{\binom{n}{r}} \sum_{c=1}^{r} \binom{r}{c} \binom{n-r}{r-c} \zeta_c .$$

By demonstrating that

$$\binom{n-a}{b} = \frac{n^b}{b!} \bigl(1 + o(1)\bigr) \quad \text{as } n \to \infty,$$

show that

$$\lim_{n \to \infty} n \, \mathrm{Var}[U(X_1, \dots, X_n)] = r^2 \zeta_1 .$$

(c) Give examples of kernels for U-statistics that might be useful
   i. For summarising correlation between blood pressure and weight.
   ii. For comparing 2 treatments in a clinical trial using separate treatment and control groups.
   iii. For comparing 2 treatments in a clinical trial using matched controls.

7) Various U-Statistics (ST409 ONLY; based on an old exam question)

(a) Given independent identically distributed random variables $Z_1, Z_2, \dots, Z_n$, define the following terms:
   i. An estimable parameter $\theta$ of degree $k$,
   ii. A symmetric kernel $\psi(z_1, \dots, z_k)$ for $\theta$, and
   iii. A one-sample U-statistic.

(b) Show carefully that if $Z = (X, Y)$ is bivariate, then the covariance $\mathrm{Cov}(X, Y)$ (assuming it exists) is estimable of degree 2, with symmetric kernel $\psi(z_1, z_2) = \tfrac{1}{2}(x_1 - x_2)(y_1 - y_2)$.

(c) Interpret the parameters estimated by the following symmetric kernels:
   i. $I(z) = 1$ if $z > 0$, and $I(z) = 0$ otherwise,
   ii. $I(z_1 + z_2)$ where $I(\cdot)$ is defined as above,
   iii. $I(z_1 + z_2 > 2 z_3)$,
   iv. $I\bigl[(x_1 - x_2)(y_1 - y_2)\bigr]$ where $z_i = (x_i, y_i)$.
Obtain simple formulae for the corresponding U-statistics, and explain briefly how each U-statistic might be useful in Medical Statistics.

8) Two-Sample U-Statistics (ST409 ONLY; based on an old exam question)

(a) Define a two-sample U-statistic.

(b) Suppose $X$ and $Y$ are independent random variables, both with median 0. A plausible measure of the extent to which the distribution $F_Y$ of $Y$ is more "spread out" than the distribution $F_X$ of $X$ is $\theta = \Pr(Y < X < 0 \text{ or } 0 < X < Y)$. Show that

$$\theta = \int_{-\infty}^{0} [F_Y(x) - F_X(x)] f_X(x)\,dx + \int_{0}^{\infty} [F_X(x) - F_Y(x)] f_X(x)\,dx + 1/4,$$

and define a U-statistic that can be used to estimate $\theta$.

(c) A suggested alternative measure of inequality of scales for two distributions with the same median is given by the linear rank statistic

$$S(Z) = \sum_{i=1}^{N} a_i Z_i$$

where $Z_i = 0$ if the $i$th value in the ordered data set comes from distribution 1, $Z_i = 1$ if the $i$th value in the ordered data set comes from distribution 2, and the weights $a_i$ are given by

$$a_i = \begin{cases}
2i & \text{for } i \text{ even, } 1 < i \le N/2, \\
2i - 1 & \text{for } i \text{ odd, } 1 \le i \le N/2, \\
N & \text{if } i = \lceil N/2 \rceil, \\
2(N - i) + 2 & \text{for } (N - i) \text{ even, } N/2 < i \le N, \\
2(N - i) + 1 & \text{for } (N - i) \text{ odd, } N/2 < i < N.
\end{cases}$$

Explain why $S$ measures inequality of spread.
(Hint: you may find it helps to calculate the $a_i$ for some small $N$ such as $N = 9$ or $10$).

(d) Show that $\{a_1, \dots, a_N\}$ is a permutation of the first $N$ integers, and hence that the distribution of $S(Z)$ under the null hypothesis of "$F_X = F_Y$" is the same as the distribution of the two-sample Wilcoxon statistic $W$ under the same null hypothesis.
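Following the hint in exercise 8(c), the weights can be tabulated for small N. The sketch below rests on one reading of the definition, which is an assumption: the middle case a_i = N is applied only when N is odd, at position i = (N + 1)/2, and every other position follows the parity rules for the lower and upper halves. The function name `spread_weights` is illustrative.

```python
def spread_weights(n):
    """Weights a_1..a_n under the reading stated above (an assumption)."""
    weights = []
    for i in range(1, n + 1):
        if n % 2 == 1 and i == (n + 1) // 2:
            a = n  # middle position, used only when n is odd
        elif i <= n / 2:
            # lower half: 2i for even i, 2i - 1 for odd i
            a = 2 * i if i % 2 == 0 else 2 * i - 1
        else:
            # upper half: rule depends on the parity of n - i
            k = n - i
            a = 2 * k + 2 if k % 2 == 0 else 2 * k + 1
        weights.append(a)
    return weights


weights_9 = spread_weights(9)
weights_10 = spread_weights(10)
print(weights_9)   # [1, 4, 5, 8, 9, 7, 6, 3, 2]
print(weights_10)  # [1, 4, 5, 8, 9, 10, 7, 6, 3, 2]

# Part (d): under this reading the weights are a permutation of 1..n.
is_permutation = all(
    sorted(spread_weights(n)) == list(range(1, n + 1)) for n in range(1, 51)
)
print(is_permutation)
```

The printed lists show small weights at both ends of the ordered sample and large weights in the middle, which is the pattern parts (c) and (d) ask about.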

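For exercise 5(b)(i), the expected cell counts follow from the prevalence, sensitivity and specificity quoted in the question. The sketch below assumes that all 2,000,000 women are tested, that the claimed 97.4% sensitivity and 99.6% specificity hold in this population, and that the prevalence is exactly 0.03%. Counts are left as unrounded expected values, and only four of the twelve measures are computed.

```python
population = 2_000_000
prevalence = 0.0003      # 0.03% for women aged under 30 (from the exercise)
sensitivity = 0.974      # claimed by the test
specificity = 0.996      # claimed by the test

affected = population * prevalence
unaffected = population - affected

a = sensitivity * affected      # true positives
c = affected - a                # false negatives
d = specificity * unaffected    # true negatives
b = unaffected - d              # false positives

tpr = a / (a + c)               # sensitivity, recovered from the counts
tnr = d / (b + d)               # specificity, recovered from the counts
ppv = a / (a + b)               # P(condition present | positive test)
npv = d / (c + d)               # P(condition absent | negative test)

print(f"a = {a:.1f}, b = {b:.1f}, c = {c:.1f}, d = {d:.1f}")
print(f"PPV = {ppv:.4f}, NPV = {npv:.6f}")
```

Because the condition is so rare in this age group, false positives (about 8,000) far outnumber true positives (about 584), so under these assumptions a positive result corresponds to an affected foetus only about 7% of the time.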