
PHEB 609: Categorical Data Analysis
Homework 1, due 09/06/2017 by end of the day

1. In the following examples, identify the response variable and the explanatory variables.
   a. Attitude toward gun control (favor, oppose), Gender (female, male), Mother's education (high school, college).
   b. Heart disease (yes, no), Blood pressure, Cholesterol level.
   c. Race (white, nonwhite), Religion (Catholic, Jewish, Protestant), Vote for president (Democrat, Republican, Other), Annual income.
   d. Marital status (married, single, divorced, widowed), Quality of life (excellent, good, fair, poor).

2. Which scale of measurement is most appropriate for the following variables: nominal or ordinal?
   a. Political party affiliation (Democrat, Republican, unaffiliated).
   b. Highest degree obtained (none, high school, bachelor's, master's, doctorate).
   c. Patient condition (good, fair, serious, critical).
   d. Hospital location (London, Boston, Madison, Rochester, Toronto).
   e. Favorite beverage (beer, juice, milk, soft drink, wine, other).
   f. How often feel depressed (never, occasionally, often, always).

3. Each of 100 multiple-choice questions on an exam has four possible answers, one of which is correct. For each question, a student randomly selects one response as the answer.
   a. Specify the distribution of the student's number of correct answers on the exam.
   b. Based on the mean and standard deviation of that distribution, would it be surprising if the student made at least 50 correct responses? Explain your reasoning.

4. Genotypes AA, Aa, and aa occur with probabilities (π1, π2, π3). For n = 3 independent observations, the observed frequencies are (n1, n2, n3).
   a. Explain how you can determine n3 from knowing n1 and n2. Thus, the multinomial distribution of (n1, n2, n3) is actually two-dimensional.
   b. Show the set of all possible observations (n1, n2, n3) with n = 3.
   c. Suppose (π1, π2, π3) = (0.25, 0.50, 0.25). Find the multinomial probability that (n1, n2, n3) = (1, 2, 0).
   d. Refer to (c).
What probability distribution does n1 alone have? Specify the value of the sample size index and parameter for that distribution.

5. A coin is flipped three times. Let Y = number of heads obtained, where the probability of a head on each flip equals π.
   a. Assuming π = 0.50, specify the probabilities for the possible values of Y, and find the distribution's mean and standard deviation.
   b. Find the binomial probabilities for Y when π equals (i) 0.60, (ii) 0.40.
   c. Suppose you observe y = 1 and do not know π. Calculate and sketch the likelihood function.
   d. Using the plotted likelihood function from (c), find the ML estimate of π.

CATEGORICAL DATA ANALYSIS
Lecture 1: Introduction to Categorical Data Analysis

Slide 2: This week's assignment
Reading: Agresti, Chapters 1.1-1.3 and 1.4.1. Homework #1: please upload it to eCampus by 11:59pm on Wednesday 09/06/17.

Slide 3: Today's outline of topics
Syllabus; types of data; outcome and predictor variables; sample vs. population; sample characteristics; probability models for discrete data; statistical inference for categorical data.

Slide 4: Types of data
- Continuous (very familiar)
- Discrete:
  - Categorical: nominal (binary, multinomial) or ordinal (binary, multinomial)
  - Count

Slide 5: Outcome and predictor variables
Outcome variables (a.k.a. dependent or response variables) appear on the .... side of the regression equation. Predictor variables (a.k.a. independent or explanatory variables) appear on the .... side of the regression equation. We are interested in what the values of the predictor variable(s) can tell us about the values of the outcome variable.

Slide 6: Categorical data analysis
Categorical data analysis is the analysis of categorical outcome variables (we will also spend two lectures on count outcome variables). The predictor variables can be discrete or continuous.
Example: categorical data or not?
1. We are interested in knowing how drug use affects the infection rate of some disease in a specific population group in south Texas.
2. Is there evidence that kids who consume milk early tend to be taller than those who don't?

Slide 7: Cornerstone of inference
We often observe characteristics of a sample, such as proportions for binary data, means, and standard deviations, but we would like to make inference about the population, since the sample characteristics can vary across different samples.

Slide 8: Sample characteristics
For binary data: proportions. Examples: the number of cancer deaths in a closed cohort of 100 cancer patients; the number of birth defects out of the total number of births in Texas in 2006. (Give me some examples of a population and a sample?)

Slide 9: Parameters and statistical modeling
Statistical models (with parameters) are used to describe populations. "Essentially, all models are wrong, but some are useful." (George E. P. Box.) Our goals are to assume a model for the data and then make inference about the parameters in the model: point estimation, interval estimation, and hypothesis testing.

Slide 10: Common discrete distributions
Bernoulli, binomial, multinomial, Poisson, negative binomial.

Slide 11: Bernoulli
The Bernoulli distribution (sometimes called a Bernoulli trial) describes the outcome of a single binary observation with some probability of "success". A single coin toss is a good example. Let Y denote the outcome (1 or 0) and π the probability of success. Then

  P(Y = y) = π^y (1 - π)^(1-y),   y = 0, 1,   0 < π < 1
  P(Y = 1) = π^1 (1 - π)^(1-1) = π (1 - π)^0 = π
  P(Y = 0) = π^0 (1 - π)^(1-0) = (1 - π)^1 = 1 - π

Slide 12: Bernoulli (1)
The random variable is Y; the parameter is π. If we flip a single coin with π = 0.45, then P(heads) = π = 0.45 and P(tails) = 1 - π = 1 - 0.45 = 0.55.

Slide 13: Binomial distribution
The binomial distribution describes the probability of having a total of Y successes from n independent and identical Bernoulli trials, where the probability of "success", π, is fixed.
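The Bernoulli and binomial probabilities above can be cross-checked with a short script. This is an illustrative sketch in Python (the course software is Stata) using only the standard library:

```python
import math

def bernoulli_pmf(y, pi):
    # P(Y = y) = pi^y * (1 - pi)^(1 - y), for y in {0, 1}
    return pi**y * (1 - pi)**(1 - y)

def binomial_pmf(y, n, pi):
    # P(Y = y) = C(n, y) * pi^y * (1 - pi)^(n - y)
    return math.comb(n, y) * pi**y * (1 - pi)**(n - y)

# The coin with pi = 0.45 from the Bernoulli slide:
print(bernoulli_pmf(1, 0.45))    # P(heads) = 0.45
print(bernoulli_pmf(0, 0.45))    # P(tails) = 0.55
# A Bernoulli trial is just a binomial with n = 1:
print(binomial_pmf(1, 1, 0.45))  # 0.45 again
```

The last line illustrates that the Bernoulli pmf is the n = 1 special case of the binomial pmf.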
It is used for the binary categorical data type.

Slide 14: Binomial distribution (1)

  P(Y = y) = C(n, y) π^y (1 - π)^(n-y),   y = 0, 1, 2, ..., n,   0 < π < 1

Slide 15: Binomial distribution (2)
The binomial coefficient is

  C(n, y) = n! / (y! (n - y)!)

For example,

  C(5, 2) = 5! / (2! (5 - 2)!) = 5! / (2! 3!) = (5·4·3·2·1) / ((2·1)(3·2·1)) = (5·4) / (2·1) = 20/2 = 10

Slide 16: Binomial distribution (3)
The mean, or expectation, of Y: E(Y) = nπ. The variance of Y: Var(Y) = nπ(1 - π) = σ². The standard deviation of Y: σ = sqrt(nπ(1 - π)).

Slide 17: Example
If we toss an unbiased (fair) coin, where the probability of getting a head is π = 0.5, and we toss it 10 times, the probability of observing 6 heads is

  P(Y = 6) = C(10, 6) 0.5^6 (1 - 0.5)^(10-6) ≈ 0.205

The probability of observing at least 6 heads is

  P(Y ≥ 6) = Σ_{y=6}^{10} C(10, y) 0.5^y (1 - 0.5)^(10-y) ≈ 0.377

Slide 18: Stata code
Stata program to calculate the binomial probabilities:

  . di comb(10,6)*(0.5^6)*((1-0.5)^(10-6))    // or: di binomialp(10,6,.5)
  .20507813
  . di 1-binomial(10,5,0.5)                   // or: di binomialtail(10,6,.5)
  .37695313

[binomial(n,k,p) returns the probability of observing k or fewer successes in n trials when the probability of a success on one trial is p.]

Slide 19: Mean and SD (example)
The expected (mean) number of heads we will obtain from these tosses is E(Y) = ... The standard deviation of the total number of heads? ...

Slide 20: Multinomial distribution
The multinomial distribution is a simple extension of the binomial distribution where there are more than two possible outcomes.
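A multinomial probability can be computed the same way. Here is a minimal Python sketch (illustrative only; the course software is Stata); the multinomial coefficient is built by repeated exact integer division:

```python
import math

def multinomial_pmf(counts, probs):
    # P(y1,...,yc) = n! / (y1! * ... * yc!) * pi1^y1 * ... * pic^yc
    n = sum(counts)
    coef = math.factorial(n)
    for y in counts:
        coef //= math.factorial(y)  # each division is exact
    p = 1.0
    for y, pi in zip(counts, probs):
        p *= pi**y
    return coef * p

# 20 patients with three outcomes (improve / stay the same / worsen).
# The probabilities 0.5, 0.3, 0.2 are hypothetical values chosen for
# illustration; they are not given in the slides.
print(multinomial_pmf([12, 5, 3], [0.5, 0.3, 0.2]))
```

With c = 2 categories the formula reduces to the binomial pmf, which gives a quick sanity check on the implementation.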
Given there are c categories and a total of n trials, the probability of getting y1, ..., yc responses in the respective categories is

  P(y1, y2, ..., yc) = [n! / (y1! y2! ··· yc!)] π1^y1 π2^y2 ··· πc^yc

Slide 21: Multinomial distribution (1)
Example: 20 patients have one of 3 possible outcomes. Y1 = number who improve, Y2 = number who stay the same, Y3 = number who worsen; π1 = P(improve), π2 = P(stay the same), π3 = P(worsen).

Slide 22: Multinomial distribution (2)

  E(Yj) = n πj
  Var(Yj) = n πj (1 - πj)
  Cov(Yj, Yk) = -n πj πk   (j ≠ k)

Slide 23: Poisson distribution
The Poisson distribution is used to describe the probability of random events in some interval of space or time. It is very useful when modelling outcome variables related to the spread of disease, the number of people infected in a given space or location over a given period, etc.

Slide 24: Poisson distribution
A discrete random variable Y is said to have a Poisson distribution with parameter μ > 0, i.e. Poisson(μ), if the probability mass function of Y is given by

  P(y) = e^(-μ) μ^y / y!,   y = 0, 1, 2, ...,   μ > 0

with E(Y) = Var(Y) = μ.

Slide 25: Overdispersion
Overdispersion occurs when the observed variability of the distribution exceeds the predicted variability. This is usually due to a violation of the assumption that each trial is identical, or that each trial is independent of the others. An alternative to the Poisson distribution is the negative binomial distribution.

Slide 26: Negative binomial distribution
Suppose there is a sequence of independent Bernoulli trials, where each trial has probability of success π. We observe this sequence until a number r of failures has occurred. Then the random number of successes we have seen, Y, follows a NB distribution:

  P(y) = C(y + r - 1, y) π^y (1 - π)^r,   y = 0, 1, 2, ...,   0 < π < 1
  E(Y) = rπ / (1 - π),   Var(Y) = rπ / (1 - π)²

Slide 27: Summary of discrete distributions
- Bernoulli: a single trial with one of two possible outcomes
- Binomial: counts the outcome of n experiments with two possible events
- Multinomial: counts the outcome of n experiments with many possible events
- Poisson: counts the number of times some event occurs over an interval of time or space
- Negative binomial: for counts, allowing overdispersion

Questions:
1. I have a penny, a nickel, a dime, and a quarter. The chance distribution of the face is different for each coin.
I toss each coin once and count the number of heads.
2. We are interested in the number of emergency visits at a given hospital between 9pm and midnight.
3. From 2, we collect this count data over two months from some hospital and compute the sample mean (9) and standard deviation (5). What model would you use for that data?

Slide 28: Statistical inference
- Parameter point estimation
- Interval estimation: asymptotic (Wald) method; exact methods
- Hypothesis tests: large-sample (asymptotic) methods (Wald, score, and likelihood ratio tests); exact tests

Slide 29
If we collect data which follow these distributions, how do we estimate the parameters? The most commonly used method is called maximum likelihood estimation.

Slide 30: MLE
"The maximum likelihood (ML) estimate is defined to be the parameter value at which the probability of the observed data takes its greatest value; it is the value at which the likelihood function reaches its maximum."

Slide 31: MLE (1)
The likelihood function looks like the density, but it is now a function of the parameter for particular observed data:

  L(π) = P(y | n, π) = C(n, y) π^y (1 - π)^(n-y)

Slide 32
Suppose we observe a total of 3 successes in 10 independent trials, each with probability of success π. What is the ML estimate of π? The likelihood L(π) = C(10, 3) π^3 (1 - π)^(10-3) can be calculated for a range of values of π:

  π     L(π)        π     L(π)
  0.0   0.000       0.6   0.042
  0.1   0.057       0.7   0.009
  0.2   0.201       0.8   0.001
  0.3   0.267       0.9   0.000
  0.4   0.215       1.0   0.000
  0.5   0.117

Slide 33
[Plot of the likelihood function L(π) against π for π from 0 to 1.]

Slide 34: Stata code
Stata do-file for plotting the likelihood function (variable names reconstructed from the garbled extract):

  clear
  set obs 101
  gen pi = (_n-1)/100
  gen l = comb(10,3)*(pi^3)*((1-pi)^(10-3))
  twoway (line l pi), ytitle(likelihood(pi)) xtitle(pi) xlabel(0(.1)1) title(likelihood) scheme(s1mono)

Slide 35
Finding the MLE: in this example the maximum happens to equal 0.3, the observed proportion of successes. Rather than reading it off a graph, we could use an old tool, calculus, to derive the estimator analytically.

Slide 36: MLE (2)
At the peak, the slope of the tangent line equals zero. In calculus, the slope is given by the first derivative of the function. We take the derivative, find the value of π at which it is zero, and we have found the maximum.

Slide 37: MLE (3)
It turns out to be much easier to compute the derivative of the natural logarithm of the likelihood than of the likelihood itself. Fortunately, the maxima of both functions occur at the same place, so we will work with the log-likelihood.

Slide 38: MLE (4)
We start with the log-likelihood function:

  ln L(π) = ln C(n, y) + y ln(π) + (n - y) ln(1 - π)

Slide 39: MLE (5)
Differentiating the log-likelihood with respect to π gives the score function u:

  u(π) = ∂ ln L(π) / ∂π = y/π - (n - y)/(1 - π)

Setting u(π) = 0 and rearranging terms gives the ML estimator π̂ = y/n.

[Slides 40-76 are not included in this extract. Only the end of slide 77 survives; it refers to the likelihood ratio example that follows: the p-value is P(χ² > 0.36) for a chi-squared distribution with one degree of freedom.]
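The likelihood tabulation for y = 3 successes in n = 10 trials can be reproduced outside Stata. Here is a minimal Python sketch that evaluates L(π) = C(10,3) π^3 (1 - π)^7 on the same 101-point grid as the plotting code and takes the grid argmax:

```python
import math

def likelihood(pi, y=3, n=10):
    # L(pi) = C(n, y) * pi^y * (1 - pi)^(n - y)
    return math.comb(n, y) * pi**y * (1 - pi)**(n - y)

grid = [i / 100 for i in range(101)]      # pi = 0.00, 0.01, ..., 1.00
values = [likelihood(pi) for pi in grid]
pi_hat = grid[values.index(max(values))]  # argmax over the grid
print(pi_hat)                             # 0.3, the sample proportion y/n
```

The grid maximum agrees with the calculus result π̂ = y/n = 3/10.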
Slide 78: LRT using Stata

  * assign values to n, y, and pi under the null and alternative hypotheses
  scalar n = 100
  scalar y = 53
  scalar pi_h0 = 0.5
  scalar pi_ha = 0.53
  * calculate the likelihoods, the likelihood ratio test and the p-value
  local l0 = comb(n,y)*(pi_h0^y)*((1-pi_h0)^(n-y))
  local l1 = comb(n,y)*(pi_ha^y)*((1-pi_ha)^(n-y))
  local lr_test = -2*ln(`l0'/`l1')
  local lr_pvalue = 1-chi2(1,`lr_test')
  * display the results
  disp "l0 = `l0'"
  disp "l1 = `l1'"
  disp "-2ln(l0/l1) = `lr_test'"
  disp "p-value = `lr_pvalue'"

Slide 79: LRT using Stata (1)
The resulting p-value of 0.5484 indicates that the null hypothesis should not be rejected (at the 0.05 level).

Slide 80: LRT using Stata (2)
Hypothesis tests based on Wald, score, or likelihood ratio tests are also called "large sample" or "asymptotic" tests. They are equivalent when the sample size is large. When nπ ≥ 5 and n(1 - π) ≥ 5, the large-sample two-sided score tests perform reasonably well. When the sample size is small to moderate, the Wald test is the least reliable of the three tests. For small sample sizes, we can perform "exact" tests.

Slide 81: LRT using Stata
We could also compute an exact test based on the binomial distribution using Stata with the following syntax:

  bitesti n p_observed p_null

Slide 82: Brief summary
We can describe observations using probability distributions. We can estimate the parameters of a probability distribution using maximum likelihood estimation. We can construct confidence intervals for those parameter estimates using asymptotic or exact methods. We can test hypotheses about parameters using asymptotic (Wald, score, or likelihood ratio) or exact methods.
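The likelihood ratio test above (n = 100, y = 53, H0: π = 0.5 vs. HA: π = 0.53) can be cross-checked in Python with only the standard library, using the identity P(χ²₁ > x) = erfc(√(x/2)) for the one-degree-of-freedom chi-squared upper tail:

```python
import math

n, y = 100, 53
pi0, pia = 0.5, 0.53  # pi under the null and alternative hypotheses

def log_lik(pi):
    # log of the binomial likelihood: ln C(n,y) + y ln(pi) + (n-y) ln(1-pi)
    return math.log(math.comb(n, y)) + y * math.log(pi) + (n - y) * math.log(1 - pi)

lr_stat = -2 * (log_lik(pi0) - log_lik(pia))  # -2 ln(L0/L1), approx 0.36
p_value = math.erfc(math.sqrt(lr_stat / 2))   # chi-squared(1) tail, approx 0.548
print(lr_stat, p_value)
```

Working on the log scale avoids multiplying the very small likelihoods directly; the results agree with the Stata output quoted in the slides.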
