Questions and Answers of Linear State Space Systems

Other link functions: Other link functions for binary data include the inverse cdf of a t distribution (the probit being the limit as df → ∞); a log-gamma link (Genter and Farewell 1985), for
Conditional logistic: For more details about case-control studies and conditional logistic regression, see Breslow and Day (1980, Chapter 7). For more on “exact” inference using conditional
Propensity scores: Rosenbaum and Rubin (1983) proposed methods of comparing E(y) for two groups in observational studies while adjusting for possibly confounding variables x. They defined the
Binary GLM history: The probit model was presented by Bliss (1935) and popularized in three editions of Finney (1971). Logistic regression was proposed by Berkson (1944) as a model that has similar
For the population having value y on a binary response, suppose x has an N(μy, σ²) distribution, y = 0, 1. a. Using Bayes’ theorem, show that P(y = 1 ∣ x) satisfies the logistic regression
Refer to Note 1.5. For a logistic model, show that the average estimated rate of change in the response probability as a function of explanatory variable j, adjusting for the others, satisfies 1
Construct the ROC curves for (a) the toy example in Section 5.4.2 with complete separation and (b) the dataset (n = 8) that adds two observations at x = 3.5, one with y = 1 and one with y = 0. In
From the likelihood equation (5.5) for a logistic regression intercept parameter, show that the overall sample proportion of successes equals the sample mean of the fitted success probabilities. Is
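As a numerical check on the first claim in this exercise, the following minimal sketch fits a logistic regression with an intercept by Newton-Raphson and compares the sample proportion of successes with the mean of the fitted probabilities. The data values, the function name `fit_logistic`, and the iteration count are illustrative assumptions and do not come from the text.

```python
import math

# Hypothetical data (not from the text); the 0s and 1s overlap, so the ML fit is finite.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson for logit(pi_i) = b0 + b1 * x_i."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector (the likelihood equations set these to zero).
        u0 = sum(yi - pi for yi, pi in zip(y, p))
        u1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix entries, with weights pi * (1 - pi).
        w = [pi * (1.0 - pi) for pi in p]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step: add (information inverse) times (score).
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1


b0_hat, b1_hat = fit_logistic(x, y)
fitted = [1.0 / (1.0 + math.exp(-(b0_hat + b1_hat * xi))) for xi in x]
sample_proportion = sum(y) / len(y)
mean_fitted = sum(fitted) / len(fitted)
print(sample_proportion, mean_fitted)
```

The two printed values agree up to rounding because the intercept's score equation is the sum of (yi − π̂i), which is zero at the ML estimate.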
Suppose that niyi has a bin(ni, πi) distribution. Consider a binary GLM πi = F(∑j βj xij) with F the standard cdf of some family of continuous distributions. Find wi in wi =
Explain how expression (5.6) for var(β̂) in logistic regression suggests that the standard errors of {β̂j} tend to be smaller as you obtain more data. Answer this for (a) grouped data
Assuming the model logit[P(yi = 1)] = βxi, you take all n observations at x0. Find β̂ and the large-sample var(β̂). For the Wald test, explain why the chi-squared noncentrality is
For a 2 × 2 × ℓ contingency table that cross classifies y with a binary treatment variable x and an adjustment factor z, specify a logistic model with a lack of interaction between x and z.
To use conditional logistic regression to test H0: β1 = 0 against H1: β1 < 0 for the toy example in Section 5.4.2, find the conditional distribution of ∑i xiyi, given ∑i yi. Find the exact
The calibration problem is that of estimating x0 at which P(y = 1) = π0 for some fixed π0 such as 0.50. For the logistic model with a single explanatory variable, explain why a confidence
Construct the log-likelihood function for the model logit(πi) = β0 + β1xi with independent binomial proportions of y1 successes in n1 trials at x1 = 0 and y2 successes in n2 trials at x2 =
Refer to the previous exercise. Denote the cell counts in the 2 × 2 table by {nij}. For the case β1 = 0 (the independence model), the fitted values in the cells of that table are {μ̂ij =
Suppose the logistic model holds in which x is uniformly distributed between 0 and 100, and logit(πi) = −2.0 + 0.04xi. Randomly generate 100 independent observations from this model. Plot the
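The data-generation step of this exercise can be sketched as follows, using only the Python standard library. The function name `simulate_logistic` and the seed are illustrative assumptions; the coefficients −2.0 and 0.04, the uniform range, and the sample size of 100 come from the exercise. The sketch covers only the simulation, not the rest of the exercise, which is cut off in this listing.

```python
import math
import random


def simulate_logistic(n=100, beta0=-2.0, beta1=0.04, seed=0):
    """Draw (x, y) pairs with x ~ Uniform(0, 100) and logit P(y = 1) = beta0 + beta1 * x."""
    rng = random.Random(seed)  # seed chosen arbitrarily, for reproducibility only
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 100.0)
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        y = 1 if rng.random() < p else 0
        data.append((x, y))
    return data


data = simulate_logistic()
print(len(data), sum(y for _, y in data))
```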
Let niyi be a bin(ni, πi) variate for group i, i = 1,…,N, with {yi} independent. Consider the null model, for which π1 = ⋯ = πN. Show that π̂ = (∑i ni yi)∕(∑i ni). When all ni =
Let yi be a bin(1, πi) variate, i = 1,…,N. For the model logit(πi) = β0 + β1xi, show that the deviance depends on π̂i but not yi. Hence, it is not useful for checking model fit.
A study has ni independent binary observations {yi1,…, yini} at xi, i = 1,…,N, with n = ∑i ni. Consider the model logit(πi) = β0 + β1xi, where πi = P(yij = 1). a. Show that the kernel
Use the following toy data to illustrate comments in Section 5.5 about grouped versus ungrouped binary data in the effect on the deviance:--------------------------------------------------------x
Refer to the deviance comparison statistic G2(M0 ∣ M1) introduced in Section 4.4.3. For a sequence of s nested binary response models M1,…, Ms, model Ms is the most complex. Let v denote the
In a football league, for matches involving teams a and b, let πab be the probability that a defeats b. Suppose πab + πba = 1 (i.e., ties cannot occur). Bradley and Terry (1952) proposed the
Let yi, i = 1,…,N, denote N independent binary random variables. a. Derive the log-likelihood for the probit model Φ−1[π(xi)] = ∑j βjxij. b. Show that the likelihood equations for the
An alternative latent variable model results from early applications of binary response models to toxicology studies (such as Table 5.4) of the effect of dosage of a toxin on whether a subject dies,
Consider the choice between two options, such as two product brands. Let Uy denote the utility of outcome y, for y = 0 and y = 1. Suppose Uy = ????y0 +????y1x + ????y, using a scale such that ????y
When Φ−1(πi) = β0 + β1xi, explain why the response curve for πi [or for 1 − πi, when β1 < 0] has the appearance of a normal cdf with mean μ = −β0∕β1 and standard
Consider binary GLM F−1(πi) = β0 + β1xi, where F is a cdf corresponding to a pdf f that is symmetric around 0. Show that xi at which πi = 0.50 is xi = −β0∕β1. Show that the
For the model log[− log(1 − πi)] = β0 + β1xi, find xi at which πi = 1∕2. Show that the greatest rate of change of π occurs at x = −β0∕β1, and find π at that point.
In a study of the presence of tumors in animals, suppose {yi} are independent counts that satisfy a Poisson loglinear model, log(μi) = ∑j βjxij. However, the observed response merely
Suppose y = 0 at x = 10, 20, 30, 40 and y = 1 at x = 60, 70, 80, 90. Using software, what do you get for estimates and standard errors when you fit the logistic regression model (a) to these data?
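To see why software struggles with these data, the following sketch evaluates the logistic log-likelihood along a path of increasingly steep slopes. The path (intercept fixed at −50 times the slope, so that the fitted curve crosses 0.5 at x = 50) and the function name `log_likelihood` are illustrative assumptions; only the eight (x, y) pairs come from the exercise.

```python
import math

# Data from the exercise: complete separation between x = 40 and x = 60.
x = [10, 20, 30, 40, 60, 70, 80, 90]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def log_likelihood(beta0, beta1):
    """Bernoulli log-likelihood for logit(pi_i) = beta0 + beta1 * x_i."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = beta0 + beta1 * xi
        # Numerically stable log(1 + exp(eta)).
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += yi * eta - log1pexp
    return total


# Assumed path: slope b with intercept -50 * b.
slopes = [0.1, 0.5, 1.0, 2.0]
values = [log_likelihood(-50.0 * b, b) for b in slopes]
for b, v in zip(slopes, values):
    print(b, v)
```

Along this path the log-likelihood keeps rising toward its upper bound of 0 without attaining it, which is consistent with an infinite slope estimate. Fitting software typically stops after a finite number of iterations and reports a large estimate with a very large standard error.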
For the logistic model (5.7) for a 2 × 2 table, give an example of cell counts corresponding to (a) complete separation and β̂1 = ∞, (b) quasi-complete separation and β̂1 = ∞, (c)
You plan to study the relation between x = age and y = whether a person belongs to a social network such as Facebook (1 = yes). A priori, you predict that P(y = 1) is currently between about 0.80 and 0.90 at x =
In one of the first studies of the link between lung cancer and smoking, Richard Doll and Austin Bradford Hill collected data from 20 hospitals in London, England. Each patient admitted with lung
To illustrate Fisher’s exact test, Fisher (1935) described the following experiment: a colleague of his claimed that, when drinking tea, she could distinguish whether milk or tea was added to the
For the horseshoe crab dataset (Crabs.dat at the text website) introduced in Section 4.4.3, let y = 1 if a female crab has at least one satellite, and let y = 0 if a female crab does not have any
The dataset Crabs2.dat at the text website collects several variables that may be associated with y = whether a female horseshoe crab is monandrous (eggs fertilized by a single male crab) or
Refer to the previous exercise. Download the file from the text website. Using year of observation, Fcolor, Fsurf, FCW = female’s carapace width, AMCW = attached male’s carapace width, AMcolor =
The New York Times reported results of a study on the effects of AZT in slowing the development of AIDS symptoms (February 15, 1991). Veterans whose immune symptoms were beginning to falter after
Download the data for the example in Section 5.7.1. Fit the main effects model. What does your software report for β̂1 and its SE? How could you surmise from the output that actually β̂1 = ∞?
Refer to the previous exercise. For these data, what, if anything, can you learn about potential interactions for pairs of the explanatory variables? Conduct the likelihood-ratio test of the
Table 5.5 shows data, the file SoreThroat.dat at the text website, from a study about y = whether a patient having surgery experienced a sore throat on waking (1 = yes, 0 = no) as a function of d =
Show that the multinomial variate y = (y1,…, yc−1)T (with yj = 1 if outcome j occurred and 0 otherwise) for a single trial with parameters (π1,…, πc−1) has distribution in the (c −
For the baseline-category logit model without constraints on parameters, πij = exp(xiβj)∕[∑h exp(xiβh)] with the sum over h = 1,…,c, show that dividing numerator and denominator by exp(xiβc) yields new parameters
Derive Equation (6.3) for the rate of change. Show how the equation for binary models is a special case.
With three outcome categories and a single explanatory variable, suppose πij = exp(βj0 + βjxi)∕[1 + exp(β10 + β1xi) + exp(β20 + β2xi)], j = 1, 2. Show that πi3 is (a)
Derive the deviance expression in Equation (6.5) by deriving the corresponding likelihood-ratio test.
For a multinomial response, let uij denote the utility of response outcome j for subject i. Suppose that uij = xiβj + εij, and the response outcome for subject i is the value of j having
Derive the likelihood equations and the information matrix for the discrete-choice model (6.6).
Consider the baseline-category logit model (6.1). a. Suppose we impose the structure βj = jβ, for j = 1,…, c − 1. Does this model treat the response as ordinal or nominal? Explain. b. Show
Section 5.3.4 introduced Fisher’s exact test for 2 × 2 contingency tables. For testing independence in a r × c table in which the data are c independent multinomials, derive a conditional
Does it make sense to use the cumulative logit model of proportional odds form with a nominal-scale response variable? Why or why not? Is the model a special case of a baseline-category logit model?
Show how to express the cumulative logit model of proportional odds form as a multivariate GLM (6.4).
For a binary explanatory variable, explain why the cumulative logit model with proportional odds structure is unlikely to fit well if, for an underlying latent response, the two groups have similar
Consider the cumulative logit model, logit[P(yi ≤ j)] = αj + βjxi. a. With continuous xi taking values over the real line, show that the model is improper, in that cumulative probabilities are
For the cumulative link model, G−1[P(yi ≤ j)] = αj + xiβ, show that for 1 ≤ j < k ≤ c − 1, P(yi ≤ k) equals P(yi ≤ j) at x∗, where x∗ is obtained by increasing component h of
For an ordinal multinomial response with c categories, let ωij = P(yi = j ∣ yi ≥ j) = πij∕(πij + ⋯ + πic), j = 1,…, c − 1. The continuation-ratio logit model is logit(ωij) = αj
Consider the null multinomial model, having the same probabilities {????j}for every observation. Let ???? = ∑j bj????j, and suppose that ????j = fj(????) > 0, j =1,…,c. For sample proportions {pj
A response scale has the categories (strongly agree, mildly agree, mildly disagree, strongly disagree, do not know). A two-part model uses a logistic regression model for the probability of a don’t
The file Alligators2.dat at the text website is an expanded version of Table 6.1 that also includes the alligator’s gender. Using all the explanatory variables, use model-building methods to select
For 63 alligators caught in Lake George, Florida, the file Alligators3.dat at the text website classifies primary food choice as (fish, invertebrate, other) and shows alligator length in meters.
The following R output shows output from fitting a cumulative logit model to data from the US 2008 General Social Survey. For subject i let yi = belief in existence of heaven (1 = yes, 2 = unsure, 3
Refer to the previous exercise. Consider the model log(πij∕πi3) = αj + βj^G xi1 + βj^R xi2, j = 1, 2. a. Fit the model and report prediction equations for log(πi1∕πi3),
Refer to Exercise 5.33. The color of the female crab is a surrogate for age, with older crabs being darker. Analyze whether any characteristics or combinations of characteristics of the attached male
A 1976 article by M. Madsen (Scand. J. Stat. 3: 97–106) showed a 4 × 2 × 3 × 3 contingency table (the file Satisfaction.dat at the text website) that cross classifies a sample of residents of
At the website sda.berkeley.edu/GSS for the General Social Survey, download a contingency table relating the variable GRNTAXES (about paying higher taxes to help the environment) to two other
Suppose {yi} are independent Poisson observations from a single group. Find the likelihood equation for estimating μ = E(yi). Show that μ̂ = ȳ regardless of the link function.
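One way to organize the argument for this exercise is sketched below in LaTeX. It assumes n observations with common mean E(y_i) = \mu and a strictly monotone, differentiable link g with g(\mu) = \beta; the symbols g and \beta are introduced here for illustration and are not taken from the exercise.

```latex
\begin{align*}
L(\mu) &= \sum_{i=1}^{n} \bigl[\, y_i \log \mu - \mu - \log(y_i!) \,\bigr], \\
\frac{\partial L}{\partial \mu} &= \frac{\sum_{i} y_i}{\mu} - n = 0
  \quad\Longrightarrow\quad \hat{\mu} = \bar{y}, \\
\frac{\partial L}{\partial \beta}
  &= \left( \frac{\sum_{i} y_i}{\mu} - n \right) \frac{d\mu}{d\beta} = 0,
  \qquad \frac{d\mu}{d\beta} = \frac{1}{g'(\mu)} \neq 0 .
\end{align*}
```

Because the factor dμ/dβ is nonzero under the monotonicity assumption, the likelihood equation in β has the same solution as the one in μ, so the fitted mean is ȳ for any such link.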
Suppose {yi} are independent Poisson variates, with μ = E(yi), i = 1,…, n. For testing H0: μ = μ0, show that the likelihood-ratio statistic simplifies to −2(L0 − L1) = 2[n(μ0 − ȳ)
Refer to the previous exercise. Explain why, alternatively, for large samples you can test H0 using the standard normal test statistic z = √n(ȳ − μ0)∕√μ0. Explain how to invert this
When y1 and y2 are independent Poisson with means μ1 and μ2, find the likelihood-ratio statistic for testing H0: μ1 = μ2. Specify its asymptotic null distribution, and describe the
For the one-way layout for Poisson counts (Section 7.1.5), using the identity link function, show how to obtain a large-sample confidence interval for μh − μi. If there is overdispersion,
For the one-way layout for Poisson counts, derive the likelihood-ratio statistic for testing H0: μ1 = ⋯ = μc.
For the one-way layout for Poisson counts, derive a test of H0: μ1 = ⋯ = μc by applying a Pearson chi-squared goodness-of-fit test (with df = c − 1) for a multinomial distribution that
In a balanced two-way layout for a count response, let yijk be observation k at level i of factor A and level j of factor B, k = 1,…, n. Formulate a Poisson loglinear main-effects model for
Refer to Note 1.5. For a Poisson loglinear model containing an intercept, show that the average estimated rate of change in the mean as a function of explanatory variable j satisfies 1 n∑i(????
A method for negative exponential modeling of survival times relates to the Poisson loglinear model for rates (Aitkin and Clayton 1980). Let T denote the time to some event, with pdf f and cdf F. For
Consider the loglinear model of conditional independence between A and B, given C, in an r × c × ℓ contingency table. Derive the likelihood equations, and interpret. Give the solution of fitted
Two balanced coins are flipped, independently. Let A = whether the first flip resulted in a head (yes, no), B = whether the second flip resulted in a head, and C = whether both flips had the same
For three categorical variables A, B, and C: a. When C is jointly independent of A and B, show that A and C are conditionally independent, given B. b. Prove that mutual independence of A, B, and C
Express the loglinear model of mutual independence for a 2 × 2 × 2 table in the form log μ = Xβ. Show that the likelihood equations equate {yijk} and {μ̂ijk} in the one-dimensional
For a 2 × c × ℓ table, consider the loglinear model by which A is jointly independent of B and C. Treat A as a response variable and B and C as explanatory, conditioning on {n+jk}. Construct the
For the homogeneous association loglinear model (7.7) for an r × c × ℓ contingency table, treating A as a response variable, find the equivalent baseline-category logit model.
For a four-way contingency table, consider the loglinear model having AB, BC, and CD two-factor terms and no three-factor interaction terms. Explain why A and D are independent given B alone or given
Suppose the loglinear model (7.7) of homogeneous association holds for a three-way contingency table. Find log μij+ and explain why marginal associations need not equal conditional associations
Consider the loglinear model for a four-way table having AB, AC, and AD two-factor terms and no three-factor interaction term. What is the impact of collapsing over B on the other associations?
A county’s highway department keeps records of the number of automobile accidents reported each working day on a superhighway that runs through the county. Describe factors that are likely to cause
Show that a gamma mixture of Poisson distributions yields the negative binomial distribution.
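The result can be checked numerically before proving it. The sketch below assumes the gamma mixing distribution has shape k and mean μ (scale μ∕k) and that the negative binomial is parameterized by its mean μ and the same k; these parameterizations, the function names, and the integration settings are illustrative assumptions. It integrates the Poisson pmf against the gamma density with a midpoint rule and compares the result with the closed-form negative binomial pmf.

```python
import math


def nb_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and shape k (assumed parameterization)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + y * math.log(mu / (mu + k)) + k * math.log(k / (mu + k)))
    return math.exp(log_p)


def mixture_pmf(y, mu, k, upper=60.0, steps=60000):
    """Midpoint-rule integral of Poisson(y; lam) * Gamma(lam; shape k, scale mu / k)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        log_pois = y * math.log(lam) - lam - math.lgamma(y + 1)
        log_gamma = ((k - 1.0) * math.log(lam) - lam * k / mu
                     + k * math.log(k / mu) - math.lgamma(k))
        total += math.exp(log_pois + log_gamma)
    return total * h


# Compare the two pmfs at a few counts for mu = 3 and k = 2 (arbitrary test values).
checks = [(y, nb_pmf(y, 3.0, 2.0), mixture_pmf(y, 3.0, 2.0)) for y in range(6)]
for y, closed_form, integrated in checks:
    print(y, closed_form, integrated)
```

Agreement at a handful of counts is only a sanity check. The exercise itself asks for the analytic integration, which uses the gamma integral ∫ λ^(y+k−1) e^(−cλ) dλ = Γ(y + k)∕c^(y+k).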
Given u, y is Poisson with E(y ∣ u) = uμ, where u is a positive random variable with E(u) = 1 and var(u) = τ. Show that E(y) = μ and var(y) = μ + τμ². Explain how you can formulate
For discrete distributions, Jørgensen (1987) showed that it is natural to define the exponential dispersion family as f(yi; θi, φ) = exp{[yiθi − b(θi)]∕a(φ) + c(yi, φ)}. a. For
For a sequence of independent Bernoulli trials, let y = the number of successes before the kth failure. Show that y has the negative binomial distribution, f(y; ????, k) = Γ(y + k)Γ(k)Γ(y +
With independent negative binomial observations from a single group, find the likelihood equation and show that ̂???? = ȳ. (ML estimation for ???? requires iterative methods, as R. A. Fisher
For the ZIP null model (i.e., without explanatory variables), show from the likelihood equations that the ML-fitted 0 count equals the observed 0 count.
The text website contains an expanded version (file Drugs3.dat) of the student substance use data of Table 7.3 that also has each subject’s G = gender (1 = female, 2 = male) and R = race (1 = white,
Other than a formal goodness-of-fit test, one analysis that provides a sense of whether a particular GLM is plausible is the following: Suppose the ML fitted equation were the true equation. At the
Another model (Dobbie and Welsh 2001) for zero-inflated count data uses the Neyman type A distribution, which is a compound Poisson–Poisson mixture. For observation i, let zi denote a Poisson
A headline in The Gainesville Sun (February 17, 2014) proclaimed a worrisome spike in shark attacks in the previous 2 years. The reported total number of shark attacks in Florida per year from 2001
Table 7.5, also available at www.stat.ufl.edu/~aa/glm/data, summarizes responses of 1308 subjects to the question: within the past 12 months, how many people have you known personally that were
For the horseshoe crab data, the negative binomial modeling shown in the R output first treats color as nominal-scale and then in a quantitative manner, with the category numbers as scores. Interpret
For the horseshoe crab data, the following output shows a zero-inflated negative binomial model using quantitative color for the zero component. Interpret results, and compare with the NB2 model
Refer to Section 7.5.2. Redo the zero-inflated NB2 model building, deleting the outlier crab weighing 5.2 kg. Compare results against analyses that used this observation and summarize conclusions.
