Questions and Answers of Business Analytics Data
2.29 For a 2 × 2 table of counts {nij}, show that the odds ratio is invariant to (a) interchanging rows with columns, and (b) multiplication of cell counts within rows or within columns by c ≠ 0.
2.30 For given π₁ and π₂, show that the relative risk cannot be farther than the odds ratio from their independence value of 1.0.
2.31 Let $\pi_{ij|k} = P(X = i, Y = j \mid Z = k)$. Explain why XY conditional independence is $\pi_{ij|k} = \pi_{i+|k}\,\pi_{+j|k}$ for all i and j and k.
2.32 For a 2 x 2 x 2 table, show that homogeneous association is a symmetric property, by showing that equal XY conditional odds ratios is equivalent to equal YZ conditional odds ratios.
2.33 For a 2 × 2 × 2 table, suppose θXY(1) = θXY(2) = θ. For a possibly confounding variable Z, let θc denote the common value of θ(i)YZ. Let π1 = P(Z = 1|X = 1, Y = 2) and π2 = P(Z = 1|X = 2,
2.34 When X and Y are conditionally dependent at each level of Z yet marginally independent, Z is called a suppressor variable. Specify joint probabilities for a 2 × 2 × 2 table to show that this can
2.35 Show that the {aij} in (2.11) determine all odds ratios formed from pairs of rows and pairs of columns.
2.36 For I × J contingency tables, explain why the variables are independent when the (I − 1)(J − 1) differences $\pi_{j|i} - \pi_{j|I} = 0$, i = 1, ..., I − 1, j = 1, ..., J − 1.
2.38 For 2 × 2 tables, Yule (1900, 1912) introduced $Q = (\pi_{11}\pi_{22} - \pi_{12}\pi_{21}) / (\pi_{11}\pi_{22} + \pi_{12}\pi_{21})$, which he labeled Q in honor of the Belgian statistician Quetelet. It is now called Yule's Q. a. Show
2.39 Goodman and Kruskal (1954) proposed an association measure (tau) for nominal variables based on variation measure $V(Y) = \sum_j \pi_{+j}(1 - \pi_{+j}) = 1 - \sum_j \pi_{+j}^2$. a. Show that V(Y) is the probability
2.40 The measure of association lambda for nominal variables (Goodman and Kruskal 1954) has $V(Y) = 1 - \max_j\{\pi_{+j}\}$ and $V(Y \mid i) = 1 - \max_j\{\pi_{j|i}\}$. Interpret lambda as a proportional reduction in error
2.41 Show that Δ in (2.15) relates to $\alpha = P(Y_1 > Y_2) + \tfrac{1}{2}P(Y_1 = Y_2)$ by $\alpha = (\Delta + 1)/2$, $\Delta = 2\alpha - 1$, with α having range [0, 1] and null value 1/2.
3.1 A meta-analysis (Moore et al., Lancet 370: 319-328, 2007) of studies on the association between cannabis use (yes, no) and presence of psychosis (yes, no) reported a pooled odds ratio estimate
3.2 For 239 golf tournaments on the PGA tour between 2004 and 2009, the economists D. Pope and M. Schweitzer evaluated risk aversion by comparing percentages of putts made when putting for a par
3.3 Table 3.10 uses the GSS to cross-classify a subject's political party ID with their opinion about whether homosexuals should have the right to marry, for subjects having strong identification
3.4 For Table 2.10 on seat-belt use and results of auto accidents, find and interpret 95% confidence intervals for the conceptual population (a) odds ratio, (b) difference of proportions, and (c)
3.5 Refer to Table 2.5 on lung cancer and smoking. Conduct an inferential analysis, and interpret results.
3.6 A study considered the effect of prednisolone on severe hypercalcemia in women with metastatic breast cancer (B. Kristensen et al., J. Intern. Med. 232: 237-245, 1992). Of 30 patients, 15 were
3.7 In professional basketball games during 2009-2010, when Kobe Bryant of the Los Angeles Lakers shot a pair of free throws, 8 times he missed both, 152 times he made both, 33 times he made only the
3.8 Refer to Exercise 3.3 and Table 3.10. a. Find the z statistic (3.12) and explain how it relates to a chi-squared test. b. Find a score or profile likelihood confidence interval for the odds ratio,
3.9 Go to sda.berkeley.edu/GSS and download a contingency table relating attained education and the fundamentalism of one's religious beliefs, for the most recent survey. The GSS variable names are
3.10 As in the previous exercise, download recent GSS data and perform analyses to answer the questions asked. a. Are people happier who believe in life after death? Analyze using the GSS variables
3.11 Refer to Table 3.11, GSS data on party ID and race. a. Using X² and G², test the hypothesis of independence between party identification and race. Report the P-values and interpret. b. Use
3.12 Using the 2008 GSS, we cross-classified party ID with gender. Table 3.12 shows some results. Explain how to interpret all the results on this printout. (Reschi denotes the Pearson residual and
3.13 A recent study (by R. Armenio et al., J. Am. Dent. Assoc. 139: 592-597, 2008) reported results of a double-blind randomized clinical trial comparing tooth sensitivity for 14 patients using a
3.14 Table 3.13 classifies a sample of psychiatric patients by their diagnosis and by whether their treatment prescribed drugs. Partition chi-squared into three components to describe differences
3.15 A GSS that cross-classified income in thousands of dollars (
3.16 A study on educational aspirations of high school students (S. Crysdale, Int. J. Compar. Sociol. 16: 19-36, 1975) measured aspirations with the scale (some high school, high school graduate, some
3.17 Refer to Table 2.13 on homosexual sex and premarital sex. a. Construct and interpret a mosaic plot. b. Obtain a 95% confidence interval for gamma. Interpret the association.
3.18 Table 3.14 shows the results of a retrospective study comparing radiation therapy with surgery in treating cancer of the larynx. The response indicates whether the cancer was controlled for at
3.19 A study in the Department of Wildlife Ecology at the University of Florida sampled wild common carp fish from a wetland in central Chile. One analysis investigated whether the fish muscle had
3.20 Seneta and Phipps (2001) described a medical study that compared subjects with nonacute appendicitis and with acute appendicitis in terms of whether they suffered severe right abdominal pain.
3.21 Analyze Table 3.1 using the Bayesian approach with independent uniform prior distributions. a. Specify the posterior distribution of (π₁, π₂). b. Using software or your own simulation,
3.22 Refer to the table (11, 0/0, 1) analyzed with Bayesian methods in Section 3.6.4. Using simulation, estimate P(π₁ > π₂|y₁, n₁; y₂, n₂) for independent beta(α₁, α₂) priors having
3.23 Table 3.16 cross-classifies votes in the 2000 and 2004 U.S. presidential elections. Treating the two rows as independent binomials and using uniform priors, generate the posterior distribution of
3.24 Is θ̂ the midpoint of commonly used confidence intervals for the odds ratio θ? Why or why not?
3.25 For comparing two binomial samples with fixed sample sizes, show that the standard error (3.1) of a log odds ratio increases when, for either sample, the absolute difference of proportions of
3.26 Using the delta method as in Section 3.1.6, show that the Wald confidence interval for the logit of a binomial parameter is $\log[\hat{\pi}/(1 - \hat{\pi})] \pm z_{\alpha/2}/\sqrt{n\hat{\pi}(1 - \hat{\pi})}$. Explain how to use this
3.27 For two parameters, a confidence interval for θ₁ − θ₂ based on single-sample estimate θ̂ᵢ and interval (lᵢ, uᵢ) for θᵢ, i = 1, 2, is (θ̂₁ − θ̂₂ − √[(θ̂₁ − l₁)² + (u₂ − θ̂₂)²], θ̂₁ − θ̂₂ + √[(u₁ –
3.28 For multinomial sampling, use the asymptotic variance of log θ̂ to show that for Yule's Q (Exercise 2.38) the asymptotic variance of $\sqrt{n}(\hat{Q} - Q)$ is $\left(\sum_i \sum_j \pi_{ij}^{-1}\right)(1 - Q^2)^2/4$ (Yule 1900, 1912).
3.29 For multinomial probabilities π = (π₁, π₂, ...) with a contingency table of arbitrary dimensions, consider a measure of form g(π) = ν/δ. Show that the asymptotic variance of √n[g(π̂) -
3.30 Show that $X^2 = n \sum_i \sum_j (p_{ij} - p_{i+}p_{+j})^2 / (p_{i+}p_{+j}) = n \sum_i \sum_j p_{i+}p_{+j}(a_{ij} - 1)^2$ for the sample association factors {aij}. Thus, X² can be large when n is large, regardless of whether the
3.31 For a 2 × 2 table, consider H₀: π₁₁ = θ², π₁₂ = π₂₁ = θ(1 – θ), π₂₂ = (1 – θ)². a. Show that the marginal distributions are identical and that independence holds. b. For a
3.32 For testing independence, show that X² ≤ n min(I − 1, J − 1). Hence V² = X²/[n min(I − 1, J − 1)] falls between 0 and 1 (Cramér 1946). [For 2 × 2 tables, X²/n is often called phi-squared; it
3.33 For a 1 × 2 table (i.e., a single binomial Y based on n trials, with probabilities π and 1 – π), consider testing H₀: π = π₀. a. Show that the Pearson residuals are (y –
3.34 For a 2 × 2 table, show that: a. The four Pearson residuals may take different values. b. All four standardized residuals have the same absolute value. (This is sensible, since df = 1.) c. The
3.35 Use a partitioning argument to explain why G² for testing independence cannot increase after combining two rows (or two columns) of a contingency table. [Hint: Explain why G² for full table = G²
3.36 Assume independence, and let $p_{ij} = n_{ij}/n$ and $t_{ij} = p_{i+}p_{+j}$. a. Show that $p_{ij}$ and $t_{ij}$ are unbiased for $\pi_{ij} = \pi_{i+}\pi_{+j}$. b. Show that $\mathrm{var}(p_{ij}) = \pi_{i+}\pi_{+j}(1 - \pi_{i+}\pi_{+j})/n$. c. Using $E(p_{i+}p_{+j})^2$
3.37 Consider an I × J table with ordered columns and unordered rows. Ridits (Bross 1958) are data-based column scores. The jth sample ridit is the average cumulative proportion within category j,
3.38 Show that the sample value of the uncertainty coefficient (2.13) equals $-G^2 / [2n \sum_j p_{+j} \log p_{+j}]$. [Haberman (1982) gave its standard error.]
3.39 Of six candidates for three managerial positions, denote the females by F1, F2, F3 and the males by M1, M2, M3. a. Identify the 20 possible combinations of candidates that could be selected.
3.40 When a test statistic has a continuous distribution, the P-value has a null uniform distribution, P(P-value < α) = α for 0 < α < 1. For Fisher's exact test, explain why P(P-value < α) < α.
3.41 Note 3.3 showed moments of the hypergeometric distribution (3.17). Letting $p = n_{+1}/n$, show that $n_{11}$ has the same mean as a binomial random variable for $n_{1+}$ trials with success probability p,
3.42 For the tea-tasting data (Table 3.9), construct the null distributions of the ordinary P-value and the mid P-value for Fisher's exact test with Hₐ: θ > 1. Find and compare their expected values.
3.43 In Section 3.5.6 we analyzed a 2 x 2 table having entries (3, 0/0, 3). Explain why the unconditional P-value, evaluated at π = 0.50, is related to Fisher conditional P-values for various tables
3.44 For testing H₀: π₁ = π₂ with two binomial variates y₁ and y₂, a "reasonable" test would not reject H₀ if y₁ = y₂ = 0. Since as π₁ and π₂ approach 0, the probability of this
3.45 For independent uniform prior distributions for two binomial parameters, show that r = π₁/π₂ has prior density g(r) = 1/2 for 0 ≤ r ≤ 1 and g(r) = 1/(2r²) for r > 1.
3.46 Explain why a Bayesian HPD interval is sensible for π₁ − π₂ but not usually for π₁/π₂.
3.47 Consider a particular choice of Dirichlet means {γij = E(πij) = αij/K} for the Bayes estimator (1.19) extended to two-way tables. Show that the total mean squared error is 2[K/(n + K)]²
For each of the predictors, find the log posterior odds ratio, and explain the contribution of this predictor to the probability of a malignant tumor.
Find the naïve Bayes classifications for each of the combinations in Exercise 31.
(Optional) Assess the validity of the conditional independence assumption, using calculations similar to those in Table 14.5.
For each of the combinations in the previous exercise, find the posterior odds ratio.
Using your results from the previous exercise, find the maximum a posteriori classification of tumor class, for each of the following combinations: a. Mitoses=low and Clump Thickness=low. b.
Construct the joint conditional probabilities, similar to those in Table 14.4.
Find the posterior probability that the tumor is malignant, given that clump thickness is (i) high and (ii) low.
Find the posterior probability that the tumor is malignant, given that mitoses is (i) high and (ii) low.
Find the conditional probabilities for each of the predictors, given that the tumor is malignant. Then find the conditional probabilities for each of the predictors, given that the tumor is benign.
Find the prior probabilities for each of the predictors and the target variable. Find the complement probabilities of each.
Consider using only two predictors, mitoses and clump thickness, to predict tumor class. Categorize the values for mitoses as follows: Low =1 and High=2–10. Categorize the values for clump
Compute the probabilities by which the Bayes net model classifies the fourth instance from the test file movies_test.arff. Do your calculations result in a positive classification as reported by WEKA?
Revisit the WEKA naïve Bayes example. Calculate the probability that the first instance in movies_test.arff is “pos” and “neg.” Do your calculations agree with those reported by WEKA leading
Provide the MAP classification for season given that a warm coat was purchased, in the clothing purchase example in the Bayesian network section.
Find the naïve Bayes classifier for the following customers. Use the empirical distribution where necessary. a. Belongs to neither plan, with 400 day minutes. b. Belongs to the International Plan
Verify the empirical distribution results referred to in the text, of the numbers of records within the certain margins of error of 800 minutes, for each of churners and non-churners.
Calculate the naïve Bayes classification for all four possible combinations of International Plan and Voice Mail Plan membership, using the 25.31%/74.69% balancing.
Compute the posterior odds ratio for each of the combinations of International Plan and Voice Mail Plan membership, using the balanced data set.
What are the two main considerations when building a Bayesian network?
Describe the intrinsic relationship among the variables in a Bayesian network.
Explain the difference in assumptions between naïve Bayes classification and Bayesian networks.
Explain what is meant by working with the empirical distribution. Describe how this can be used to estimate the true probabilities.
Extra credit: Investigate the mixture idea for the continuous predictor mentioned in the text.
Describe the process for using continuous predictors in Bayesian classification, using the concept of distribution.
Explain why the log posterior odds ratio is useful. Provide an example.
When is the naïve Bayes classification the same as the MAP classification? What does this mean for the naïve Bayes classifier, in terms of optimality?
What is meant by conditional independence? Provide an example of events that are conditionally independent. Now provide an example of events that are not conditionally independent.
Explain why the MAP classification is impractical to apply directly for any interesting real-world data mining application.
Explain why we cannot avoid altering, even slightly, the character of the data set, when we apply balancing.
Describe what balancing is, and when and why it may be needed. Also, describe two techniques for achieving a balanced data set, and explain why one method is preferred.
Explain the interpretation of the posterior odds ratio. Also, why do we need it?
Explain in plain English what is meant by the maximum a posteriori classification.
Why would we expect, in most data mining applications, the maximum a posteriori estimate to be close to the maximum-likelihood estimate?
Explain the difference between the prior and posterior distributions.
Describe the differences between the frequentist and Bayesian approaches to probability.
Use cluster membership to predict rating. One way to do this would be to construct a histogram of rating based on cluster membership alone. Describe how the relationship you uncovered makes sense,
Which clustering solution do you prefer, and why?
Rerun the k-means algorithm with k=3.
Develop clustering profiles that clearly describe the characteristics of the cereals within the cluster.
Using all of the variables, except name and rating, run the k-means algorithm with k=5 to identify clusters within the data.
Confirm the calculations for the second pass and third pass for MSB, MSE, and pseudo-F for step 4 of the example given in the chapter.
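A minimal numerical sketch related to Exercises 2.29 and 2.38 in the list above. It checks, on one table of made-up counts, that the sample odds ratio is unchanged by transposing the table or by multiplying a row by a constant, and that Yule's Q equals (θ − 1)/(θ + 1). The function names and the counts are hypothetical, chosen only for illustration; a numerical check of this kind is not a proof and does not replace the algebraic argument the exercises ask for.

```python
from fractions import Fraction


def odds_ratio(table):
    """Sample odds ratio n11*n22 / (n12*n21) for a 2 x 2 table of counts."""
    (n11, n12), (n21, n22) = table
    return Fraction(n11 * n22, n12 * n21)


def yules_q(table):
    """Yule's Q = (n11*n22 - n12*n21) / (n11*n22 + n12*n21)."""
    (n11, n12), (n21, n22) = table
    return Fraction(n11 * n22 - n12 * n21, n11 * n22 + n12 * n21)


# Hypothetical counts, not taken from any table in the text.
table = [[10, 20], [30, 40]]

# (a) Interchange rows with columns (transpose).
transposed = [[table[0][0], table[1][0]], [table[0][1], table[1][1]]]

# (b) Multiply the counts in the first row by c = 3.
row_scaled = [[3 * table[0][0], 3 * table[0][1]], list(table[1])]

theta = odds_ratio(table)
print("odds ratio:", theta)
print("after transposing:", odds_ratio(transposed))
print("after scaling row 1 by 3:", odds_ratio(row_scaled))
print("Yule's Q:", yules_q(table))
print("(theta - 1)/(theta + 1):", (theta - 1) / (theta + 1))
```

Exact rational arithmetic (`fractions.Fraction`) is used so the equalities can be compared without floating-point rounding.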