Questions and Answers of Business Analytics Data
What is the relationship between false positive rate and sensitivity?
Suppose our model has perfect sensitivity and perfect specificity. What then is our accuracy and overall error rate?
Suppose our model has perfect sensitivity. Why is that insufficient for us to conclude that we have a good model?
True or false: If model A has better accuracy than model B, then model A has fewer false negatives than model B. If false, give a counterexample.
What is the relationship between accuracy and overall error rate?
What is the difference between the total predicted negative and the total actually negative?
What is a false positive? A false negative?
Describe the general form of a contingency table.
What might be a drawback of evaluation measures based on squared error? How might we avoid this?
Describe the trade-off between model complexity and prediction error.
How is the square root of the MSE interpreted?
Why do we not use the average deviation as a model evaluation measure?
What is the minimum descriptive length principle, and how does it represent the principle of Occam’s razor?
Why do we need to evaluate our models before model deployment?
Identify the set of outliers in the lower right of the residuals versus fitted values plot. Have we uncovered a natural grouping? Explain how this group would end up in this place in the graph.
Perform the regression of ln pct (ln of percentage over 64) on ln popn, and obtain the regression diagnostics. Explain how taking the ln of percentage over 64 has tamed the residuals versus fitted
Describe the pattern in the plot of the residuals versus the fitted values. What does this mean? Are the assumptions validated?
Describe the pattern in the normal probability plot of the residuals. What does this mean?
Apply the ln transformation to the predictor, giving us the transformed predictor variable ln popn. Note that the application of this transformation is due solely to the skewness inherent in the
Identify the four cities that appear larger than the bulk of the data in the scatter plot.
Construct a scatter plot of percentage over 64 versus popn. Is this graph very helpful in describing the relationship between the variables?
Construct and interpret a 95% confidence interval for the nutrition rating for a randomly chosen cereal with sodium content of 100. Open the California data set (Source: US Census Bureau,
Construct and interpret a 95% confidence interval for the true nutrition rating for all cereals with a sodium content of 100.
What is the typical error in predicting rating based on sodium content?
Construct the graphics for evaluating the regression assumptions. Are they validated?
Put the outlier back in the data set for the rest of the analysis. On the basis of the scatter plot, is there evidence of a linear relationship between the variables? Discuss. Characterize their
Obtain the Cook’s distance value for the outlier. Is it influential?
Using the scatter plot, explain why the y-intercept changed more than the slope when the outlier was omitted.
Omit the outlier. Perform the same regression. Compare the values of the slope and y-intercept for the two regressions.
Perform the appropriate regression.
We are interested in predicting nutrition rating based on sodium content. Construct the appropriate scatter plot. Note that there is an outlier. Identify this outlier. Explain why this cereal is an
Suppose someone said that knowing the number of stolen bases a player has explains most of the variability in the number of times the player gets caught stealing. What would you say? For Exercises
Clearly interpret the meaning of the slope coefficient.
Calculate and interpret the correlation coefficient.
Inferentially, is there a significant relationship between the two variables? What tells you this?
Interpret the y-intercept. Does this make any sense? Why or why not?
What is the typical error in predicting the number of times a player is caught stealing, given his number of stolen bases?
Find and interpret the statistic that tells you how well the data fit the model.
Perform the regression of the number of times a player has been caught stealing versus the number of stolen bases the player has.
On the basis of the scatter plot, is a transformation to linearity called for? Why or why not?
We are interested in investigating whether there is a linear relationship between the number of times a player has been caught stealing and the number of stolen bases the player has. Construct a
List the influential observations, according to Cook’s distance and the F criterion. Next, subset the Baseball data set so that we are working with batters who have at least 100 at bats. Use this
List the high leverage points. Why is Greg Vaughn a high leverage point? Why is Bernie Williams a high leverage point?
List the outliers. What do all these outliers have in common? For Orlando Palmeiro, explain why he is an outlier.
Construct and interpret a 95% prediction interval for a randomly chosen player with a 0.300 batting average. Is this prediction interval useful?
Construct and interpret a 95% confidence interval for the mean number of home runs for all players who had a batting average of 0.300.
Calculate the correlation coefficient. Construct a 95% confidence interval for the population correlation coefficient. Interpret the result.
Construct and interpret a 95% confidence interval for the unknown true slope of the regression line.
Perform the hypothesis test for determining whether a linear relationship exists between the variables.
What percentage of the variability in the ln home runs does batting average account for?
What is the size of the typical error in predicting the number of home runs, based on the player’s batting average?
Estimate the number of home runs (not ln home runs) for a player with a batting average of 0.300.
State the regression equation (from the regression results) in words and numbers.
Write the population regression equation for our model. Interpret the meaning of β0 and β1.
Construct a plot of the residuals versus the fitted values. Do you see strong evidence that the constant variance assumption has been violated? (Remember to avoid the Rorschach effect.) Therefore
Take the natural log of home runs, and perform a regression of ln home runs on batting average. Obtain a normal probability plot of the standardized residuals from this regression. Does the normal
Construct a plot of the residuals versus the fitted values (fitted values refers to the ŷ’s). What pattern do you see? What does this indicate regarding the regression assumptions?
Perform a regression of home runs on batting average. Obtain a normal probability plot of the standardized residuals from this regression. Does the normal probability plot indicate acceptable
Refer to the previous exercise. Which regression assumption might this presage difficulty for?
What would you say about the variability of the number of home runs, for those with higher batting averages?
Informally, is there evidence of a relationship between the variables?
Construct a scatter plot of home runs versus batting average.
Compare your results for the hypothesis test and the confidence interval. Comment.
Assume normality. Construct a 90% confidence interval for the population correlation coefficient. Interpret the result.
Calculate the correlation coefficient r.
Suppose we let α = 0.10. Perform the hypothesis test to determine if a linear relationship exists between x and y. Assume the assumptions are met.
Interpret the value of the standard error of the estimate, s.
Interpret the value of the slope b1.
Interpret the value of the y-intercept b0.
Carefully state the regression equation, using words and numbers.
As it has been standardized, the response z vmail messages has a standard deviation of 1.0. What would be the typical error in predicting z vmail messages if we simply used the sample mean response
Discuss the usefulness of the regression of z vmail messages on z day calls.
Assuming normality, construct and interpret a 95% confidence interval for the population correlation coefficient.
Use the data in the ANOVA table to find or calculate the following quantities: [Figure: Scatterplot of attendance vs winning percent]
Is there evidence of a linear relationship between z vmail messages (z-scores of the number of voice mail messages) and z day calls (z-scores of the number of day calls made)? Explain.
What type of transformation or transformations is called for? Use the bulging rule. For Exercises 26–30, use the output from the regression of z vmail messages on z day calls (from the Churn data
Is it appropriate to perform linear regression? Why or why not?
Is there an observation that may look as though it is an outlier? Explain. For Exercises 24 and 25, use the scatter plot in Figure 8.23 to answer the questions.
Will the value of s be closer to 10, 100, 1000, or 10,000? Why?
Will the confidence interval for the slope parameter include zero or not? Explain.
Will the p-value for the hypothesis test for the existence of a linear relationship between the variables be small or large? Explain.
Estimate as best you can the values of the regression coefficients b0 and b1.
Describe any correlation between the variables. Interpret this correlation.
A colleague would like to use linear regression to predict whether or not customers will make a purchase, based on some predictor variable. What would you explain to your colleague?
What recourse do we have if the residual analysis indicates that the regression assumptions have been violated? Describe three different rules, heuristics, or family of functions that will help us.
Clearly explain the correspondence between an original scatter plot of the data and a plot of the residuals versus fitted values.
Explain the difference between a confidence interval and a prediction interval. Which interval is always wider? Why? Which interval is probably, depending on the situation, more useful to the data
(a) Explain why an analyst may prefer a confidence interval to a hypothesis test. (b) Describe how a confidence interval may be used to assess significance.
Describe the criterion for rejecting the null hypothesis when using the p-value method for hypothesis testing. Who chooses the value of the level of significance, α? Make up a situation (one
Explain what information is conveyed by the value of the standard error of the slope estimate.
Which values of the slope parameter indicate that no linear relationship exists between the predictor and response variables? Explain how this works.
Explain what statistics from Table 8.11 indicate to us that there may indeed be a linear relationship between x and y in this example, even though the value for r2 is less than 1%.
Explain in your own words the implications of the regression assumptions for the behavior of the response variable y.
Match each of the following regression terms with its definition. Regression Term | Definition. a. Influential observation | Measures the typical difference between the predicted response value and the
Calculate the values for leverage, standardized residual, and Cook’s distance for the 11th hiker who had hiked for 10 hours and traveled 23 kilometers. Show that, while it is neither an outlier nor
Calculate the values for leverage, standardized residual, and Cook’s distance for the hard-core hiker example in the text.
Where would a data point be situated that has the smallest possible leverage?
Calculate the estimated regression equation for the orienteering example, using the data in Table 8.3. Use either the formulas or software of your choice.
Describe the difference between the estimated regression line and the true regression line.
Indicate whether the following statements are true or false. If false, alter the statement to make it true. a. The least-squares line is that line that minimizes the sum of the residuals. b. If all the