Question

1 Approved Answer

Posted on Sep 05, 2024

only answer 2b 1. Explain whether each scenario is a classification or regression problem, and indicate whether we are most interested in inference or prediction.

only answer 2b

image text in transcribed

1. Explain whether each scenario is a classification or regression problem, and indicate whether we are most interested in inference or prediction. Finally, provide the sample size and the number of variables for each scenario. (a) We are trying to figure out the factors potentially leading to cancer. For this, we collect data on 200 patients that either had or didn't have cancer. In particular, we record their body measurements (weight, height etc), heart rate, blood sugar level, family history of disease - measuring twelve variables. On top of that, we conduct a survey on exercise, eating and drinking habits, adding another three variables. (b) We dig into UH student database and obtain data on 1500 students. We are interested in figuring out the factors affecting the final year GPA of a student depending on the data from the entrance exams, high school and their first year at UH. We extract students' SAT scores, high school GPA, first year GPA at UH, major, age and other five variables. (c) We would like to predict the outcome of a college football game (not the exact score, but simply who wins, team A or team B?) depending on various factors. Assuming that it is mid-season, we obtain the data on all 300 games that have already been played and study such variables as: yards gained/allowed, touch- downs scored/allowed, turnovers committed/forced, whether the game is at home or away, whether it rains or not (all-in-all eight variables). (b) We know that general model formula is Y = f(x) +, (1) and we try to estimate true f with f. For each of examples (a), (b), (c) from Problem 1, proceed to answer the following: i. Can our estimate f be treated as a black box? ii. Why/Why not? 1. Explain whether each scenario is a classification or regression problem, and indicate whether we are most interested in inference or prediction. Finally, provide the sample size and the number of variables for each scenario. (a) We are trying to figure out the factors potentially leading to cancer. For this, we collect data on 200 patients that either had or didn't have cancer. In particular, we record their body measurements (weight, height etc), heart rate, blood sugar level, family history of disease - measuring twelve variables. On top of that, we conduct a survey on exercise, eating and drinking habits, adding another three variables. (b) We dig into UH student database and obtain data on 1500 students. We are interested in figuring out the factors affecting the final year GPA of a student depending on the data from the entrance exams, high school and their first year at UH. We extract students' SAT scores, high school GPA, first year GPA at UH, major, age and other five variables. (c) We would like to predict the outcome of a college football game (not the exact score, but simply who wins, team A or team B?) depending on various factors. Assuming that it is mid-season, we obtain the data on all 300 games that have already been played and study such variables as: yards gained/allowed, touch- downs scored/allowed, turnovers committed/forced, whether the game is at home or away, whether it rains or not (all-in-all eight variables). (b) We know that general model formula is Y = f(x) +, (1) and we try to estimate true f with f. For each of examples (a), (b), (c) from Problem 1, proceed to answer the following: i. Can our estimate f be treated as a black box? ii. Why/Why not