Need help with these answers, please! Data is in the link below!
Problem 2: CDI Data

The dataset provides selected county demographic information (CDI) for 440 of the most populous counties in the United States. Each line of the dataset has an identification number with a county name and state abbreviation and provides information on 14 variables for a single county.

Variable Number / Variable Name / Description
1 / Identification number / 1-440
2 / County / County name
3 / State / Two-letter state abbreviation
4 / Land area / Land area (square miles)
5 / Total population / Estimated 1990 population
6 / Percentage of population aged 18-34
7 / Percentage of population aged 65 or older
8 / Number of active physicians / Number of professionally active nonfederal physicians
9 / Number of hospital beds / Total number of beds, cribs, bassinets
10 / Total serious crimes
11 / Percentage high school graduates / Percentage of adult population (persons 25 years or older) who completed 12 or more years of school
12 / Percentage bachelor's degrees
13 / Percentage below poverty level
14 / Percentage unemployed
15 / Per capita income
16 / Total personal income
17 / Geographical region / 1=NE, 2=NC, 3=S, 4=W

Refer to the CDI data. The number of active physicians in a CDI (Y) is expected to be related to the total population, number of hospital beds, and total personal income. Assume that a simple linear regression model is appropriate for each of the three predictor variables.

a) Regress the number of active physicians in turn on each of the three predictor variables (one at a time). State the estimated regression function for each.

b) Plot the three estimated regression functions and the data on three separate plots. Does a linear regression relation appear to provide a good fit for each of the three predictor variables?

c) For each of the three regressions, calculate the total sum of squares, the residual (error) sum of squares, and the coefficient of determination. Which regression equation is the best fit?
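
Not a full answer, but here is a minimal sketch of the calculations behind parts (a) and (c), assuming ordinary least squares with one predictor at a time. The function name fit_simple_regression, the SimpleFit container, and the toy numbers are illustrative assumptions only; they are not the CDI data. With the real file, x would be one of the three predictor columns (total population, hospital beds, or total personal income) and y the number of active physicians.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SimpleFit:
    b0: float   # estimated intercept
    b1: float   # estimated slope
    sst: float  # total sum of squares: sum of (y - y_bar)^2
    sse: float  # residual (error) sum of squares: sum of (y - y_hat)^2
    r2: float   # coefficient of determination: 1 - SSE / SST


def fit_simple_regression(x: Sequence[float], y: Sequence[float]) -> SimpleFit:
    """Least-squares fit of y = b0 + b1 * x for a single predictor."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Corrected sums of squares and cross-products.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or sst == 0:
        raise ValueError("x and y must each have some variation")

    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept

    # Residual sum of squares from the fitted values.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - sse / sst
    return SimpleFit(b0, b1, sst, sse, r2)


# Made-up toy numbers for illustration only (NOT the CDI data).
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [2.0, 4.1, 5.9, 8.2, 9.8]

fit = fit_simple_regression(toy_x, toy_y)
print(f"Estimated function: Y-hat = {fit.b0:.4f} + {fit.b1:.4f} X")
print(f"SST = {fit.sst:.4f}, SSE = {fit.sse:.4f}, R^2 = {fit.r2:.4f}")
```

Calling the function three times, once per predictor, gives the three estimated regression functions for part (a). Because Y is the same in all three regressions, the total sum of squares is identical across them, so the predictor with the smallest SSE is also the one with the largest R^2. For part (b), the line b0 + b1 * x can be drawn over a scatter plot of each predictor against Y with any plotting tool.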