Question
Data Exploration and Multiple Linear Regression (MLR) using SAS.

The "College" data set contains statistics for many US colleges from the 1995 issue of US News and World Report. It has 777 observations on 18 variables. The colleges want to predict student enrollment for the next semester based on the past data available. For a description of the data, see "College.txt" in Canvas, which contains the college data and attribute information. The main task is to check whether the number of enrollments depends on the characteristics of the university. "Private" is the dummy variable. Do the dummy coding accordingly (see "Regression with Dummy Variables in SAS.docx" in Canvas).

1. Generate boxplots of the attributes accept (number of applications accepted) and top10perc (% of new students from the top 10% of their high school class) and of the dependent variable enroll (number of new students enrolled). Identify the cutoff values for outliers and remove the outliers.

2. Try to fit an MLR to this data set, with ENROLL as the dependent variable. P_UNDERGRAD has a somewhat long tail, so take a log transform (use LP_UNDERGRAD = log(P_UNDERGRAD)) and then use LP_UNDERGRAD as one of the predictors. Keep the first 544 records as a training set (call it ENROLLTRAIN), which you will use to fit the model; the remaining 233 will be used as a test set (ENROLLTEST).

3. Use only the following variables in your model:
ENROLL = ACCEPT + TOP10PERC + F_UNDERGRAD + LP_UNDERGRAD + ROOM_BOARD + GRADE_RATE + PRIVATEDUMMY
(a) Report the coefficients obtained by your model. Would you drop any of the variables used in your model (based on the t-scores or p-values)?
(b) Report the MSE obtained on ENROLLTRAIN. How much does this increase when you score your model on ENROLLTEST?
(c) (Bonus 2 points) Do you think your MLR model is reasonable for this problem? You may look at the distribution of residuals to provide an informed answer.
Step by Step Solution
Step: 1
To address your task of exploring the College dataset, performing multiple linear regression (MLR) in SAS, and evaluating the model's performance, let's brea...
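The assignment itself is meant to be done in SAS, so the following is only a minimal Python sketch of the arithmetic behind the data-preparation steps in the question, not the SAS solution. It assumes each record is a dictionary keyed by the variable names used in the question, that PRIVATE is stored as "Yes"/"No", and that outliers are flagged with the usual 1.5 × IQR boxplot fences (the quartile method here may differ slightly from the one SAS uses). The helper names `iqr_cutoffs`, `prepare`, `split_first_n`, and `mse` are hypothetical.

```python
import math
import statistics


def iqr_cutoffs(values, k=1.5):
    """Return (low, high) boxplot fences: Q1 - k*IQR and Q3 + k*IQR.

    Assumption: the 'inclusive' quartile method; SAS may use another definition.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def prepare(row):
    """Return a copy of one record with the dummy and log-transformed predictor added.

    Assumption: PRIVATE holds "Yes" or "No"; P_UNDERGRAD is strictly positive.
    """
    out = dict(row)
    out["PRIVATEDUMMY"] = 1 if str(row["PRIVATE"]).strip().lower() == "yes" else 0
    out["LP_UNDERGRAD"] = math.log(row["P_UNDERGRAD"])
    return out


def split_first_n(rows, n_train=544):
    """First n_train records form ENROLLTRAIN; the rest form ENROLLTEST."""
    return rows[:n_train], rows[n_train:]


def mse(actual, predicted):
    """Mean of squared prediction errors, dividing by the number of records."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
```

Note that `mse` divides by the number of records, which suits a train-versus-test comparison. The mean square error that a regression procedure reports in its ANOVA table divides by the error degrees of freedom instead, so the two numbers will not match exactly on the training set.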