
Question



Data Exploration and Multiple Linear Regression (MLR) using R [1]

The "Boston Housing" data set, part of the MASS package, records properties of 506 housing zones in the Greater Boston area. For a description of the data (housing data and attribute information), visit https://archive.ics.uci.edu/ml/datasets/Housing. Typically one is interested in predicting MEDV (median home value) based on other attributes.

1. Generate box plots of the LSTAT (% of lower status in the population) and MEDV (median home value) attributes and identify the cutoff values for outliers. Generate a scatterplot of MEDV against LSTAT; comment on how inclusion of the outliers would affect a predictive model of median home value as a function of % of lower status in the population.
2. Try to fit an MLR to this dataset, with MEDV as the dependent variable. MEDV has a somewhat longish tail and is not so Gaussian-like, so we will take a log transform (use LMEDV = - log(MEDV)), and then predict LMEDV instead. (You should convince yourself that this is a better idea by looking at the histograms and quantile plots to assess normality; however, there is no need to submit such plots.) Keep the first 300 records as a training set (call it Bostrain) which you will use to fit the model; the remaining 206 will be used as a test set (Bostest). Use only the following variables in your model: LMEDV = LSTAT + RM + CRIM + ZN + CHAS.
3. Report the coefficients obtained by your model. Would you drop any of the variables used in your model (based on the t-scores or p-values)?
4. Report the MSE obtained on Bostrain. How much does this increase when you score your model (i.e., predict) on Bostest?
5. (Bonus 1 point). Use stepwise regression to reach your final model. Try different model selection criteria (i.e., AIC, Cp, BIC, adj R^2, R^2) and see if you can come up with the same model even with the different criteria. Determine the best model if you get different models with different criteria. We will consider a model that gives the highest accuracy (in terms of MSE) in the test set as the best model.

[1] You must use R to run regression although use of other software is also encouraged for verification of your answers.

Data Exploration and Multiple Linear Regression (MLR) using SAS

The "Boston Housing" data set, part of the MASS package, records properties of 506 housing zones in the Greater Boston area. For a description of the data, see Moodle (housing data and attribute information). Typically one is interested in predicting MEDV (median home value) based on other attributes.

(a) Generate box plots of the LSTAT (% of lower status in the population) and MEDV (median home value) attributes and identify the cutoff values for outliers. Generate a scatterplot of MEDV against LSTAT; comment on how inclusion of the outliers would affect a predictive model of median home value as a function of % of lower status in the population. (Hint: Such effects may be easier to visualize if the outliers are a different symbol than the other data.)
(b) Try to fit an MLR to this dataset, with MEDV as the dependent variable. MEDV has a somewhat longish tail and is not so Gaussian-like, so we will take a log transform (use LMEDV = - log(MEDV)), and then predict LMEDV instead. (You should convince yourself that this is a better idea by looking at the histograms and quantile plots to assess normality; however, there is no need to submit such plots.) Keep the first 300 records as a training set (call it Bostrain) which you will use to fit the model; the remaining 206 will be used as a test set (Bostest). Use only the following variables in your model: LMEDV = LSTAT + RM + CRIM + ZN + CHAS.
    (a) Report the coefficients obtained by your model. Would you drop any of the variables used in your model (based on the t-scores or p-values)?
    (b) Report the MSE obtained on Bostrain. How much does this increase when you score your model on Bostest?
    (c) (Bonus 2 points). Do you think your MLR model is reasonable for this problem? You may look at the distribution of residuals to provide an informed
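The R version of the assignment could be approached along the following lines. This is a minimal sketch, not a worked solution: it assumes the `Boston` data frame shipped with the MASS package (whose columns are lower-case, e.g. `medv`, `lstat`), uses the default box-plot rule of 1.5 times the hinge spread for the outlier cutoffs, and takes LMEDV = - log(MEDV) exactly as the statement writes it. The helper names `box_fences` and `mse` are illustrative and not part of the assignment.

```r
library(MASS)  # provides the Boston data frame (lower-case column names)
data(Boston)

# Part 1: box plots and outlier cutoffs.
# Assumption: "outlier" means outside hinge +/- 1.5 * hinge spread,
# which is the default rule used by boxplot().
box_fences <- function(x) {
  s <- boxplot.stats(x)$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
  spread <- s[4] - s[2]
  c(lower = s[2] - 1.5 * spread, upper = s[4] + 1.5 * spread)
}
fences_lstat <- box_fences(Boston$lstat)
fences_medv  <- box_fences(Boston$medv)

boxplot(Boston$lstat, main = "LSTAT")
boxplot(Boston$medv, main = "MEDV")

# Scatterplot with outliers (in either variable) drawn with a different symbol.
is_out <- Boston$lstat %in% boxplot.stats(Boston$lstat)$out |
  Boston$medv %in% boxplot.stats(Boston$medv)$out
plot(Boston$lstat, Boston$medv, pch = ifelse(is_out, 4, 1),
     xlab = "LSTAT", ylab = "MEDV")

# Part 2: transformed response and train/test split by row order.
Boston$lmedv <- -log(Boston$medv)  # sign taken from the statement as written
Bostrain <- Boston[1:300, ]
Bostest  <- Boston[301:506, ]

fit <- lm(lmedv ~ lstat + rm + crim + zn + chas, data = Bostrain)

# Part 3: coefficients with standard errors, t values and p-values.
coef_table <- summary(fit)$coefficients
print(coef_table)

# Part 4: mean squared error on the training and test sets.
mse <- function(actual, predicted) mean((actual - predicted)^2)
mse_train <- mse(Bostrain$lmedv, fitted(fit))
mse_test  <- mse(Bostest$lmedv, predict(fit, newdata = Bostest))
print(c(train = mse_train, test = mse_test))

# Part 5 (bonus): stepwise selection under AIC (k = 2) and BIC (k = log(n)).
fit_aic <- step(fit, direction = "both", trace = 0)
fit_bic <- step(fit, direction = "both", k = log(nrow(Bostrain)), trace = 0)
print(formula(fit_aic))
print(formula(fit_bic))
```

Flipping the sign of the response flips the sign of every coefficient but leaves the residual magnitudes, and therefore the MSE, unchanged, so the comparison between Bostrain and Bostest does not depend on that choice. `step()` covers only the AIC-type criteria; comparing Cp or adjusted R^2 would need a different tool (for example, an all-subsets search), which is not shown here.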
