Question
Simple regression
To show empirically the value of multiple regression over simple linear regression, let's continue with the internet data example that we have in SPSS. Remember, our analyst wanted to believe that there was a strong relationship between internet usage and income, but found that the data masked that relationship until age was added as a variable.
Let's see how that works as we build a model in SPSS. It starts with the simple linear regression of internet usage on income alone, and then we'll see what happens when we add age to the regression model. So let's begin with our simple regression of the data.
Our dependent variable (y) is internet usage, and our independent variable (x) is income. When we calculate the simple regression model, the first thing we always have to do is check our assumptions.
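The simple regression being fitted here is y-hat = b0 + b1*x, with x = income and y = internet usage. Here is a minimal plain-Python sketch outside SPSS. The function name `fit_simple` and the toy numbers are hypothetical. Ordinary least squares gives the slope as Sxy/Sxx and the intercept as y-bar minus b1*x-bar, and the residuals it returns are what a residual plot displays:

```python
def fit_simple(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x.

    Returns the intercept, slope, fitted values and residuals.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-product and sum of squares of the centred predictor
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return b0, b1, fitted, residuals


# Hypothetical toy values, not the SPSS internet data
income = [1, 2, 3, 4, 5]
usage = [2, 4, 5, 4, 5]
b0, b1, fitted, residuals = fit_simple(income, usage)
print(b0, b1)  # intercept about 2.2, slope about 0.6
```

With an intercept in the model, the least-squares residuals always sum to zero, so a residual plot is centred on the horizontal zero line.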
So in SPSS I produced a residual plot so that we could see whether there were any discernible patterns in the residuals. As we look at this residual plot to check our assumptions, one of the first things we find is that the functional specification, or linearity, seems to be OK.
Let me back up for just a second and show the residual plot again. I don't see any curvilinear pattern in the residuals, and I don't see any slope. It just appears to be a fairly well-distributed blob of residuals across the plot. So it doesn't look like there's any problem with linearity, and it doesn't look like there's any problem with uncorrelated error terms.
So that seems to be OK in terms of functional specification. Now let's check the constancy of variance. Again, there is no funnel shape to the residuals. Everything seems to be fairly evenly spread out over the plot space. So there's no real funnel that would indicate increasing variance in the residuals as we move away from the vertical axis.
So it looks like the constancy of error variance is also OK. And, as I said before, the assumption of uncorrelated error terms also seems to be met, because there doesn't appear to be any particular pattern to the residuals.
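The residual plot is a visual check. This sketch is only a rough numeric companion to it, not what SPSS computes for the plot. One idea is to compare the residual spread for low versus high fitted values to look for a funnel. The other is to compute the Durbin-Watson statistic, which SPSS's Linear Regression procedure can optionally report, to check for lag-1 correlation between residuals. The function names and toy numbers are hypothetical. Durbin-Watson is only meaningful when the cases have a natural order, such as time:

```python
import statistics


def spread_ratio(fitted, residuals):
    """Residual variance for the upper half of the fitted values divided by
    the residual variance for the lower half. Values far from 1 hint at a
    funnel shape (non-constant variance); a rough check, not a formal test."""
    pairs = sorted(zip(fitted, residuals))
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[-half:]]
    return statistics.pvariance(upper) / statistics.pvariance(lower)


def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in observation order.
    Values near 2 suggest no lag-1 autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical fitted values and residuals, not the SPSS output
fitted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
residuals = [0.5, -0.4, 0.3, -0.6, 0.4, -0.2]
print(spread_ratio(fitted, residuals), durbin_watson(residuals))
```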
Now, to check normality, let's take a look at the K-S (Kolmogorov-Smirnov) test. Like all statistical tests in SPSS, the K-S test comes with a p-value, and that p-value is always given in the row or column labeled Sig., just as it is here. You can see that the two-tailed test of significance comes back with p = 0.200. That is clearly greater than 0.05, so we fail to reject the null hypothesis that the residuals are normally distributed. In other words, we find no evidence of a departure from normality.
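When the mean and standard deviation of the reference normal curve are estimated from the same residuals, the K-S test is normally run with the Lilliefors correction. If the 0.200 above comes from SPSS's Explore (Tests of Normality) table, SPSS footnotes that value as a lower bound of the true significance. The sketch below uses hypothetical function names and plain Python. It computes the K-S distance D against a fitted normal curve and approximates the Lilliefors p-value by simulation. This is a substitute for SPSS's table-based value, not a reproduction of it:

```python
import math
import random
import statistics


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ks_distance(sample):
    """Largest gap between the empirical CDF and a normal CDF whose mean and
    SD are estimated from the same sample (the Lilliefors setting)."""
    n = len(sample)
    mu = statistics.mean(sample)
    sd = statistics.stdev(sample)
    d = 0.0
    for i, x in enumerate(sorted(sample), start=1):
        f = normal_cdf((x - mu) / sd)
        # The empirical CDF jumps from (i - 1)/n to i/n at each sorted value
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def lilliefors_p(sample, reps=2000, seed=1):
    """Monte Carlo p-value: share of simulated normal samples of the same
    size whose distance is at least the observed one. Standard normal draws
    suffice because D is unchanged by shifting or rescaling the data."""
    rng = random.Random(seed)
    d_obs = ks_distance(sample)
    n = len(sample)
    hits = sum(
        ks_distance([rng.gauss(0.0, 1.0) for _ in range(n)]) >= d_obs
        for _ in range(reps)
    )
    return hits / reps


# Hypothetical residuals, not the SPSS output
residuals = [0.3, -1.2, 0.8, -0.4, 1.1, -0.6, 0.2, -0.2, 0.5, -0.5]
d = ks_distance(residuals)
p = lilliefors_p(residuals)
print(round(d, 3), p)  # p > 0.05 means we fail to reject normality
```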
So we're good there, and we don't even need to look at a Q-Q plot to judge the normality of the residuals. This won't happen very often. Most of the time when you run a K-S test, your data will come back with a fairly low p-value, which means you will have to make a Q-Q plot. But in this case we got lucky, and we can finish testing the assumptions with the K-S test.
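For the Q-Q plot fallback, each sorted residual is plotted against the standard-normal quantile expected at its rank. If the points lie close to a straight line, that suggests normality. Below is a minimal sketch using Python's `statistics.NormalDist` (Python 3.8+). The plotting positions (i - 3/8)/(n + 1/4) are Blom's formula, one common choice, and SPSS's own setting may differ. The function name is hypothetical:

```python
from statistics import NormalDist


def qq_points(sample):
    """Return (theoretical normal quantile, observed value) pairs for a
    normal Q-Q plot, using Blom's plotting positions (i - 3/8) / (n + 1/4)."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i - 0.375) / (n + 0.25)), x)
        for i, x in enumerate(xs, start=1)
    ]


# Hypothetical residuals, not the SPSS output
for theo, obs in qq_points([0.3, -1.2, 0.8, -0.4, 1.1]):
    print(f"{theo:6.2f}  {obs:5.1f}")
```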
So overall, the assumptions look good. The model appears to be linear, with constant variance and with uncorrelated, normally distributed error terms. So we're good to go ahead with the data analysis. Let's take a look at our regression results.
When I look at regression results in SPSS, the output provides a number of tables, beginning with the Model Summary. As you look at the R statistic that's given, does that number, 0.042, ring a bell? Remember when we calculated the Pearson correlation coefficient for the data? It was also 0.042.
That's because R in the Model Summary, the multiple correlation coefficient, is here simply the correlation between the two variables. It only works out this way in simple regression, where R equals the absolute value of Pearson's r. But I want to show you some of the similarities between the information that correlation provides and the information that simple linear regression provides, and that R figure of 0.042 is exactly that.
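Here is a quick way to see why R matches the Pearson correlation. R in the Model Summary is the multiple correlation coefficient, the correlation between observed y and the model's fitted values. With a single predictor, that equals the absolute value of Pearson's r. Below is a small plain-Python check on hypothetical numbers. The function names are illustrative:

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def multiple_r_simple(x, y):
    """Multiple correlation R for y = b0 + b1 * x: corr(y, fitted values)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    return pearson_r(y, fitted)


# Hypothetical data with a negative relationship
x = [1, 2, 3, 4, 5]
y = [5, 3, 4, 1, 2]
print(pearson_r(x, y), multiple_r_simple(x, y))  # about -0.8 and 0.8
```

The negative example shows why it is the absolute value. R is never negative, so the direction of the relationship has to be read from the sign of the slope in the Coefficients table.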
R square is the overall measure of model fit that I'm usually most interested in. It is the square of R, which comes out to 0.002. That means very little of the variance in the dependent variable is explained by our model, and almost all of the variance is unexplained, or residual, variance. So just a tiny bit of variance is explained, as shown by the very low R square figure.
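R square can be computed as 1 - SSE/SST, the residual sum of squares divided by the total sum of squares, subtracted from 1. In simple regression only, it also equals the square of Pearson's r. The arithmetic also shows that an R square of 0.002 goes with an R of about 0.042, not 0.42. Here is a minimal sketch on hypothetical numbers. The function name is illustrative:

```python
def r_squared_simple(x, y):
    """R^2 = 1 - SSE / SST for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total
    return 1.0 - sse / sst


# Hypothetical data whose Pearson r is -0.8, so R^2 should be 0.64
print(r_squared_simple([1, 2, 3, 4, 5], [5, 3, 4, 1, 2]))

# Squaring candidate values of R, rounded to three decimals
print(round(0.042 ** 2, 3))  # 0.002
print(round(0.42 ** 2, 3))   # 0.176
```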
To carry on with the analysis, we'll look at the hypothesis tests, and we'll find that the overall Sig. value, the S-I-G value located on the right-hand side of that table, is pretty large: 0.611. Do you remember that when we looked at the Pearson correlation coefficient, the p-value of that test was also 0.611?
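Why do the two p-values agree? In simple regression, the t statistic for the slope, b1/SE(b1), is algebraically the same as the correlation test statistic r*sqrt(n - 2)/sqrt(1 - r^2). The overall ANOVA F equals t squared, with 1 and n - 2 degrees of freedom. Identical statistics on the same degrees of freedom give identical p-values. Below is a plain-Python check on hypothetical numbers. Computing the p-value itself would need a t or F distribution function, which the Python standard library does not provide:

```python
import math


def slope_t_corr_t_and_f(x, y):
    """Return (t for the slope, t for Pearson's r, overall F) for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    sse = syy - b1 * sxy         # residual (unexplained) sum of squares
    ssr = syy - sse              # regression sum of squares, 1 df
    mse = sse / (n - 2)          # residual mean square
    t_slope = b1 / math.sqrt(mse / sxx)
    r = sxy / math.sqrt(sxx * syy)
    t_corr = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    f_stat = ssr / mse           # MSR / MSE, with MSR = SSR / 1
    return t_slope, t_corr, f_stat


# Hypothetical data, not the SPSS internet data
t_slope, t_corr, f_stat = slope_t_corr_t_and_f([1, 2, 3, 4, 5], [5, 3, 4, 1, 2])
print(t_slope, t_corr, f_stat, t_slope ** 2)
```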
So simple linear regression and correlation are showing us largely the same information here, and the story they tell is the one we've known all along. The independent variable does not do a very good job of explaining or predicting the dependent variable. In other words, on a simple look at the data, internet usage is not strongly associated with income. But as we know, there's more to the story.
So let's consider a multiple regression model in which we add age to the regression equation. Our dependent variable, y, continues to be internet usage. Our first independent variable is income, as it always has been, but now we're going to add a second independent variable, age.
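Here is a sketch of how adding a second predictor can uncover a masked relationship. It uses small made-up numbers, not the SPSS internet data. Income and age are positively correlated in these numbers. Holding the other variable fixed, usage rises with income and falls with age, so the two effects cancel when income is used alone. The two-predictor fit solves the least-squares normal equations in centred form. All names and values are hypothetical:

```python
def r_squared_one(x, y):
    """R^2 for a simple regression, computed as Pearson's r squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def fit_two(x1, x2, y):
    """OLS for y = b0 + b1*x1 + b2*x2; returns ((b0, b1, b2), R^2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # zero if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    sse = sum((yv - (b0 + b1 * a + b2 * b)) ** 2 for yv, a, b in zip(y, x1, x2))
    sst = sum(c * c for c in dy)
    return (b0, b1, b2), 1.0 - sse / sst


# Made-up illustration values
income = [20, 30, 40, 50, 60]
age = [35, 25, 25, 35, 55]
usage = [19, 52, 60, 48, 21]

print(r_squared_one(income, usage))  # income alone: 0.0
coefs, r2_both = fit_two(income, age, usage)
print(coefs, r2_both)  # coefficients 70, 1, -2; R^2 about 0.993
```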
So now I'd like you to think about what those regression outputs are going to look like. What's going to happen to R square? What's going to happen to our overall model significance? We'll find out when we come back in the next segment.
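When thinking about overall model significance, note that the Model F-test in the ANOVA table can be written in terms of R square. The formula is F = (R^2/k) / ((1 - R^2)/(n - k - 1)), with k predictors, n cases, and (k, n - k - 1) degrees of freedom. Adding a predictor to a least-squares model with an intercept can never lower R square. However, the new predictor also uses up a degree of freedom, so F can move either way. Here is a minimal sketch with hypothetical inputs. The function name is illustrative:

```python
def overall_f(r2, n, k):
    """Overall model F statistic from R-squared for k predictors and n cases.
    Its degrees of freedom are (k, n - k - 1)."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Hypothetical inputs, not the SPSS results
print(overall_f(0.64, 5, 1))   # one predictor, 5 cases
print(overall_f(0.50, 23, 2))  # two predictors, 23 cases
```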
Given the results from the correlation and simple regression, what do you think will happen to the regression results when we add age as an additional independent variable? What will R-Squared do? What will the Model F-Test of Significance show?