Question

Because correlations are so useful and so easy to use, I think it's important that we understand some of the math and the theory behind the correlation. I'm not going to ask you to repeat the stuff on an exam, but I want you to look at it and ponder it and ponder its wisdom. In terms of how we interpret correlations, we always interpret correlations as existing between two variables, which in this example, we'll call x1 and x2. Correlations are always signified in statistics by the letter r for samples or the Greek letter rho if we're dealing with population parameters. We'll use r this time because our calculations are going to be empirical, and we'll assume that they come from a sample as opposed to a population based on our previous discussions.

So here's what it looks like. Looks pretty complicated, doesn't it? What we're going to do on the upper side or the numerator of this particular fraction is we're going to take for x1 all of the differences between any individual observation and the mean so that we have the distances between each observation and the mean associated with x1. Then we're going to do the same thing for x2, take the differences between x2 and its mean. We're going to multiply those together and sum them.

On the denominator, we're going to do something a little bit different. We're going to sum the responses, but we're going to do it with the sums of squares. And you remember sums of squares. x1 minus x-bar 1 would be the distance between x1 and the mean, and we're going to square that, by the way, to take away negative distances. You can't have negative distance, and distance must always be positive. And one way to make sure numbers are positive is to square them.

Then we do the same thing with x2. We take the individual distances between x2 and its mean, and we square those to keep them positive. And then to undo the effects of squaring, we put the entire denominator underneath the radical. Now look at the difference between the numerator and denominator. In one case, we're multiplying across distances. That is, we're multiplying x1 times x2 in terms of the distances from the means and then summing them that way.

In the denominator, we're doing it within a given variable. That is, we're multiplying x1 times x1 and x2 times x2 before we multiply across them. What this does then is it basically gives us an expression of the shared variance between x1 and x2 in the numerator divided by the total variance between the individual variables, the total variation in x1 and x2 individually. And what this does mathematically is it provides us with something like a percentage, basically, a strength of the linear relationship that exists between x1 and x2.
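The slide with the formula did not survive in this transcript. Going only by the spoken description (cross-products of the deviations summed in the numerator, the two sums of squares multiplied under a single radical in the denominator), it was presumably the usual sample Pearson formula. The observation index i and the sample size n are notation assumed here, not taken from the lecture:

```latex
r = \frac{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)}
         {\sqrt{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)^2 \; \sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2}}
```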

How much of the variation in one variable is accounted for-- I'm sorry. The variation between the two variables is accounted for by all the variation that's possible among the two variables. So it gives us the strength of the linear relationship.
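To make the numerator and denominator concrete, here is a minimal Python sketch that follows the spoken recipe step by step. The function name pearson_r and the five made-up observations (hours, score) are illustrative assumptions, not data from the lecture.

```python
import math


def pearson_r(x1, x2):
    """Sample Pearson correlation, computed directly from the definition."""
    if len(x1) != len(x2) or len(x1) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean1 = sum(x1) / len(x1)
    mean2 = sum(x2) / len(x2)
    # Distances of every observation from its own variable's mean
    d1 = [a - mean1 for a in x1]
    d2 = [b - mean2 for b in x2]
    # Numerator: multiply across the two variables, then sum
    cross_products = sum(a * b for a, b in zip(d1, d2))
    # Denominator: sums of squares within each variable, under one radical
    ss1 = sum(a * a for a in d1)
    ss2 = sum(b * b for b in d2)
    if ss1 == 0 or ss2 == 0:
        raise ValueError("correlation is undefined when a variable never varies")
    return cross_products / math.sqrt(ss1 * ss2)


# Hypothetical data, invented purely for illustration
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(round(r, 4))  # 6 / sqrt(10 * 6), about 0.7746
```

For these numbers the cross-products sum to 6 and the two sums of squares are 10 and 6, so r is 6 divided by the square root of 60.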

Mathematics aside, however, here's what we can think of when we think about correlations and how easy they are to interpret. First of all, correlation values always fall somewhere between negative 1 and positive 1. That gives us an easy metric for being able to interpret the strength of those linear relationships. Perfect negative correlations would be negative 1. Perfect positive correlations would be positive 1. But the reality is that unless you're correlating a variable with itself, correlations of negative 1 and positive 1 will practically never occur.

But the closer a correlation gets to either one of those extremes, the stronger the linear relationship that exists between the two variables. So the closer to negative 1 or positive 1 that the Pearson correlation coefficient is, it means that we have a strong relationship. The closer it gets to 0, the weaker the relationship we have. And we use this simple metric of negative 1 to positive 1 as an indicant of the strength of the relationship away from 0 to give us this easy gauge of how closely two variables are related.
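As a quick check on the two extremes, the sketch below builds one variable as an exact straight-line function of another, which is the idealized case where the coefficient reaches positive 1 or negative 1. The helper pearson_r and the numbers are assumptions made up for illustration.

```python
import math


def pearson_r(x1, x2):
    """Sample Pearson correlation from the deviation-based definition."""
    m1 = sum(x1) / len(x1)
    m2 = sum(x2) / len(x2)
    cross = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    ss1 = sum((a - m1) ** 2 for a in x1)
    ss2 = sum((b - m2) ** 2 for b in x2)
    return cross / math.sqrt(ss1 * ss2)


# Hypothetical values, chosen only to show the two extremes
x = [1.0, 2.0, 3.0, 4.0, 5.0]
rises_with_x = [2 * v + 3 for v in x]      # exact line with positive slope
falls_with_x = [-0.5 * v + 10 for v in x]  # exact line with negative slope

r_pos = pearson_r(x, rises_with_x)
r_neg = pearson_r(x, falls_with_x)
print(math.isclose(r_pos, 1.0), math.isclose(r_neg, -1.0))
```

Real data almost never sit exactly on a line, which is why values this extreme are so rare in practice.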

We could also use scatter plots to interpret correlations visually to give you an idea of what I mean by strong versus weak correlations. If we were to create a scatter plot where x1 is on the vertical axis and x2 is on the horizontal axis--and by the way, that choice is purely arbitrary--then if we were to plot the data for, say, a highly positive correlation, what we would wind up with is a tightly packed group of observations. And remember each point represents a single person or other observation that provided data, and each data point had to provide information on both variables. But when they're closely and tightly packed together like that on a straight line, that would be a very high positive correlation. As x2 increases, x1 is also increasing, and therefore our cluster of data points slopes upward and to the right.

So where does weakness come in? What would a weak positive correlation look like? Well, actually many people when I ask this question in class would answer it changes the slope of the correlation, but that's not true. Actually, more accurate is that it increases the clouding of the scatter plot.

The higher the correlation, the more tightly packed the observations will appear visually. The weaker the correlation is, the more spread out they are. And this spread indicates the random variation.

If you look at the moderately positively correlated scatter plot there on the right of the screen, what you see is a cloud of data, but nonetheless there is still a definite direction to it. In other words, it's clear that you have something that's moving upward and to the right. The distance apart between these different observations represents the random variation. So this statistical model of the correlation between x1 and x2 captures some of the systematic variation, which accounts for the upward right movement of the data, and the distance between them is the random variation that's not captured by the model.
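The point that weakness shows up as clouding rather than as a change in slope can be sketched with simulated data. Everything below is an assumption for illustration: the seed, the sample size of 500, the true slope of 1, and the two noise levels. Both samples share the same underlying line, and only the amount of random scatter around it differs.

```python
import math
import random


def pearson_r(x1, x2):
    """Sample Pearson correlation from the deviation-based definition."""
    m1 = sum(x1) / len(x1)
    m2 = sum(x2) / len(x2)
    cross = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    ss1 = sum((a - m1) ** 2 for a in x1)
    ss2 = sum((b - m2) ** 2 for b in x2)
    return cross / math.sqrt(ss1 * ss2)


def fitted_slope(vertical, horizontal):
    """Least-squares slope of the vertical variable on the horizontal one."""
    mv = sum(vertical) / len(vertical)
    mh = sum(horizontal) / len(horizontal)
    cross = sum((h - mh) * (v - mv) for h, v in zip(horizontal, vertical))
    ss_h = sum((h - mh) ** 2 for h in horizontal)
    return cross / ss_h


rng = random.Random(0)  # fixed seed so the sketch is repeatable
x2 = [rng.uniform(0, 10) for _ in range(500)]       # horizontal axis
x1_tight = [v + rng.gauss(0, 0.2) for v in x2]      # little random variation
x1_cloudy = [v + rng.gauss(0, 2.0) for v in x2]     # much more random variation

r_tight = pearson_r(x1_tight, x2)
r_cloudy = pearson_r(x1_cloudy, x2)
slope_tight = fitted_slope(x1_tight, x2)
slope_cloudy = fitted_slope(x1_cloudy, x2)

print(f"tight:  r = {r_tight:.3f}, slope = {slope_tight:.3f}")
print(f"cloudy: r = {r_cloudy:.3f}, slope = {slope_cloudy:.3f}")
```

Both fitted slopes should land near 1, while the correlation drops noticeably for the cloudier sample.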

We can do the same thing when we have negative correlations as well. The only difference between this and the previous high correlation on the positive side is in this case, the data are pointing downward and to the right, which means that they move in opposite directions. As x1 is decreasing, values of x2 increase. Likewise, if this were only a moderate negative correlation, what we would have is more clouding.

But again correlations, by the way, are statistical models. What we're doing is modeling data to find the systematic and the random portion of the variation. Correlation doesn't do a really good job of giving us a lot of details about it, but it does give us the general strength of the relationship, positive or negative, of the linear relationship that exists between two variables. We'll talk more about correlation to come.
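Because the coefficient speaks only to the linear relationship, it can sit at 0 even when two variables are tied together exactly. The following sketch uses made-up values where x2 is simply x1 squared. The helper pearson_r and the data are illustrative assumptions, not part of the lecture.

```python
import math


def pearson_r(x1, x2):
    """Sample Pearson correlation from the deviation-based definition."""
    m1 = sum(x1) / len(x1)
    m2 = sum(x2) / len(x2)
    cross = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    ss1 = sum((a - m1) ** 2 for a in x1)
    ss2 = sum((b - m2) ** 2 for b in x2)
    return cross / math.sqrt(ss1 * ss2)


# Hypothetical data: x2 is completely determined by x1, but along a curve
x1 = [-3, -2, -1, 0, 1, 2, 3]
x2 = [v * v for v in x1]

r_curve = pearson_r(x1, x2)
print(abs(r_curve) < 1e-12)  # the linear measure sees no relationship here
```

The positive and negative cross-products cancel exactly, so the numerator is 0 even though knowing x1 tells you x2 perfectly.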

Based on what's been discussed to this point, what limitations do you see with correlation analysis?
