Question

The file banking.txt attached to this assignment provides data acquired from banking and census records for different zip codes in the bank’s current market. Such information can be useful in targeting advertising for new customers or for choosing locations for branch offices.  The data show

median age of the population (AGE)

median income (INCOME) in $

average bank balance (BALANCE) in $

median years of education (EDUCATION)

In this exercise you are asked to apply regression analysis techniques to describe the effect of age, education, and income on average account balance.

Analyze the distribution of average account balance using a histogram, and compute appropriate descriptive statistics. Write a paragraph describing the distribution of BALANCE, using appropriate descriptive statistics to describe its center and spread. Discuss your findings. Do you see any outliers? Include the histogram.
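As a rough illustration of the descriptive statistics this part asks for, here is a minimal sketch in Python rather than SAS. It assumes only the standard library and uses just the first eight BALANCE values from the data table, so the numbers it prints describe that sample and not the full file. The `describe` helper and the 1.5 × IQR rule for flagging outliers are choices made for this sketch, not requirements of the assignment, and no histogram is drawn.

```python
import statistics

# First eight BALANCE values from the data table (a sample, not the full file).
balance = [38517, 40618, 35206, 33434, 28162, 36708, 38766, 34811]


def describe(values):
    """Return center and spread measures plus values outside the 1.5*IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "iqr": iqr,
        "outliers": [v for v in values if v < low or v > high],
    }


summary = describe(balance)
for name, value in summary.items():
    print(name, value)
```

Comparing the mean with the median gives a quick hint about skew, and the fence-based outlier list is one common way to back up what the histogram shows.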

Create scatterplots to visualize the associations between bank balance and the other variables. Discuss the patterns displayed by the scatterplot. Also, do the associations appear to be linear? (You can create scatterplots or a matrix plot). Include the scatterplots.

Compute correlation values of bank balance vs the other variables. Interpret the correlation values, and discuss which pairs of variables appear to be strongly associated. Include the relevant output that shows the correlation values.
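The correlation computation can be sketched as follows, again in Python with the standard library only and as an illustration rather than the SAS output the assignment requests. The `pearson_r` function is written for this sketch, and the two lists hold only the first eight INCOME and BALANCE values from the data table, so the printed value applies to that sample.

```python
import math

# First eight rows of the data table (a sample, not the full file).
income = [91033, 86748, 72245, 70639, 64879, 75591, 80615, 76507]
balance = [38517, 40618, 35206, 33434, 28162, 36708, 38766, 34811]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(income, balance)
print(round(r, 3))
```

The same function can be called with the AGE or EDUCATION column in place of `income` to fill in the other correlations.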

What are the independent variables and what is the dependent variable in this regression analysis?

Use SAS to fit a regression model to predict balance from age, education and income. Analyze the model parameters. Which predictors have a significant effect on balance? Use the t-tests on the parameters at the alpha = 0.05 significance level. Include the relevant regression output.

If one of the predictors is not significant, remove it from the model and refit the new regression model. Write the expression of the newly fitted regression model.
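To make the fit-and-refit step concrete, the sketch below solves the least-squares normal equations in plain Python. It is an illustration under several assumptions: `fit_ols` is a helper written for this sketch (it is not a SAS procedure or a library function), columns are rescaled internally only to keep the arithmetic stable, and the demonstration uses just the first eight rows of the data table, so the printed coefficients are not the assignment's answer. It does not compute t-tests or p-values, which still have to come from the regression output.

```python
def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Rescale each column by its largest absolute value for numerical stability.
    scale = [max(abs(r[j]) for r in rows) for j in range(k)]
    rows = [[v / s for v, s in zip(r, scale)] for r in rows]
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    # Undo the column scaling so coefficients are in the original units.
    return [A[i][k] / A[i][i] / scale[i] for i in range(k)]


# First eight rows of the data table: (AGE, EDUCATION, INCOME) and BALANCE.
sample_X = [
    (35.9, 14.8, 91033), (37.7, 13.8, 86748), (36.8, 13.8, 72245),
    (35.3, 13.2, 70639), (35.3, 13.2, 64879), (34.8, 13.7, 75591),
    (39.3, 14.4, 80615), (36.6, 13.9, 76507),
]
sample_y = [38517, 40618, 35206, 33434, 28162, 36708, 38766, 34811]

coef = fit_ols(sample_X, sample_y)
print(dict(zip(["Intercept", "AGE", "EDUCATION", "INCOME"], coef)))
```

Refitting a reduced model amounts to calling `fit_ols` again with the non-significant column left out of each row.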

Interpret the value of the parameters for the variables in the model.

Report the value of the R² coefficient and describe what it indicates. Include the portion of the output that shows the R² value.
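For reference, R² is the share of the variation in the response that the model accounts for, computed as one minus the ratio of the residual sum of squares to the total sum of squares. The short Python sketch below shows that arithmetic. The `r_squared` function is written for this illustration, and the observed and fitted values are made-up toy numbers, not values from the banking data.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_y) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Toy numbers invented for illustration only (not from the banking data).
toy_observed = [2, 4, 6, 8]
toy_fitted = [2.5, 3.5, 6.5, 7.5]

r2 = r_squared(toy_observed, toy_fitted)
print(r2)
```

A value near 1 means the fitted values track the observations closely, while a value near 0 means the model does little better than predicting the mean.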

According to census data, the population for a certain zip code area has median age equal to 34.8 years, median education equal to 12.5 years and median income equal to $42,401. 

Use the final model computed in step (f) above to compute the predicted average balance for the zip code area.

If the observed average balance for the zip code area is $21,572, what’s the model prediction error?
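The prediction and its error follow directly from the fitted equation, as the Python sketch below illustrates. The census values and the observed balance come from the question. The coefficients are placeholders invented purely so the code runs. They are not fitted values, and the real ones must be taken from the final regression output. The sketch also assumes the usual convention that prediction error is observed minus predicted.

```python
def predict(coefficients, values):
    """Intercept plus coefficient * value for every predictor in the model."""
    return coefficients["Intercept"] + sum(
        coefficients[name] * values[name]
        for name in coefficients
        if name != "Intercept"
    )


# Census values for the zip code area, taken from the question.
zip_area = {"AGE": 34.8, "EDUCATION": 12.5, "INCOME": 42401}
observed_balance = 21572

# PLACEHOLDER coefficients, invented for illustration only.
# Replace them with the estimates from the final fitted model and
# delete any predictor that the final model does not include.
placeholder_model = {
    "Intercept": 1000.0,
    "AGE": 100.0,
    "EDUCATION": 200.0,
    "INCOME": 0.5,
}

predicted = predict(placeholder_model, zip_area)
error = observed_balance - predicted  # residual: observed minus predicted
print(predicted, error)
```

A negative error means the model overpredicts the balance for that zip code, and a positive error means it underpredicts.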

Copy and paste your SAS code into the Word document along with your answers.

Age    Education    Income    Balance
35.9    14.8    91033    38517
37.7    13.8    86748    40618
36.8    13.8    72245    35206
35.3    13.2    70639    33434
35.3    13.2    64879    28162
34.8    13.7    75591    36708
39.3    14.4    80615    38766
36.6    13.9    76507    34811
35.7    16.1    107935    41032
40.5    15.1    82557    41742
37.9    14.2    58294    29950
43.1    15.8    88041    51107
37.7    12.9    64597    34936
36    13.1    64894    32387
40.4    16.1    61091    32150
33.8    13.6    76771    37996
36.4    13.5    55609    24672
37.7    12.8    74091    37603
36.2    12.9    53713    26785
39.1    12.7    60262    32576
39.4    16.1    111548    56569
36.1    12.8    48600    26144
35.3    12.7    51419    24558
37.5    12.8    51182    23584
34.4    12.8    60753    26773
33.7    13.8    64601    27877
40.4    13.2    62164    28507
38.9    12.7    46607    27096
34.3    12.7    61446    28018
38.7    12.8    62024    31283
33.4    12.6    54986    24671
35    12.7    48182    25280
38.1    12.7    47388    24890
34.9    12.5    55273    26114
36.1    12.9    53892    27570
32.7    12.6    47923    20826
37.1    12.5    46176    23858
23.5    13.6    33088    20834
38    13.6    53890    26542
33.6    12.7    57390    27396
41.7    13    48439    31054
36.6    14.1    56803    29198
34.9    12.4    52392    24650
36.7    12.8    48631    23610
38.4    12.5    52500    29706
34.8    12.5    42401    21572
33.6    12.7    64792    32677
37    14.1    59842    29347
34.4    12.7    65625    29127
37.2    12.5    54044    27753
35.7    12.6    39707    21345
37.8    12.9    45286    28174
35.6    12.8    37784    19125
35.7    12.4    52284    29763
34.3    12.4    42944    22275
39.8    13.4    46036    27005
36.2    12.3    50357    24076
35.1    12.3    45521    23293
35.6    16.1    30418    16854
40.7    12.7    52500    28867
33.5    12.5    41795    21556
37.5    12.5    66667    31758
37.6    12.9    38596    17939
39.1    12.6    44286    22579
33.1    12.2    37287    19343
36.4    12.9    38184    21534
37.3    12.5    47119    22357
38.7    13.6    44520    25276
36.9    12.7    52838    23077
32.7    12.3    34688    20082
36.1    12.4    31770    15912
39.5    12.8    32994    21145
36.5    12.3    33891    18340
32.9    12.4    37813    19196
29.9    12.3    46528    21798
32.1    12.3    30319    13677
36.1    13.3    36492    20572
35.9    12.4    51818    26242
32.7    12.2    35625    17077
37.2    12.6    36789    20020
38.8    12.3    42750    25385
37.5    13    30412    20463
36.4    12.5    37083    21670
42.4    12.6    31563    15961
19.5    16.1    15395    5956
30.5    12.8    21433    11380
33.2    12.3    31250    18959
36.7    12.5    31344    16100
32.4    12.6    29733    14620
36.5    12.4    41607    22340
33.9    12.1    32813    26405
29.6    12.1    29375    13693
37.5    11.1    34896    20586
34    12.6    20578    14095
28.7    12.1    32574    14393
36.1    12.2    30589    16352
30.6    12.3    26565    17410
22.8    12.3    16590    10436
30.3    12.2    9354    9904
22    12    14115    9071
30.8    11.9    17992    10679
35.1    11    7741    6207

Problem 2  [5 points] - ONLY for Graduate Students

Historical data about the Boston Marathon can be found on its website. The graph shows winning times (in minutes) for men and women against the year in which the race was run. Men’s times are represented by “M” and women’s times by “W”. The graph also displays two regression lines of winning times vs year for men and women. There is no dataset for this question, but answer the following questions based on the graph.

Consider the men’s winning times. Is there evidence of a linear trend? Would you expect the slope of the regression line to be positive or negative?

Now let’s consider the winning times for women. Is there evidence of a linear trend? Discuss.

If we fit two separate linear regression models for men’s and women’s winning times, which slope will be greater in absolute value?

Step by Step Solution

Step: 1

Please note that we can’t provide solutions using paid software such as SAS; however, an open so...
