Question
Bank of Delaware The National Bank of Delaware County (NBDC), with 12 locations in New York, is an active lender for customers looking to purchase
Bank of Delaware
The National Bank of Delaware County (NBDC), with 12 locations in New York, is an active
lender for customers looking to purchase a home, finance a small business, or open a new credit card. Their mission
is clear: "As an independent bank, we make it our business to reinvest in the communities we serve by making loans
to individuals and businesses, supporting community projects, events and charities."
In order to provide the best rates possible without carrying too much risk, the loan officers need to be able to
accurately assess the likelihood a borrower will repay his or her loan. Credit score is one measure the bank uses, but
as a community bank, they don't make decisions solely on the basis of credit score. They have developed their own
method of
assessing credit scores
, based on age, weekly income, and whether he or she owns or rents his or her
primary residence.
Describe and Predict credit scores using one or more of the available Variables:
Credit Score : borrowers' credit score points.
Age : customers' age in years.
Income : weekly income in $
Residence : whether he or she owns or rents his or her primary residence
House.Price.($) 414441 77511 99553 107093 90710 123187 128068 85138 165868 144126 115535 94085 100295 199256 160477 392586 206585 103039 273075 134710 262500 95643 133424 79858 118502 127167 167196 149072 153301 147562 156914 134463 116596 125699 72143 152928 159781 158515 127702 128470 96500 87247 53392 243925 117172 175200 53784 158886 111059 68907 130438 107435 63030 180063 66896 219015 165284 136940 109383 142650 226311 201627 71447 102129 130371 122480 153775 109042 116399 158311 189352 50188 120859 110647 300287 78980 118778 142671 144812 290658 72932 186872 156767 102307 116190 106257
Crime.Rate.(crime.per.1000.population) 12.7 66.9 66.4 42.4 59 19.6 32.6 39.9 43.8 21.5 40.1 38.4 64 54.8 22.1 17.9 43.5 26.1 16.8 25.5 59.2 15.1 50.1 54.8 17.1 32.8 52.8 12.7 24.8 25.9 28.7 12.7 50.8 11.9 13.1 13.1 52 12.2 40.9 36.9 63.7 45.1 63.8 49.4 40.2 13.5 46.5 18.2 15.4 40.8 66.3 13.9 54.1 12.2 42.6 26.8 38.9 17.9 46.1 24.6 18.1 15.2 48.8 21.8 63.9 36.4 15.4 12.8 44.3 15.8 52.1 36.5 56.6 57.5 12.7 59.9 52.2 20.4 43.7 34 56.4 42.5 52 66.6 35.2 64.3
Miles.from.Downtown.(mile) 28.5 15 25.5 11.5 18.5 17.5 16 16 18 28.5 29.5 24 10 23.5 39 24 16.5 31 32 15.5 20.5 16.5 21.5 16 27.5 13.5 15 26.5 18 15 19 24 20 33 15.5 22 19 23 17 19 15.5 34 3 41 19.5 35.5 20.5 12.5 17 5 14.5 18 15.5 20.5 23.5 18 26 27 17 19 25.5 23.5 17 23 25 21.5 17 18.5 18.5 15.5 19.5 30 29.5 17 33 17.5 12 8 17 22.5 13.5 24.5 20.5 13 12.5 18
Population.Change 7.4 -1.3 3.2 -2.8 0.8 -3.9 10.9 1.1 8.6 2.4 9.4 8.5 -1.6 12.7 8 7.4 1.3 11.9 12.7 -3.4 1.3 4.4 -1.4 7.5 11.5 -2.3 0.1 8.2 -0.7 2.8 3.3 6.3 6.8 8.5 -1.3 2.5 4.8 0.3 -0.7 3.6 -4.5 10.7 -0.2 8.7 -1.4 5.9 -0.8 -0.2 8.3 6.4 0.3 -3.1 3.1 0.4 -0.8 -1.5 -2.1 12.1 3.2 3.3 13 2.1 -3.1 7.2 5 0.6 1.4 7.6 -0.3 -0.5 2.6 3.1 5.6 -2.9 2.5 3.5 -1.4 0.2 -1.4 8.4 7.7 3.1 10.8 -4 -0.3 1.4
Has.Security.Gate 0 1 1 1 1 1 0 1 0 0 0 0 1 0 0 0 1 0 0 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0 1 0 1 0 1 1 1 0 1 0 1 0 1 1 0 1 1 1 1 0 1 1 1 0 1 1 0 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 1 1 1
Question Set 1
1. Get to know your scientific question (Chapter 1)
(a) Identify the variable of interest.
(b) Identify the population(s) and sample(s).
(c) Identify the parameter(s) and statistic(s).
(d) What is the scientific question? Is this Descriptive Statistics or Inferential Statistics?
2. Get to know your data (Chapter 1)
(a) Identify the types of your data: nominal data, ordinal data or quantitative data.
(b) Identify the types of your data: time series data or cross-sectional data.
(c)
Identify the source of your data: primary data or secondary data. Do you think the data is
reliable? Are there possible issues with your data?
3. Calculate descriptive statistics in Excel (Chapter 3)
(a)
Calculate the sample statistics for your variable of interest, such as sample mean (
x
), median,
mode, variance (
s
2
), and standard deviation (
s
).
(b)
Identify two different groups based on the qualitative data. Calculate the above statistics for
each group to compare.
4. Display your data with charts and graphs in Excel (Chapter 2)
(a)
Construct displays that best describe your qualitative variable (e.g. bar chart, pie chart); and
describe the distribution.
(b)
Construct displays that best describe your variable of interest and describe its distribution.
(Use: Frequency distribution tables, histograms and/or the empirical rule to discuss normality,
symmetry and skewness)
(c)
Construct displays that best describe the relationship/association between two quantitative
variables (the variable of interest as the dependent variable, y, and another quantitative
variable as the independent variable, x); and describe the relationship.
5. Distributions (Chapters 5-6)
(a)
Consider the distribution of your quantitative data in 4(b). Would it be appropriate to use the
Binomial or Normal distribution to model your data? Why or why not? Hint: The binomial
distribution models success/failure discrete data while the normal distribution is for bell-
shaped continuous data.
1
Question Set 2
1. Construct a confidence interval for a population mean (Chapter 8)
(a)
Do you need to make assumptions in order to perform the procedure of constructing a
confidence interval? If so, what assumptions need to be made? If not, why?
(b) Construct a confidence interval for the average house price.
i. Should you use a z-interval or a t-interval? Why?
ii. Compute the necessary statistics for constructing a confidence interval.
iii.
Find the margin of error of the confidence interval at confidence levels of 92% and 95%,
respectively.
iv. Calculate these two confidence intervals.
(c)
Someone believes that the average house price is 153849.1 $. Does the sample support the
claim? Explain if you have different conclusions using the above two confidence intervals.
(You must discuss in terms of accuracy and precision.)
2. Conduct a hypothesis test for a population mean (Chapter 9)
(a)
Do you need to make assumptions in order to perform the procedure of conducting a hypothesis
test? If so, what assumptions need to be made? If not, why?
(b)
Using
= 0.04 perform a hypothesis test to determine if the average house price is lower than
154354 $.
i. Write down the hypotheses.
ii. Calculate the test statistic, critical value(s) and p-value.
iii. Describe your decision of the test and make a conclusion based on the context.
3. Compare two population means (Chapter 10)
(a)
Do you need to make assumptions in order to perform the procedure of conducting a hypothesis
test or constructing a confidence interval? If so, what assumptions need to be made? If not,
why?
(b)
Using
= 0.1 perform a hypothesis test to determine if the mean House Price $ of the two
groups identified by your qualitative variable are different. We cannot assume equal variances.
List the results of all key steps before you reach your conclusion, such as the hypotheses, test
statistic, critical value(s) and/or p-value. (Use the Data Analysis Toolpak in Excel.)
(c)
Find the 98% confidence interval to estimate the average difference in house price between the
two populations according to the qualitative variable.
(d) Interpret the above confidence interval.
2
Question Set 3
1. Building a Simple Linear Regression Model: Preprocess.
(a) Identify all quantitative variables from the dataset.
(b)
Construct a Scatter Plot to show the relationship between House Price $ (
Y
) and each
independent variable. Calculate the sample correlation coefficients for all pairs. Describe the
association.
(c) Which pair has the strongest linear association?
(d)
Write down the general formula for the Simple Linear Regression
Model
between Y and X.
(Write the formula using general parameters notation
0
and
1
, what should be capitalize or
lowercase ? what should be added, if any? )
2.
Describe the linear relationship between House Price $ (
Y
) and the variable you answered in
2(c) (above) as x.
(a)
Calculate the slope and y-intercept of the least squares regression line using Excel. Write
down the linear equation.
(b) Interpret the regression slope.
(c)
What percentage of the total variation in y can be explained by this independent variable x?
3. Use the regression model to predict House Price (
Y
).
(a)
What is the predicted house price with 33.5
? (Fill in the blank with units
and name of the independent variable you chose.)
(b)
Calculate the 92% confidence interval for the average House Price $ (
Y
) with 33.5
and interpret. (Fill in the blank with units and name of the independent
variable you chose.)
(c)
Calculate the 92% prediction interval for a SINGLE house price (
Y
) with 33.5
and interpret. (Fill in the blank with units and name of the independent variable you chose.)
4. Is there a linear relationship between
Y
and
X
?
(a) Test the significance of the slope of the regression equation. Use
=
0.07.
i. Write down the hypotheses.
ii. What is the p-value?
iii. Describe your decision.
(b)
Develop a 96% confidence interval for the population slope. Does this confidence interval
include 0? (c) State your conclusion.(Hint: You may need to re-calculate Regression analysis:
Data
Data Analysis
Regression
Confidence level.)
5.
Check the assumptions for regression analysis. Make necessary plots in Excel to justify and
include them in your answers.
(a)
Is the relationship between the dependent and independent variables linear? Which plot
should you check?
3
(b)
Do the residuals exhibit some pattern across values for the independent variable? Which plot
should you check?
(c)
Is the variation of the dependent variable the same across all values of the independent
variable? Which plot should you check?
(d) Do the residuals follow the normal probability distribution? Which plot should you check?
(e) Conclusion: Are the results from the regression analysis reliable?
Question Set 4
1.
Model 1: Develop a multiple regression model to predict the House Price (
Y
) using all the
other variables of interest as listed above. (Round all numerical answers to two decimal places
as needed.)
(a)
Identify qualitative variable(s) from the list of variables of interest, if there is any, and create
a dummy variable in Excel. (Note: use Excel function =IF()
and use alphabetical order
to assign values 0 and 1
)
(b)
Perform a multiple regression with the Data Analysis Toolpak in Excel, and write down the
regression equation for Model 1. (Enter in Excel the confidence level given in question 1(e).
Note: Excel requires that the independent variables be located in adjacent columns)
(c)
Explain the variation of the dependent variable after accounting for the effects of the other
independent variables:
i. What percentage of total variation in the House Price (
Y
) can be explained by Model 1?
ii. What is the value of the adjusted multiple coefficient of determination,
R
2
A
?
(d)
Is the overall regression model significant using
=
0.07? State the hypotheses and your
conclusion.
(e)
Which independent variables are signifcant predictors using
= 0.7 or confidence level 30%?
Which are not significant? (After accounting for the effects of the other independent variables)
2.
Develop a second multiple regression model (Model 2) using ONE step of the "backward
elimination method". (Remember: variables should be removed one at the time and regression
analysis i.e. coefficients,
R
2
, p-values, etc must be re-calculated at each step) (Round all
numerical answers to two decimal places as needed.)
(a) Which variable should you remove from Model 1? Why?
(b)
Perform a multiple regression with the Data Analysis Toolpak in Excel, and write down the
regression equation for Model 2. (Enter in Excel the confidence level given in question 2(e).
Note: Excel requires that the independent variables be located in adjacent columns)
(c) Explaining the variation of the dependent variable:
i.
What percentage of total variation in the House Price (
Y
) can be explained by Model 2? How
does this compare with the percentage you obtained with Model 1?
ii.
What is the value of the adjusted multiple coefficient of determination,
R
2
A
? How does this
compare with the one you obtained with Model 1?
4
(d) Is the overall regression model (Model 2) significant using
=
0.09?
(e)
Are all the independent variables in Model 2 significant predictors using
= 0.08 or confidence
level 92 % after accounting for the effects of the other independent variables?
(f) Prediction:
i. Is Model 2 better than Model 1?
ii.
Predict the house price(
Y
) with Has Security Gate = 1; Crime Rate (crime per 1000 population)
= 15.8; Miles from Downtown (mile) = 30.5; Population Change() = 3.1 using "the best"
model (between Model 1 and Model 2).
NOTE:
you may or may not need to use all given
values.
(g) Interpret regression coefficients.
i. Interpret the coefficient of Has Security Gate.
3. Check the assumptions for regression analysis for the model you have chosen. Make necessary
plots in Excel to justify.
(a) Is the relationship between the dependent and independent variables linear?
(b) Do the residuals exhibit some patterns across values of the independent variables?
(c)
Are the variations of the dependent variable the same across all values of the independent
variables?
(d) Do the residuals follow the normal probability distribution?
(e) Conclusion: Are the results from the regression analysis reliable?
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started