Location   Income ($1,000)   Size   Years   Credit Balance ($)
Urban      27                1      2       2,631
Rural      25                4      2       2,047
Suburban   25                1      1       3,155
Suburban   26                1      2       3,913
Rural      30                5      5       2,660
Urban      29                1      3       3,531
Rural      33                6      10      2,766
Urban      30                1      4       3,769
Suburban   32                2      4       4,082
Urban      34                1      6       3,806
Urban      35                1      8       4,049
Urban      40                1      9       4,073
Rural      30                6      9       2,697
Rural      33                6      11      2,914
Urban      42                2      10      4,073
Suburban   32                2      4       4,310
Urban      43                2      10      4,199
Urban      43                2      10      4,253
Rural      33                7      13      3,104
Urban      47                2      10      4,293
Suburban   35                3      5       4,456
Urban      54                2      11      4,340
Suburban   42                3      5       4,925
Rural      36                7      13      3,178
Urban      57                3      11      4,391
Suburban   44                3      6       4,947
Rural      38                7      15      3,203
Urban      54                3      8       4,354
Urban      54                3      10      4,366
Suburban   46                4      6       5,003
Rural      40                7      15      3,250
Urban      60                4      11      4,402
Urban      58                4      10      4,397
Urban      61                5      13      4,595
Urban      61                5      13      4,786
Urban      62                6      14      4,888
Suburban   49                5      8       5,148
Urban      68                6      14      5,011
Suburban   57                6      8       5,220
Rural      45                8      16      3,257
Urban      71                7      15      5,528
Suburban   57                7      9       5,283
Suburban   64                8      9       5,332
Rural      45                8      17      3,304
Urban      74                7      19      5,553
Suburban   65                8      10      5,484
Rural      47                8      18      3,342
Rural      53                8      18      3,788
Suburban   66                8      10      5,756
Suburban   69                8      10      5,861

KELLER GRADUATE SCHOOL OF MANAGEMENT
Course Project Math533 Part A
Jasmyn Mead-Payne

Introduction

AJ DAVIS is a department store chain with many credit customers, and it wants to learn more about them. A sample of 50 credit customers was selected, and data were collected on five variables: Location, Income, Size, Years and Credit Balance.

1. Location (rural, urban, suburban)
2. Income (in $1,000s)
3. Size (household size, i.e. the number of people living in the household)
4. Years (the number of years the customer has lived in the current location)
5. Credit balance (the customer's current balance on the store's credit card, in $)

Location

The pie chart of Location, categorized into Urban, Suburban and Rural, is shown below.
(Figure: Pie Chart of Location. Urban 44.0%, Suburban 30.0%, Rural 26.0%)

Location is Urban for 44% of the customers and Suburban for 30%. The remaining 26% are Rural. The tally below shows the count in each Location category.

Tally for Discrete Variables: Location

Location   Count
Rural      13
Suburban   15
Urban      22
N =        50

Of the 50 customers, 22 live in urban areas, 15 in suburban areas and 13 in rural areas. Location is measured on a nominal scale, meaning its values are categories with no natural order. The mode is therefore the appropriate measure of central tendency. Here the mode is Urban, the most frequent category. The largest share of the sampled customers is urban, suburban comes next, and rural is the smallest group.

Income

The Minitab summary report for Income, which includes its histogram, is shown below.

Summary Report for Income ($1,000)

Anderson-Darling normality test: A-Squared = 0.72, P-Value = 0.056
Mean 46.020, StDev 13.884, Variance 192.755, Skewness 0.25856, Kurtosis -1.09058, N 50
Minimum 25.000, 1st Quartile 33.000, Median 44.500, 3rd Quartile 57.250, Maximum 74.000
95% Confidence Interval for Mean: 42.074 to 49.966
95% Confidence Interval for Median: 39.343 to 53.328
95% Confidence Interval for StDev: 11.597 to 17.301

I observe that the distribution of Income is not symmetric. It is slightly skewed to the right: a relatively small number of customers have high incomes that stretch the upper tail. The Minitab descriptive statistics for Income are shown below.

Descriptive Statistics: Income ($1,000)

N = 50, N* = 0
Mean 46.02, SE Mean 1.96, StDev 13.88, Variance 192.75
Minimum 25.00, Q1 33.00, Median 44.50, Q3 57.25, Maximum 74.00, IQR 24.25
Mode 30, 33, 54, 57 (each occurs 3 times)
Skewness 0.26

The median is the preferred measure of central tendency when data are continuous and skewed or contain outliers, because extreme values do not pull it.
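As a cross-check of the Minitab tally and Income summary above, here is a minimal Python sketch that uses only the standard library (Python 3.8+). It assumes the values are transcribed from the data table at the top of this report. The variable names are illustrative and are not part of the original project. Quartiles use `statistics.quantiles` with its default "exclusive" method, i.e. the (n + 1)p position rule. That rule reproduces the Minitab quartiles for this data.

```python
from collections import Counter
import statistics

# Location codes for the 50 customers in data-table order
# (U = Urban, R = Rural, S = Suburban), transcribed from the table above.
LOCATION_CODES = (
    "U R S S R U R U S U "
    "U U R R U S U U R U "
    "S U S R U S R U U S "
    "R U U U U U S U S R "
    "U S S R U S R R S S"
).split()
NAMES = {"U": "Urban", "R": "Rural", "S": "Suburban"}
locations = [NAMES[code] for code in LOCATION_CODES]

# Income in $1,000s, same customer order.
INCOME = [
    27, 25, 25, 26, 30, 29, 33, 30, 32, 34,
    35, 40, 30, 33, 42, 32, 43, 43, 33, 47,
    35, 54, 42, 36, 57, 44, 38, 54, 54, 46,
    40, 60, 58, 61, 61, 62, 49, 68, 57, 45,
    71, 57, 64, 45, 74, 65, 47, 53, 66, 69,
]

# Nominal variable: counts, percentages and the mode (most frequent category).
tally = Counter(locations)
location_mode = tally.most_common(1)[0][0]
percentages = {loc: 100 * count / len(locations) for loc, count in tally.items()}

# Quantitative variable: centre and spread.
income_mean = statistics.mean(INCOME)
income_median = statistics.median(INCOME)
income_sd = statistics.stdev(INCOME)        # sample standard deviation (n - 1)
income_var = statistics.variance(INCOME)
income_modes = statistics.multimode(INCOME)  # all values tied for most frequent
income_quartiles = statistics.quantiles(INCOME, n=4)  # [Q1, median, Q3]

print(dict(tally), "mode:", location_mode, percentages)
print(f"mean={income_mean:.2f} median={income_median} sd={income_sd:.3f}")
print("modes:", income_modes, "quartiles:", income_quartiles)
```

The nominal variable gets only counts and a mode. The numeric one also gets a mean, a median and a spread, which mirrors the choice of measures argued in the text.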
The median is 44.50, so half of the customers have incomes above 44.50 ($1,000s) and half have incomes below it. The mean income is 46.02 ($1,000s). Income ranges from a minimum of 25 to a maximum of 74 ($1,000s).

The variance and the standard deviation both measure how spread out the data are around the mean: the more concentrated the data, the smaller the standard deviation. The standard deviation (13.88) is expressed in the same unit as the data ($1,000s), while the variance (192.75) is in squared units.

For positively skewed data, the mean is typically greater than the median, which is greater than the mode. Here the mean (46.02) is greater than the median (44.50), which is greater than the smallest mode (30). Because Income has four modes (30, 33, 54, 57), this comparison is only a rough guide.

Skewness literally means lack of symmetry. I study it to get an idea of the shape of the distribution. Here the skewness is +0.26, which is positive, and the histogram shows the same right skew. I conclude that Income is positively skewed, with relatively few customers at high income levels.

Credit Balance

The Minitab summary report for Credit Balance, which includes its histogram, is shown below.
Summary Report for Credit Balance ($)

Anderson-Darling normality test: A-Squared = 0.38, P-Value = 0.400
Mean 4153.5, StDev 931.9, Variance 868429.8, Skewness -0.143501, Kurtosis -0.721489, N 50
Minimum 2047.0, 1st Quartile 3292.3, Median 4273.0, 3rd Quartile 4930.5, Maximum 5861.0
95% Confidence Interval for Mean: 3888.6 to 4418.3
95% Confidence Interval for Median: 3877.9 to 4398.6
95% Confidence Interval for StDev: 778.4 to 1161.3

I observe that the histogram of Credit Balance is approximately bell-shaped (symmetric), which suggests the data are approximately normally distributed. The Anderson-Darling P-value of 0.400 > 0.05 also gives no evidence against normality. For normally distributed data, the mean, median and mode coincide. The Minitab descriptive statistics for Credit Balance are shown below.

Descriptive Statistics: Credit Balance ($)

N = 50, N* = 0
Mean 4153, SE Mean 132, StDev 932, Variance 868430
Minimum 2047, Q1 3292, Median 4273, Q3 4931, Maximum 5861, IQR 1638
Mode 4073 (occurs 2 times)
Skewness -0.15

The mean credit balance is $4,153. The mean is the best measure of central tendency for continuous data that are approximately normally distributed, so on average a customer carries a balance of $4,153. The standard deviation measures the spread of the distribution; here it is $932. Balances range from a minimum of $2,047 to a maximum of $5,861. Numerically, the mean ($4,153) is close to both the median ($4,273) and the mode ($4,073). The skewness (-0.15) is near zero. Both facts are consistent with an approximately normal distribution, as the histogram suggested. From the histogram and the descriptive statistics I conclude that Credit Balance is approximately normally distributed, with an average value of $4,153.
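The Credit Balance figures above can be reproduced the same way. This is a minimal standard-library sketch, assuming the balances are transcribed from the data table at the top of the report. The symmetry check at the end is an informal heuristic chosen for illustration and is not a formal test. The Anderson-Darling test reported by Minitab is not reimplemented here.

```python
import statistics

# Credit balance ($) for the 50 customers, in data-table order.
CREDIT = [
    2631, 2047, 3155, 3913, 2660, 3531, 2766, 3769, 4082, 3806,
    4049, 4073, 2697, 2914, 4073, 4310, 4199, 4253, 3104, 4293,
    4456, 4340, 4925, 3178, 4391, 4947, 3203, 4354, 4366, 5003,
    3250, 4402, 4397, 4595, 4786, 4888, 5148, 5011, 5220, 3257,
    5528, 5283, 5332, 3304, 5553, 5484, 3342, 3788, 5756, 5861,
]

cb_mean = statistics.mean(CREDIT)
cb_median = statistics.median(CREDIT)
cb_mode = statistics.mode(CREDIT)    # 4073 is the only repeated value
cb_sd = statistics.stdev(CREDIT)     # sample standard deviation (n - 1)

# Five-number summary; quartiles use the default "exclusive" (n + 1)p rule.
q1, q2, q3 = statistics.quantiles(CREDIT, n=4)
five_number = (min(CREDIT), q1, q2, q3, max(CREDIT))

# Informal symmetry check: gap between mean and median in SD units.
# A value near 0 is consistent with, but does not prove, a symmetric shape.
mean_median_gap = (cb_mean - cb_median) / cb_sd

print(f"mean={cb_mean:.2f} median={cb_median} mode={cb_mode} sd={cb_sd:.1f}")
print("five-number summary:", five_number)
print(f"(mean - median) / sd = {mean_median_gap:.3f}")
```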
Years and Income

The scatter plot of Income against Years is shown below, with Years on the X axis and Income on the Y axis.

(Figure: Scatterplot of Income ($1,000) vs Years)

The points trend upward: customers who have lived longer at their current location tend to have higher incomes. This suggests a positive linear relationship between Years and Income. The Minitab correlation coefficient is shown below.

Correlation: Income ($1,000), Years
Pearson correlation of Income ($1,000) and Years = 0.579
P-Value = 0.000

The correlation coefficient between Years and Income is 0.579. This indicates a moderate positive linear relationship between the two variables. The P-value of 0.000 (less than 0.001) shows that the relationship is statistically significant, and the scatter plot agrees. I therefore conclude that there is a positive linear relationship between Years and Income: as Years increases, Income tends to increase.

Income and Credit Balance

The scatter plot of Credit Balance against Income is shown below, with Income on the X axis and Credit Balance on the Y axis.

(Figure: Scatterplot of Credit Balance ($) vs Income ($1,000))

The points trend upward: as Income increases, Credit Balance also tends to increase. The points cluster fairly tightly around a straight line, which suggests a strong positive linear relationship between Income and Credit Balance. The Minitab correlation coefficient is shown below.

Correlation: Income ($1,000), Credit Balance ($)
Pearson correlation of Income ($1,000) and Credit Balance ($) = 0.801
P-Value = 0.000

The correlation coefficient between Income and Credit Balance is 0.801, indicating a strong positive linear relationship between the two variables. The scatter plot shows the same result.
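Both correlation coefficients quoted above can be recomputed from the raw data. This is a minimal standard-library sketch (Python 3.8+) of the textbook Pearson formula $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$. The lists are assumed to be transcribed from the data table at the top of the report. The function name `pearson_r` is illustrative. Differences from Minitab's three-decimal output should be rounding only.

```python
import math
import statistics

# Data transcribed from the data table, in the same customer order.
INCOME = [
    27, 25, 25, 26, 30, 29, 33, 30, 32, 34, 35, 40, 30, 33, 42, 32, 43,
    43, 33, 47, 35, 54, 42, 36, 57, 44, 38, 54, 54, 46, 40, 60, 58, 61,
    61, 62, 49, 68, 57, 45, 71, 57, 64, 45, 74, 65, 47, 53, 66, 69,
]
YEARS = [
    2, 2, 1, 2, 5, 3, 10, 4, 4, 6, 8, 9, 9, 11, 10, 4, 10, 10, 13, 10,
    5, 11, 5, 13, 11, 6, 15, 8, 10, 6, 15, 11, 10, 13, 13, 14, 8, 14, 8, 16,
    15, 9, 9, 17, 19, 10, 18, 18, 10, 10,
]
CREDIT = [
    2631, 2047, 3155, 3913, 2660, 3531, 2766, 3769, 4082, 3806,
    4049, 4073, 2697, 2914, 4073, 4310, 4199, 4253, 3104, 4293,
    4456, 4340, 4925, 3178, 4391, 4947, 3203, 4354, 4366, 5003,
    3250, 4402, 4397, 4595, 4786, 4888, 5148, 5011, 5220, 3257,
    5528, 5283, 5332, 3304, 5553, 5484, 3342, 3788, 5756, 5861,
]


def pearson_r(x, y):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_years_income = pearson_r(YEARS, INCOME)
r_income_credit = pearson_r(INCOME, CREDIT)

print(f"r(Years, Income)          = {r_years_income:.3f}")
print(f"r(Income, Credit Balance) = {r_income_credit:.3f}")
```

On Python 3.10 and later, `statistics.correlation(x, y)` returns the same coefficient. The explicit version is shown here so the formula is visible.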
Hence I conclude that there is a strong positive linear relationship between Income and Credit Balance: as Income increases, Credit Balance tends to increase.

Location and Income

The bar graph of Income by Location is shown below.

(Figure: Chart of Income ($1,000) by Location, with bars for Urban, Rural and Suburban)

The bar graph shows that income is highest for customers in urban areas and lowest for customers in rural areas, with suburban in between. The totals behind the bars, from Minitab's tabulated statistics, are shown below. The mean income per location is obtained by dividing each total by its customer count.

Tabulated Statistics: Location, Income ($1,000) (using frequencies in Income)

Location   Total Income ($1,000)   Customers   Mean Income ($1,000)
Rural      488                     13          37.5
Suburban   709                     15          47.3
Urban      1104                    22          50.2
All        2301                    50          46.0

The bars show total income, so they partly reflect the different number of customers in each location. On a per-customer basis the mean income is also highest in urban areas and lowest in rural areas. The conclusion from the bar graph therefore holds: the highest incomes are observed in urban areas and the lowest in rural areas.

Conclusion

The largest share of the sampled customers is urban, suburban comes next, and rural is the smallest group. Income is positively skewed, with relatively few customers at high income levels, so the median (44.50, i.e. $44,500) is the best measure of its central tendency. Credit Balance is approximately normally distributed, so the mean ($4,153) is the best measure of its central tendency. There is a positive linear relationship between Years and Income.
That is, as Years increases, Income tends to increase. There is a strong positive linear relationship between Income and Credit Balance: as Income increases, Credit Balance tends to increase. The highest incomes are observed in urban areas and the lowest in rural areas.

KELLER GRADUATE SCHOOL OF MANAGEMENT
Course Project Math 533 Part B
Jasmyn Mead-Payne

Solutions to Project Part B: Hypothesis Testing and Confidence Intervals

A. Summary Report

a. The test statistic t(49) = 0.52, with p = 0.303 > 0.05, indicates that the data do not provide sufficient evidence to support the claim that the true average annual income is greater than $45,000. The 95% confidence interval for the true average annual income is (42.074, 49.966) in $1,000s. It contains 45 (i.e. $45,000), which supports the same conclusion.

b. The test statistic z = -2.13, with p = 0.017 < 0.05, indicates that the data provide sufficient evidence to support the claim that the true proportion of customers who live in a suburban area is less than 45%. The 95% confidence interval for this proportion is (0.1730, 0.4270). It does not contain 0.45 (45%) and lies entirely below it, which supports the same conclusion.

c. The test statistic t(49) = 2.50, with p = 0.008 < 0.05, indicates that the data provide sufficient evidence to support the claim that the true average number of years lived in the current home is greater than 8. The 95% confidence interval for this mean is (8.312, 10.888). It does not contain 8, and its lower limit is above 8, which supports the same conclusion.

d. The test statistic t(12) = -1.36, with p = 0.100 > 0.05, indicates that the data do not provide sufficient evidence to support the claim that the average credit balance for rural customers is less than $3,200.
The 95% confidence interval for the true average credit balance for rural customers is ($2,780.86, $3,297.66). It contains $3,200, which supports the same conclusion.

B. Appendix

a. The null and alternative hypotheses for this test are:
H0: The average annual income is at most $45,000, i.e. $\mu \le 45$
Ha: The average annual income is greater than $45,000, i.e. $\mu > 45$
Significance level: $\alpha = 0.05$
Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = 0.52$
P-value: 0.303
Since the p-value 0.303 > 0.05, we fail to reject the null hypothesis. At the 5% level, the data do not provide enough evidence to support the claim that the average annual income is greater than $45,000.
The 95% confidence interval for the true average annual income is (42.074, 49.966). Because the interval contains 45, it does not provide enough evidence to support the claim that the average annual income is greater than $45,000.

b. The null and alternative hypotheses for this test are:
H0: The true proportion of customers who live in a suburban area is at least 45%, i.e. $p \ge 0.45$
Ha: The true proportion of customers who live in a suburban area is less than 45%, i.e. $p < 0.45$
Significance level: $\alpha = 0.05$
Test statistic: $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = -2.13$
P-value: 0.017
Since the p-value 0.017 < 0.05, we reject the null hypothesis. At the 5% level, the data provide enough evidence to support the claim that the true proportion of customers who live in a suburban area is less than 45%.
The 95% confidence interval for the true proportion of customers who live in a suburban area is (0.1730, 0.4270). Because the interval does not contain 0.45 and both limits are below 0.45, it supports the claim that the true proportion of customers who live in a suburban area is less than 45%.

c. The null and alternative hypotheses for this test are:
H0: The average number of years lived in the current home is at most 8, i.e. $\mu \le 8$
Ha: The average number of years lived in the current home is greater than 8, i.e. $\mu > 8$
Significance level: $\alpha = 0.05$
Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = 2.50$
P-value: 0.008
Since the p-value 0.008 < 0.05, we reject the null hypothesis. At the 5% level, the data provide enough evidence to support the claim that the average number of years lived in the current home is greater than 8.
The 95% confidence interval for the true average number of years lived in the current home is (8.312, 10.888). Because the interval does not contain 8 and both limits are above 8, it supports the claim that the average number of years lived in the current home is greater than 8.

d. The null and alternative hypotheses for this test are:
H0: The average credit balance for rural customers is at least $3,200, i.e. $\mu \ge 3{,}200$
Ha: The average credit balance for rural customers is less than $3,200, i.e. $\mu < 3{,}200$
Significance level: $\alpha = 0.05$
Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = -1.36$
P-value: 0.100
Since the p-value 0.100 > 0.05, we fail to reject the null hypothesis. At the 5% level, the data do not provide enough evidence to support the claim that the average credit balance for rural customers is less than $3,200.
The 95% confidence interval for the true average credit balance for rural customers is (2,780.86, 3,297.64). Because the interval contains 3,200, it does not provide enough evidence to support the claim that the average credit balance for rural customers is less than $3,200.

Appendix
Part 1: The MINITAB outputs showing the calculations of the tests performed for parts a-d are shown here:
Part 2: The MINITAB outputs showing the 95% confidence intervals for parts a-d are shown here:

Course Project: AJ DAVIS DEPARTMENT STORES

Introduction

AJ DAVIS is a department store chain, which has many credit customers and wants to find out more information about these customers. A sample of 50 credit customers is selected with data collected on the following five variables.

1. Location (rural, urban, suburban)
2. Income (in $1,000's; be careful with this)
3. Size (household size, meaning number of people living in the household)
4. Years (the number of years that the customer has lived in the current location)
5. Credit balance (the customer's current credit card balance on the store's credit card, in $)

The data is available in Doc Sharing Course Project Data Set as an Excel file. You are to copy and paste the data set into a Minitab worksheet.

PROJECT PART A: Exploratory Data Analysis

Open the file MATH533 Project Consumer.xls from the Course Project Data Set folder in Doc Sharing.

For each of the five variables, process, organize, present, and summarize the data. Analyze each variable by itself using graphical and numerical techniques of summarization. Use Minitab as much as possible, explaining what the printout tells you. You may wish to use some of the following graphs: stem-leaf diagram, frequency or relative frequency table, histogram, boxplot, dotplot, pie chart, bar graph. Caution: Not all of these are appropriate for each of these variables, nor are they all necessary. More is not necessarily better. In addition, be sure to find the appropriate measures of central tendency and measures of dispersion for the above data. Where appropriate use the five number summary (the Min, Q1, Median, Q3, Max). Once again, use Minitab as appropriate, and explain what the results mean.

Analyze the connections or relationships between the variables. There are 10 pairings here (location and income, location and size, location and years, location and credit balance, income and size, income and years, income and balance, size and years, size and credit balance, years and credit balance). Use graphical as well as numerical summary measures. Explain what you see. Be sure to consider all 10 pairings. Some variables show clear relationships, while others do not.
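The five-number summary mentioned above can also be sketched outside Minitab. The following standard-library example applies it to Years, a variable the report above does not analyse. It also flags boxplot outliers with the usual 1.5 × IQR fences. The function names are hypothetical. The quartile rule is `statistics.quantiles`' default (n + 1)p method, and other software may use a slightly different rule.

```python
import statistics

# Years at current location for the 50 customers, in data-table order.
YEARS = [
    2, 2, 1, 2, 5, 3, 10, 4, 4, 6, 8, 9, 9, 11, 10, 4, 10, 10, 13, 10,
    5, 11, 5, 13, 11, 6, 15, 8, 10, 6, 15, 11, 10, 13, 13, 14, 8, 14, 8, 16,
    15, 9, 9, 17, 19, 10, 18, 18, 10, 10,
]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the (n + 1)p quartile rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


def iqr_outliers(values):
    """Values outside the 1.5 * IQR fences that a standard boxplot would flag."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


years_summary = five_number_summary(YEARS)
years_outliers = iqr_outliers(YEARS)

print("Years five-number summary:", years_summary)
print("Years outliers (1.5 x IQR rule):", years_outliers)
```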
Prepare your report in Microsoft Word (or some other word processing package), integrating your graphs and tables with text explanations and interpretations. Be sure that you have graphical and numerical back up for your explanations and interpretations. Be selective in what you include in the report. I'm not looking for a 20-page report on every variable and every possible relationship (that's 15 things to do). Rather, what I want you to do is highlight what you see for three individual variables (no more than one graph for each, one or two measures of central tendency and variability (as appropriate), and two or three sentences of interpretation). For the 10 pairings, identify and report only on three of the pairings, again using graphical and numerical summary (as appropriate), with interpretations. Please note that at least one of your pairings must include location and at least one of your pairings must not include location.

All DeVry University policies are in effect, including the plagiarism policy.
Project Part A report is due by the end of Week 2.
Project Part A is worth 100 total points. See grading rubric below.

Submission: The report from Part 4, including all relevant graphs and numerical analysis along with interpretations

Format for report:
A. Brief introduction
B. Discuss your first individual variable, using graphical, numerical summary, and interpretation
C. Discuss your second individual variable, using graphical, numerical summary, and interpretation
D. Discuss your third individual variable, using graphical, numerical summary, and interpretation
E. Discuss your first pairing of variables, using graphical, numerical summary, and interpretation
F. Discuss your second pairing of variables, using graphical, numerical summary, and interpretation
G. Discuss your third pairing of variables, using graphical, numerical summary, and interpretation
H. Conclusion

Project Part A: Grading Rubric

Category                     Points   %     Description
Three Individual Variables   36       36    graphical analysis, numerical analysis (when appropriate) and interpretation; 12 points each
Three Relationships          45       45    graphical analysis, numerical analysis (when appropriate), and interpretation; 15 points each
Communication Skills         19       19    writing, grammar, clarity, logic, cohesiveness, adherence to the above format
Total                        100      100   A quality paper will meet or exceed all of the above requirements.

Project Part B: Hypothesis Testing and Confidence Intervals

Your manager has speculated the following.
a. The average (mean) annual income was greater than $45,000.
b. The true population proportion of customers who live in a suburban area is less than 45%.
c. The average (mean) number of years lived in the current home is greater than 8 years.
d. The average (mean) credit balance for rural customers is less than $3,200.

1. Using the sample data, perform the hypothesis test for each of the above situations in order to see if there is evidence to support your manager's belief in each case A-D. In each case, use the Seven Elements of a Test of Hypothesis in Section 6.2 of your text book with α = .05, and explain your conclusion in simple terms. Also, be sure to compute the p-value and interpret.
2. Follow this up with computing 95% confidence intervals for each of the variables described in A-D, and again interpreting these intervals.
3. Write a report to your manager about the results, distilling down the results in a way that would be understandable to someone who does not know statistics. Clear explanations and interpretations are critical.
4. All DeVry University policies are in effect, including the plagiarism policy.
5. Project Part B report is due by the end of Week 6.
6. Project Part B is worth 100 total points. See the grading rubric below.
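The test statistics and intervals asked for in steps 1 and 2 can be sketched for speculations a to c with the standard library. The sketch assumes several inputs. Part a uses the Minitab summary statistics quoted in the report (mean 46.02, standard deviation 13.884, n = 50). Part b uses the 15 suburban customers out of 50. Part c uses the raw Years data. Python's standard library has no t distribution, so the t critical values are assumed constants taken from a t table. The t tests therefore use the rejection-region form rather than p-values. Part d would follow the same `one_sample_t` pattern with the 13 rural balances and df = 12; it is left out to keep the sketch short. The helper names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

# Assumed t-table critical values for df = n - 1 = 49.
T_975_DF49 = 2.0096   # two-sided 95% confidence interval
T_95_DF49 = 1.6766    # one-sided test at alpha = 0.05


def one_sample_t(x_bar, s, n, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / math.sqrt(n))


def t_interval(x_bar, s, n, t_crit):
    """Two-sided interval x_bar +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# a. Income: H0 mu <= 45 vs Ha mu > 45 (summary statistics from Minitab).
t_a = one_sample_t(46.02, 13.884, 50, 45)
reject_a = t_a > T_95_DF49
ci_a = t_interval(46.02, 13.884, 50, T_975_DF49)

# b. Suburban share: H0 p >= 0.45 vs Ha p < 0.45 (15 of 50 customers).
n_b, x_b, p0 = 50, 15, 0.45
p_hat = x_b / n_b
z_b = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n_b)   # SE under H0 uses p0
p_value_b = NormalDist().cdf(z_b)                      # lower-tail p-value
reject_b = p_value_b < 0.05
z_975 = NormalDist().inv_cdf(0.975)
half_b = z_975 * math.sqrt(p_hat * (1 - p_hat) / n_b)  # CI uses p_hat
ci_b = (p_hat - half_b, p_hat + half_b)

# c. Years at current address: H0 mu <= 8 vs Ha mu > 8 (raw data).
YEARS = [
    2, 2, 1, 2, 5, 3, 10, 4, 4, 6, 8, 9, 9, 11, 10, 4, 10, 10, 13, 10,
    5, 11, 5, 13, 11, 6, 15, 8, 10, 6, 15, 11, 10, 13, 13, 14, 8, 14, 8, 16,
    15, 9, 9, 17, 19, 10, 18, 18, 10, 10,
]
t_c = one_sample_t(mean(YEARS), stdev(YEARS), len(YEARS), 8)
reject_c = t_c > T_95_DF49
ci_c = t_interval(mean(YEARS), stdev(YEARS), len(YEARS), T_975_DF49)

for label, stat, reject, ci in [
    ("a (t)", t_a, reject_a, ci_a),
    ("b (z)", z_b, reject_b, ci_b),
    ("c (t)", t_c, reject_c, ci_c),
]:
    print(f"{label}: statistic={stat:.2f} reject H0={reject} "
          f"95% CI=({ci[0]:.4f}, {ci[1]:.4f})")
print(f"b: p-value={p_value_b:.4f}")
```

The proportion test uses $p_0$ in the standard error because the test statistic is computed under H0. The confidence interval uses $\hat{p}$, which matches the z formula and the interval reported in Part B.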
Submission: The report from Part 3 and all of the relevant work done in the hypothesis testing (including Minitab) in Part 1 and the confidence intervals (Minitab) in Part 2 as an appendix

Format for report:
A. Summary report (about one paragraph on each of the speculations, A-D)
B. Appendix with all of the steps in hypothesis testing (the format of the Seven Elements of a Test of Hypothesis, in Section 6.2 of your text book) for each speculation A-D, as well as the confidence intervals, including all Minitab output

Project Part B: Grading Rubric

Category                      Points   %     Description
Addressing each speculation   80       80    hypothesis test, interpretation, confidence interval, and interpretation; 20 points each
Summary report                20       20    one paragraph on each of the speculations
Total                         100      100   A quality paper will meet or exceed all of the above requirements.

Project Part C: Regression and Correlation Analysis

Using MINITAB, perform the regression and correlation analysis for the data on income (Y), the dependent variable, and credit balance (X), the independent variable, by answering the following.

1. Generate a scatterplot for income ($1,000) versus credit balance ($), including the graph of the best fit line. Interpret.
2. Determine the equation of the best fit line, which describes the relationship between income and credit balance.
3. Determine the coefficient of correlation. Interpret.
4. Determine the coefficient of determination. Interpret.
5. Test the utility of this regression model (use a two tail test with α = .05). Interpret your results, including the p-value.
6. Based on your findings in 1-5, what is your opinion about using credit balance to predict income? Explain.
7. Compute the 95% confidence interval for beta-1 (the population slope). Interpret this interval.
8. Using an interval, estimate the average income for customers that have credit balance of $4,000. Interpret this interval.
9. Using an interval, predict the income for a customer that has a credit balance of $4,000. Interpret this interval.
10. What can we say about the income for a customer that has a credit balance of $10,000? Explain your answer.

In an attempt to improve the model, we attempt to do a multiple regression model predicting income based on credit balance, years, and size.

11. Using MINITAB, run the multiple regression analysis using the variables credit balance, years, and size to predict income. State the equation for this multiple regression model.
12. Perform the global test for utility (F-test). Explain your conclusion.
13. Perform the t-test on each independent variable. Explain your conclusions and clearly state how you should proceed. In particular, state which independent variables should we keep and which should be discarded.
14. Is this multiple regression model better than the linear model that we generated in parts 1-10? Explain.

All DeVry University policies are in effect, including the plagiarism policy.
15. Project Part C report is due by the end of Week 7.
16. Project Part C is worth 100 total points. See the grading rubric below.

Summarize your results from 1-14 in a report that is 3 pages or less in length and explains and interprets the results in ways that are understandable to someone who does not know statistics.

Submission: The summary report + all of the work done in 1-14 (Minitab output + interpretations) as an appendix

Format:
A. Summary Report
B. Points 1-14 addressed with appropriate output, graphs, and interpretations. Be sure to number each point 1-14.

Project Part C: Grading Rubric

Category                Points   %     Description
Questions 1-12 and 14   65       65    addressed with appropriate output, graphs, and interpretations; 5 points each
Question 13             15       15    addressed with appropriate output, graphs, and interpretations
Summary                 20       20    writing, grammar, clarity, logic, and cohesiveness
Total                   100      100   A quality paper will meet or exceed all of the above requirements.
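Steps 2 to 5 and 7 to 9 of Part C follow the standard simple-linear-regression formulas. Below is a minimal standard-library sketch of them for income (Y) on credit balance (X), assuming the data are transcribed from the table at the top of this page. It is not a substitute for the requested Minitab output. The t critical value for df = n - 2 = 48 is an assumed t-table constant. All variable names are illustrative. $10,000 lies far outside the observed balances ($2,047 to $5,861), so a prediction there (step 10) would be an extrapolation. That is why the sketch evaluates only x0 = $4,000.

```python
import math
import statistics

INCOME = [  # Y, $1,000s
    27, 25, 25, 26, 30, 29, 33, 30, 32, 34, 35, 40, 30, 33, 42, 32, 43,
    43, 33, 47, 35, 54, 42, 36, 57, 44, 38, 54, 54, 46, 40, 60, 58, 61,
    61, 62, 49, 68, 57, 45, 71, 57, 64, 45, 74, 65, 47, 53, 66, 69,
]
CREDIT = [  # X, $
    2631, 2047, 3155, 3913, 2660, 3531, 2766, 3769, 4082, 3806,
    4049, 4073, 2697, 2914, 4073, 4310, 4199, 4253, 3104, 4293,
    4456, 4340, 4925, 3178, 4391, 4947, 3203, 4354, 4366, 5003,
    3250, 4402, 4397, 4595, 4786, 4888, 5148, 5011, 5220, 3257,
    5528, 5283, 5332, 3304, 5553, 5484, 3342, 3788, 5756, 5861,
]
T_975_DF48 = 2.0106  # assumed t-table value, two-sided 95%, df = 48

n = len(CREDIT)
x_bar, y_bar = statistics.fmean(CREDIT), statistics.fmean(INCOME)
sxx = sum((x - x_bar) ** 2 for x in CREDIT)
syy = sum((y - y_bar) ** 2 for y in INCOME)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(CREDIT, INCOME))

# Least-squares line: income_hat = b0 + b1 * credit
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
residuals = [y - (b0 + b1 * x) for x, y in zip(CREDIT, INCOME)]
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (n - 2))        # standard error of the estimate

r = sxy / math.sqrt(sxx * syy)      # coefficient of correlation
r_squared = 1 - sse / syy           # coefficient of determination

# Model utility, H0: beta1 = 0 vs Ha: beta1 != 0, and 95% CI for beta1
se_b1 = s / math.sqrt(sxx)
t_b1 = b1 / se_b1
beta1_ci = (b1 - T_975_DF48 * se_b1, b1 + T_975_DF48 * se_b1)

# Interval estimates at x0 = $4,000 (inside the observed credit range)
x0 = 4000
y0_hat = b0 + b1 * x0
h0 = 1 / n + (x0 - x_bar) ** 2 / sxx
mean_ci = (y0_hat - T_975_DF48 * s * math.sqrt(h0),
           y0_hat + T_975_DF48 * s * math.sqrt(h0))
pred_pi = (y0_hat - T_975_DF48 * s * math.sqrt(1 + h0),
           y0_hat + T_975_DF48 * s * math.sqrt(1 + h0))

print(f"income_hat = {b0:.3f} + {b1:.6f} * credit")
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, t = {t_b1:.2f}")
print(f"95% CI for beta1: ({beta1_ci[0]:.6f}, {beta1_ci[1]:.6f})")
print(f"95% CI, mean income at $4,000: ({mean_ci[0]:.2f}, {mean_ci[1]:.2f})")
print(f"95% PI, one customer at $4,000: ({pred_pi[0]:.2f}, {pred_pi[1]:.2f})")
```

The prediction interval is always wider than the confidence interval for the mean. It adds the scatter of an individual customer around the line, the extra 1 under the square root, to the uncertainty in the line itself.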
