Question

1 Approved Answer

Posted on Oct 13, 2024

Dong Lee
Math-138
Professor Deshpande
10-16-16
Project Part 2

Research Proposal:

Introduction: The first pair of variables I plan to study is the number of hours a person spends on a smartphone daily and that person's monthly phone bill. The second pair is the mileage of the car and the number of visits to an auto repair shop. I expect the quantitative variables in each pair to be related to each other. The first pair is related because the more time you spend on your smartphone, the more data you use, and when you use more data you pay for a larger data plan, which raises your phone bill. The second pair is related because higher-mileage cars have more mechanical problems, which have to be fixed at an auto repair shop. I think the hours spent on the smartphone and the mileage of the car might cause a change in the other variable of each pair. I think my research is important and interesting because it shows the relationship between these variables, which could help you lower your phone bill and know the average mileage of cars.

Methods: I intend to gather my data from three groups. First, I am going to ask students in the classroom after class. Second, I am going to ask co-workers at work during break time. Lastly, I will ask my friends by calling or texting them at any time during the day. I do not anticipate any ethical issues with my research.
Materials:
First Survey Questions: "How many hours do you spend on your smartphone?" and "How much is your monthly phone bill?"
Second Survey Questions: "How many miles do you drive per year?" and "How many times did you visit an auto repair shop over the past year?"

Data:

Respondent     Hours on phone    Monthly phone    Miles driven    Auto repair shop
               (per 24 hours)    bill ($)         per year        visits per year
Student 1      3.5               75               5800            2
Student 2      5                 85               15000           4
Student 3      2.5               50               13800           4
Student 4      3.5               70               7500            2
Student 5      5                 80               20000           5
Co-worker 1    8                 110              20500           6
Co-worker 2    5                 85               9600            3
Co-worker 3    4                 75               14600           4
Co-worker 4    4                 80               13000           3
Co-worker 5    4.5               80               6000            2
Friend 1       3                 60               20500           4
Friend 2       2.5               55               31000           8
Friend 3       5                 75               8000            3
Friend 4       4                 75               11000           4
Friend 5       6                 80               17000           5
Friend 6       7                 85               7400            2
Friend 7       6                 85               12500           4
Friend 8       4.5               70               18600           5
Friend 9       8                 90               13400           3
Friend 10      8                 100              17600           4

Distribution of Quantitative Data: The mean, standard deviation, 5 number summary, histogram and box plot for Hours Spent on Your Phone (per 24 hours) are given below.

mean    sd           Min    Q1       Q2      Q3    Max    Q1 - 3*IQR    Q3 + 3*IQR
4.95    1.7388744    2.5    3.875    4.75    6     8      -2.5          12.375

[Histogram: Hours Spent on Your Phone, bins 2.5-3.5, 3.5-4.5, 4.5-5.5, 5.5-6.5, 6.5-7.5, 7.5-8.5; vertical axis "Total", 0 to 35]
[Boxplot: Hours Spent on Your Phone, axis 0 to 9]

I observe that the distribution of Hours Spent on Your Phone is approximately normal. For a normal distribution, the mean is the best measure of central tendency. From the boxplot I observe that there are no outliers in the data. Since the median lies roughly in the middle of Q1 and Q3, the data are approximately symmetric, which is consistent with a normal distribution. Since the data do not have any outliers, the range would be a suitable measure of dispersion.

The mean, standard deviation, 5 number summary, histogram and box plot for Monthly Phone Bill are given below.
mean     sd             Min    Q1       Q2    Q3    Max    Q1 - 3*IQR    Q3 + 3*IQR
78.25    13.88628631    50     73.75    80    85    110    40            118.75

[Boxplot: Monthly Phone Bill, axis 40 to 120]
[Histogram: Monthly Phone Bill, bins 50-59, 60-69, 70-79, 80-89, 90-99, 100-110; vertical axis "Total", 0 to 800]

I observe that the distribution of Monthly Phone Bill is approximately normal. For a normal distribution, the mean is the best measure of central tendency. From the boxplot I observe that no values fall outside the Q1 - 3*IQR and Q3 + 3*IQR fences (40 to 118.75), so there are no outliers by that rule. Since the data do not have any outliers, the range would be a suitable measure of dispersion.

Z-Scores and the Normal Distribution: I choose Monthly Phone Bill, as it has an approximately normal distribution. The z-scores of the minimum and maximum values, computed as z = (X - mean)/sd, are given below.

       X      z-score
min    50     -2.034381215
max    110    2.286428445

I choose x = 60. The corresponding z-score is (60 - 78.25)/13.88628631 = -1.3142463. The proportion of points below this z-score is P(Z < -1.3142) = NORMSDIST(-1.3142463) = 0.094381666, or about 9.4%.

Linear Regression and Correlation: My independent variable is Hours Spent on Your Phone (per 24 hours) and my dependent variable is Monthly Phone Bill. I want to predict the monthly bill on the basis of hours spent on the phone.

Scatterplot:
[Scatterplot: monthly phone bill (vertical axis, 0 to 120) against hours spent on phone (horizontal axis, 2 to 9), with trend line f(x) = 7.11x + 43.07, R^2 = 0.79]

An upward trend is observed in the scatterplot. This implies there is a strong positive linear relationship between hours spent on the phone and the monthly bill. That is, as hours spent on the phone increase, the monthly bill also increases.

Correlation Coefficient and Linear Regression: The correlation coefficient is 0.889, using Excel's built-in function CORREL. This implies there is a strong positive linear relationship between hours spent on the phone and the monthly bill. The same is observed from the scatterplot.
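As a cross-check on the Excel figures, the correlation coefficient and the least-squares line can be recomputed directly from the 20 (hours, bill) pairs. The following is a minimal sketch in plain Python, not the method used in the report; the helper name `fit_line` is introduced here only for illustration.

```python
from math import sqrt

# Survey data from the Data section, in respondent order.
hours = [3.5, 5, 2.5, 3.5, 5, 8, 5, 4, 4, 4.5,
         3, 2.5, 5, 4, 6, 7, 6, 4.5, 8, 8]
bill = [75, 85, 50, 70, 80, 110, 85, 75, 80, 80,
        60, 55, 75, 75, 80, 85, 85, 70, 90, 100]


def fit_line(x, y):
    """Return (slope, intercept, r) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


slope, intercept, r = fit_line(hours, bill)
print(f"slope={slope:.4f} intercept={intercept:.2f} r={r:.3f} r^2={r * r:.2f}")
```

The slope comes out near 7.11 and the intercept near 43.07, with r^2 near 0.79. Note that the slope is expressed in dollars of monthly bill per additional hour of daily phone use.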
The regression equation, as calculated with the Regression function in the Data Analysis ToolPak, is

Monthly_phone_bill = 43.07 + 7.1061*hours_spent

Discussion: From the value of the slope I conclude that each additional hour per day spent on the phone is associated with a 7.1061 unit (about $7.11) increase in the monthly phone bill. Hence I can say that if I reduce my use by one hour per day, the predicted bill for the corresponding month would be about $7.11 lower. (The slope already relates daily hours to the monthly bill, so it should not be multiplied by 30 to give 7.1061*30 = $213.18.)

Bias is the tendency of a sample statistic to systematically over-estimate or under-estimate a population parameter. Bias is a systematic inaccuracy in data which might be due to the creation and collection of the data, or due to a faulty sample design. If respondents answer questions in the way they think the questioner wants them to answer rather than according to their true beliefs, this is referred to as response bias. I have taken care to avoid response bias in my data.

Stratified sampling is a technique in which the entire population is divided into relatively homogeneous subgroups, or strata, and the final units are then selected randomly from these strata. Here I have three strata: students in the classroom, co-workers at work, and my friends. I randomly select units from these strata and conduct my survey. First, I am going to ask students in the classroom after class. Second, I am going to ask co-workers at work during break time. Lastly, I will ask my friends by calling or texting them at any time during the day.

Technology Considerations: For the construction of the histograms I use pivot tables and charts in Excel. For the boxplots I use Box Plot under Descriptive Statistics in PHStat. These are the simplest ways to draw a histogram and a boxplot, hence I use them. I use Scatter under the chart options on the Insert tab in Excel for construction of the scatterplot. For finding the value of the correlation coefficient, I use CORREL. For performing the regression analysis, I use the Regression function in the Data Analysis ToolPak.
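The Excel steps described above have close equivalents in other tools. As one illustration, the summary statistics and the z-score calculation for Monthly Phone Bill can be sketched with Python's standard `statistics` module. The helper name `summarize` is an assumption made for this example, and `method="inclusive"` is chosen because, for this data, it reproduces the quartiles reported in the summary table.

```python
from statistics import NormalDist, mean, quantiles, stdev

# Monthly phone bills from the Data section, in respondent order.
bill = [75, 85, 50, 70, 80, 110, 85, 75, 80, 80,
        60, 55, 75, 75, 80, 85, 85, 70, 90, 100]


def summarize(data):
    """Mean, sample sd, five-number summary and 3*IQR fences for a list of numbers."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "mean": mean(data),
        "sd": stdev(data),  # sample standard deviation (n - 1 in the denominator)
        "min": min(data),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(data),
        "lower_fence": q1 - 3 * iqr,
        "upper_fence": q3 + 3 * iqr,
    }


summary = summarize(bill)

# z-score of a $60 bill and the share of a normal distribution below it,
# the same quantity Excel's NORMSDIST returns.
z_60 = (60 - summary["mean"]) / summary["sd"]
share_below = NormalDist().cdf(z_60)

print(summary)
print(f"z(60)={z_60:.4f} share below={share_below:.4f}")
```

Calling `summarize` on the hours data instead gives the corresponding table for Hours Spent on Your Phone.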
All of these technologies would give me the same results; Excel functions are simply easy to use, and hence I use them.

Project Part III: Statistical Testing (Inferential Statistics) (38 points total)

In this section you will expand your project to explore inferential statistics for ONE of your quantitative variables: the one you decided in Part II was more nearly normal than the other. This section will include some external research, hypothesis testing and confidence intervals, and a discussion of hypothesis testing errors and issues.

Your project should be submitted as a professional report including everything from Part I and II using the following headings:
Research Proposal: all sections from Part I
Data and Descriptive Statistics: all sections from Part II
Part III: Statistical Testing

Research: Remind your reader which quantitative variable you considered most "nearly normal" in distribution and conduct some external research using the internet, library, or other academic sources to propose a reasonable guess for the true population average (mean) and standard deviation for your variable of interest.
Be sure to remind your readers what population of interest you are referring to: for example, are you limiting your consideration to adult U.S. citizens, adults worldwide, U.S. college students, dogs owned as pets, etc. Describe the study/resource and why the proposed values are reasonable estimates for the population parameters (true mean and standard deviation). Cite your sources appropriately using APA style in-text citations or the Chicago Manual of Style footnotes and add a "References Sheet" on the last page. This section should be 1-2 paragraphs in length.

**If you are able to obtain information about a plausible population mean but not a standard deviation, please use your sample standard deviation as an estimate.

Sampling Distribution: Recall that the sampling distribution of means for a nearly normal variable should be Normal, with a mean equal to the population mean, and a standard deviation equal to the population standard deviation divided by the square root of your sample size. Naturally, the true population parameters (mean and standard deviation) for your variable are unknown; however, you identified a reasonable estimate in the section above. Based on these estimates, give the sampling distribution model that applies to your variable. Then use this sampling distribution to find the probability of seeing your data (sample mean) or more extreme by chance. Explain every step of your work and include a diagram of the sampling distribution normal curve and your observed sample statistic (use appropriate technology).

One Sample Inferential Statistics: Suppose you suspect that the population mean found through your research is inaccurate and want to test the hypothesis that the population mean has changed (is different from the population mean found in your research). Since you are using quantitative data, you will be doing a 1-Mean Hypothesis Test (T-Test). You MUST do the following:

1-Mean Hypothesis Test
- Write your hypotheses using appropriate notation.
- Evaluate whether or not the conditions are met for you to conduct your test (you must actually verify all conditions; include diagrams or computations as applicable).
- If your conditions are NOT met, you can proceed but you MUST discuss the implications of not having met conditions in your Discussion section.
- Use technology to conduct your test and professionally present your results.
- Indicate how you arrived at your results using that technology (i.e. give instructions, indicate the inputs, present screenshots if applicable).
- Present your results.
- Write a conclusion to your hypothesis test in the context of your research question using appropriate statistical terminology.

1-Mean Confidence Interval
- Use technology to create a 95% confidence interval for the true population mean.
- Show how you arrived at your results using that technology.
- Respond to the question: Are the results of your confidence interval consistent with your hypothesis test? Why or why not?

Discussion: Provide a detailed discussion of the potential issues and limitations of the results of your research into this one variable. Your discussion should answer all of the following questions (at a minimum!):
- How confident are you that your results are accurate and meaningful?
- Are your results statistically significant? How about practically significant? (You'll need to refer to your confidence intervals to answer this question.)
- What limitations should someone consider when looking at your research? For example: how well did the T-Test model apply to each test (how well were the conditions met), and how representative were your data?
- Discuss which potential error could have occurred in your research, Type I or Type II, and what it means in the context of your scenario. Suggest some possible reasons that this error might have occurred and what consequences might result from this error.
- What other problems do you see with your research that might limit how well your research generalizes to the greater population?

Grading Rubric: Part III will be graded by components according to the following guidelines with comments provided to students.

Research (4 points total)
4 Points: External resource is clearly identified and explained. Proposed guess for the true population mean is justified and consistent with the external reference used. Research findings are based on legitimate sources and are referenced appropriately in-text and/or using footnotes and a reference page. Response is well-written, easy to understand, and uses correct statistical terminology.
3 Points: Proposed guess for the true population mean is consistent with the external reference, but some details are missing. Research findings are based on somewhat legitimate sources and are referenced appropriately. Response is at times difficult to understand, or at times uses statistical terminology inappropriately.
2 Points: Some evidence of external research is present. However, proposed guess for the true population mean is inconsistent with the research findings or is not explained and/or justified. Referencing skills need work.
1 Point: No evidence of external research or research is not referenced at all.
0 Points: No submission, submission is plagiarized, or submission does not match assignment.

Sampling Distribution (SQR Criterion #2: 4 points total)
Must identify the Sampling Distribution of the Mean and use this Sampling Distribution to find the probability of obtaining the sample mean (or more extreme) by chance.
4 Points: Sampling distribution is correctly identified. Uses correct strategy to find the probability of the sample mean and expresses answers correctly. Diagram is included and is correct.
3 Points: Sampling distribution is correctly identified. Uses correct strategy but makes minor mistakes in finding or expressing answers (e.g. the probability of the sample mean). Diagram is missing or is inaccurate.
2 Points: Uses incorrect strategies OR uses correct strategy but problem solving process contains inaccuracies (such as sampling distribution not being correctly identified or other major conceptual or procedural errors).
1 Point: Cannot determine any strategy to solve the problem.
0 Points: No submission, submission is plagiarized, or submission does not match assignment.

1-Mean Hypothesis Test and 1-Mean Confidence Interval - SQR Criterion #1: Communicates mathematical and/or scientific concepts using appropriate symbols, notations and vocabulary. (4 points)
4 Points: Hypotheses, checking assumptions, procedural steps, explanations, and conclusions use appropriate vocabulary, symbols, and notation with no errors.
3 Points: Hypotheses, checking assumptions, procedural steps, explanations, and conclusions use appropriate vocabulary, symbols, and notation with no significant mistakes and/or with minimal inaccuracies.
2 Points: Hypotheses, checking assumptions, procedural steps, explanations, and conclusions make poor or incorrect use of vocabulary or contains many errors.
1 Point: Hypotheses, checking assumptions, explanations, and conclusions are absent. No use of statistical vocabulary and/or notation.
0 Points: No submission, submission is plagiarized, or submission does not match assignment.

1-Mean Hypothesis Test and 1-Mean Confidence Interval - SQR Criterion #2: Applies appropriate process to solve the given problem. (4 points)
4 Points: Hypothesis test and confidence interval computations use correct strategy to find and express answers correctly.
3 Points: Hypothesis test and confidence interval computations use correct strategy but makes minor mistakes in finding or expressing answers.
2 Points: Hypothesis test and confidence interval computations use incorrect strategies OR uses correct strategy but problem solving process contains inaccuracies.
1 Point: Cannot determine any strategy to solve the problems.
0 Points: No submission, submission is plagiarized, or submission does not match assignment.

1-Mean Hypothesis Test and 1-Mean Confidence Interval - SQR Criterion #3: Analyzes, evaluates, justifies and interprets the reasonableness of a solution. (4 points)
4 Points: Conclusions demonstrate accurate evaluation and interpretation of the results of the hypothesis test and confidence interval. Confidence interval results are used effectively to justify the reasonableness of the results of the hypothesis test and to form thoughtful and insightful conclusions.
3 Points: Conclusions demonstrate accurate evaluation and interpretation of the results of the hypothesis test and confidence interval. Confidence interval results are used to justify the reasonableness of the results of the hypothesis test. Some nuances in interpretation may be missed.
2 Points: Conclusions demonstrate insufficient or incorrect evaluation or interpretation of the results of the hypothesis test and/or confidence interval. Confidence interval results are not correctly used to justify the reasonableness of the results of the hypothesis test.
1 Point: Conclusions are not justified and/or results of the hypothesis test or confidence interval are not interpreted correctly.
0 Points: No submission, submission is plagiarized, or submission does not match assignment.

Effective Use of Technology -- TC Criterion #2: Using, adapting or designing technologies to achieve the best results for research, communication or task-related objectives (4 points)
4 Points: Multiple, complementary technologies utilized to yield outstanding results, demonstrating advanced command of each of the tools.
3 Points: Multiple technologies used appropriately to complete project tasks in a manner sufficient to meet project goals, demonstrating proficiency in utilizing multiple tools.
2 Points: Project utilizes some technology, but does not completely or consistently meet project goals.
1 Point: No technology used.
0 Points: No submission, submission is plagiarized, or submission does not match assignment.

Discussion (10 points)
A complete response should address: statistical and practical significance, which type of error could have occurred, and a thorough exploration of limitations including the impact of sampling issues and how well the test/interval conditions were met.
10 Points: All questions are answered completely. Responses are well-written, easy to understand, and use correct statistical terminology. Discussion of significance, type and sources of error, and research limitations is thoughtful, correct and/or reasonable and demonstrates an excellent grasp of statistical concepts.
8 Points: All questions are answered. Responses are fairly well-written, and use mostly correct statistical terminology. Discussion of significance, type and sources of error, and research limitations is mostly correct/reasonable but may contain minor errors and demonstrates a good grasp of statistical concepts.
6 Points: Most questions are answered. Responses at times contain errors, are difficult to understand, and/or use incorrect statistical terminology. Discussion of significance, type and sources of error, and research limitations are mostly correct/reasonable and demonstrates a fair grasp of statistical concepts.
4 Points: Several questions were left unanswered or responses contain major errors. Response demonstrates a limited grasp of statistical concepts.
2 Points: Most questions were left unanswered or responses are largely incorrect or do not make sense. Response demonstrates lack of understanding of the relevant statistical concepts.

Professionalism (3 points)
3 Points: Report is highly professional in appearance, typed using an appropriate mathematical typesetting program, easy to read/comprehend, and complete.
2 Points: Report demonstrates some professionalism, but contains distracting errors/problems in typesetting. Formatting/organization, vocabulary, and/or grammar need attention.
1 Point: Report demonstrates some professionalism, but problems with organization, typesetting, and/or grammar make it difficult to read.
0 Points: Report does not demonstrate appropriate professionalism as required by the assignment.

Completeness (1 point)
1 Point: All required sections are present. Parts I and II are included in the final project submission and are compatible (they all relate to one another).
0 Points: Project is missing Parts I and II or other major sections. OR No submission / submission is plagiarized.
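The Part III computations that the assignment asks for (the sampling-distribution probability, the one-mean t statistic, and the 95% confidence interval) could be sketched as follows for the Monthly Phone Bill data. This is an illustration under stated assumptions, not a completed Part III. `MU_0` is a hypothetical placeholder, not a researched population mean. The sample standard deviation stands in for the population value, as the assignment permits. The critical value 2.093 is the two-sided 5% value for 19 degrees of freedom, taken from a t table, because Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Monthly phone bills from the Data section, in respondent order.
bill = [75, 85, 50, 70, 80, 110, 85, 75, 80, 80,
        60, 55, 75, 75, 80, 85, 85, 70, 90, 100]

MU_0 = 80.0        # HYPOTHETICAL null value; replace with the researched population mean
T_CRIT_19 = 2.093  # t table value: two-sided, alpha = 0.05, df = 19

n = len(bill)
x_bar = mean(bill)
s = stdev(bill)
se = s / sqrt(n)   # standard error: sample sd used as the estimate of sigma

# Sampling distribution model for the sample mean: Normal(MU_0, se).
sampling_dist = NormalDist(mu=MU_0, sigma=se)
# One-sided probability of a sample mean at least as far from MU_0 as x_bar.
if x_bar < MU_0:
    tail_prob = sampling_dist.cdf(x_bar)
else:
    tail_prob = 1 - sampling_dist.cdf(x_bar)

# One-mean t-test: H0: mu = MU_0 against HA: mu != MU_0.
t_stat = (x_bar - MU_0) / se
reject_h0 = abs(t_stat) > T_CRIT_19

# 95% confidence interval for the population mean.
ci_low = x_bar - T_CRIT_19 * se
ci_high = x_bar + T_CRIT_19 * se

print(f"x_bar={x_bar:.2f} se={se:.4f} tail_prob={tail_prob:.4f}")
print(f"t={t_stat:.4f} reject H0 at 5%: {reject_h0}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

The test and the interval are two views of the same calculation: the two-sided test rejects at the 5% level exactly when `MU_0` falls outside the 95% interval, which is the consistency check the assignment asks about.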