Question

RStudio Version

Overview

Week 4 Interpret. The last step is to interpret the results in everyday terms. APA formatting is not required. Submit this worksheet for team collaboration. Your teammate will provide a review of your work.

Note. This paper is lengthy because the step-by-step examples are included.

Week 3 Descriptive Statistics

1. Variables and Research Question

Write in the Independent (IV) and Dependent (DV) variables. Identify the level of measurement, Categorical or Numeric, with an X.

Variable Names
IV: Vehicle Origin - Categorical: X
DV: Age - Numeric: X

Write your Research Question.
Research Question: Is there a difference in a vehicle buyer's age (DV) based on vehicle origin (IV)?

2. Review the CSV Data

Raw data needs to be thoroughly reviewed before analysis takes place. The three primary areas to review are missing values, the limits (end points), and extreme outliers. These are explained below.

Missing Values
This problem occurs when a respondent fails to complete all of the survey questions. How to handle it depends on the data and its use. The criteria below are for academic simplicity. Missing values will not be replaced. Delete rows with one or more missing values.

A. Were any data rows removed? Explain.
No. All 80 rows in the dataset were complete.

Evaluate the limits of your numeric data.
Criteria: Remove data outside of practical limits. Do the minimum and maximum values fit the scope of the Research Question? If not, remove those values that are inconsistent.

B. Were any data removed? Explain.
No. The data resides within the limits of the research.

Evaluate Outliers
Extreme outliers are the greatest concern against data normality. These can be explainable, such as home prices in certain areas. There can also be data entry errors.
Criteria for this academic project: Eliminate the minimum and maximum extreme outlier(s).
Note. Document the outlier values that were deleted.

C.
What extreme outliers were found? If yes, list their values. Were they removed?
Using the quartiles reported in Appendix D (Q1 = 35.75, Q3 = 47.25, IQR = 11.50):
Q1 - 3*(IQR) = 35.75 - 3(11.50) = 1.25. The minimum, 25, is greater than 1.25. No extreme outlier.
Q3 + 3*(IQR) = 47.25 + 3(11.50) = 81.75. The maximum, 58, is less than 81.75. No extreme outlier.

Categorical Data
Criteria: Remove categories with fewer than five values. Alternatively, combine them with another category. This practice is for mathematical purposes.

D. Were changes made to the categorical data? Explain.
No. Both categories of the IV (vehicle origin) contain far more than five of the 80 scores in the sample (Domestic 30, Import 50).

Establish a Histogram
Replace new.var with your numeric working variable name.
Replace x-axis label with an appropriate x-axis label name.

    hist(new.var, main = NULL, xlab = "x-axis label")  # Creates a histogram

Copy and paste the code into RStudio. Press Ctrl and Enter. Click on Zoom. Right click and Copy image. Paste the histogram to Appendix A in this paper.

E. What did your observation tell you about the numeric variable?
The histogram shows there is a slight positive (right) skew.

Bar Chart. Create a bar chart for each categorical data variable with RStudio.
Note. Two steps, one to create the chart and another to create a frequency table.

Establish a Bar Chart and Frequency Table
Replace School_Buses with your dataset name.
Replace Type with your categorical variable column name.
Replace "Number of Buses" with an appropriate x-axis title name.
Replace new.var with your categorical working variable name.

    barplot(table(School_Buses$Type), col = "black", xlab = "Number of Buses", ylab = "Frequency")
    table(new.var)

Copy and paste into RStudio. Press Ctrl and Enter. Click on Zoom. Right click and Copy image. Paste the bar chart to Appendix B in this paper.

F. What did your observation tell you about the categorical variable?
Output
    Domestic   Import
          30       50
There are about 67% more import auto sales (50) than domestic auto sales (30).

Scatterplot. Create a scatterplot with paired (x, y) numeric data.

G.
What did your observation tell you about the relationship between these two variables?
Not required with the Auto Sales dataset.

Normality. Normality can be determined mathematically. Knowing this will define how the data is interpreted and which hypothesis test is used. We will use the Shapiro-Wilk normality test here, but there is also the Anderson-Darling test. For both tests, a p-value greater than .05 indicates the data is sufficiently normal.

Use Shapiro-Wilk Normality Test
Replace new.var with your numeric working variable name.

    shapiro.test(new.var)

Copy and paste, Ctrl Enter. Copy and paste your output here.

H. What was your observation for normality? Explain.
    Shapiro-Wilk normality test
    data: age
    W = 0.97955, p-value = 0.2287
The variable age is treated as normally distributed because the p-value (.2287) is greater than .05.

5. Descriptive Statistics - Calculations

We are ready to calculate the descriptive statistics and complete Table 1 (see below).
Format: Desc(new.var)

Descriptive Statistics
Replace new.var with your numeric working variable name.

    Desc(new.var)

Copy and paste, Ctrl Enter. Place the output and graph in Appendix D.

Complete Table 1

Table 1
Descriptive Statistics, Numeric Variables

    Statistic                   Age
    Shapiro-Wilk, p-value =     .2287
    Normal (Yes/No)             Yes
    Mean, M =                   42.15
    Median =                    42.50
    Std. Dev., s =              8.22
    IQR/2 =                     11.50/2 = 5.75
    Sample Size, n =            80
    Minimum =                   25
    Maximum =                   58
    Mode =                      51
    Confidence Interval =       (40.32, 43.98)
    Other numeric variables     Not Applicable

Note. The confidence interval is applicable for normally distributed data.

Summary Week 3. The Week 3 Descriptive Statistics is complete. Submit the Word document and your CSV file. These will be reviewed and returned as soon as possible for Week 4 continuation. Full points are awarded when the work is submitted on time and the content is in its final draft state.

Week 4 Descriptive Statistics

6. Descriptive Statistics - Interpretation

Continue from Week 3.
Correct the content based on the Week 3 feedback. Create interpretations for your data. Example data in this section is from the Example calculations in Week 3.

Descriptive Statistics Interpretation
All 80 rows in the dataset were complete, and the data resides within the limits of the research. There are no extreme outliers in the dataset. No changes were made to the categorical data.

Numeric Variable Name
The numeric variable's name is the Buyer's Age. The histogram shows there was a slight positive (right) skew. (See Appendix A).

Slightly Skewed Data (Positive)
Age
The data is slightly skewed in the positive direction. Eighty vehicle purchases were randomly chosen. The buyers' ages ranged between 25 and 58 years. About one-half of the buyers were 43 years or older (median 42.5), and ages varied around the mean of 42 years by about plus or minus eight years (standard deviation 8.22). The most frequent buyer's age was 51 years. We can be 95% confident that the average buyer's age is between 40 and 44 years. (See Appendix D).
Note: There are eight measures listed: sample size, minimum, maximum, mean, median, standard deviation, IQR/2, and confidence interval. The mode was not used for numeric-continuous data.

Normality
    Shapiro-Wilk normality test
    data: age
    W = 0.97955, p-value = 0.2287
The buyer's age is treated as normally distributed because the p-value (.2287) is greater than alpha (.05). This means the null hypothesis of the Shapiro-Wilk test, that the ages come from a normal distribution, is not rejected. The result concerns only the shape of the age data. Whether a buyer's age differs by vehicle origin is a separate question for the hypothesis test.

Categorical Variable Name
The categorical variable's name is the Vehicle Origin. The origin components are Import and Domestic. No changes were made to the categorical data, since each category contains well over five of the 80 scores in the sample.

Vehicle Origin
The frequency table results indicate the following:
Output
    Domestic   Import
          30       50
Eighty vehicle purchases were randomly chosen. Fifty (62.5 percent) were import, and thirty (37.5 percent) were domestic in origin.
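The shares follow directly from the counts in the frequency table (30 domestic and 50 import out of 80 purchases). As a worked check:

```latex
\[
\frac{50}{80} = 0.625 = 62.5\%, \qquad
\frac{30}{80} = 0.375 = 37.5\%, \qquad
\frac{50 - 30}{30} \approx 66.7\%
\]
```

The last ratio is the relative excess of import over domestic purchases.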
(See Appendix B).

Next Steps
1. Post your paper to the team by midnight Day 5 for member review.
2. Review member papers and post your critique by midnight Day 6.
3. Make corrections to this paper based on the critique and member collaboration.
4. Update your Business Research Paper (time permitting) by midnight Day 7.
5. Submit this paper and your created member review papers by midnight Day 7.

Appendix A: Histogram(s)

Appendix B: Bar Chart(s)

Appendix C: Scatter Plot
Not applicable.

Appendix D: Descriptive Statistics and Chart
Output

    age (integer)

      length       n     NAs  unique      0s    mean  meanCI
          80      80       0      30       0   42.15   40.32
              100.0%    0.0%            0.0%           43.98

         .05     .10     .25  median     .75     .90     .95
       28.95   30.90   35.75   42.50   47.25   53.00   56.00

       range      sd   vcoef     mad     IQR    skew    kurt
       33.00    8.22    0.19    8.15   11.50   -0.08   -0.86

    lowest : 25, 26, 28 (2), 29 (2), 30 (2)
    highest: 51 (6), 53 (3), 55 (2), 56 (4), 58

Descriptive Statistics
Who are you reviewing:
Submit this Word document. Calculations must be shown in Excel or in this paper.

A. Descriptive Statistics

Charts
a. A histogram for each numeric-continuous data variable was created. Posted in the Appendix.
b. A scatter plot was made when there were two numeric data variables. Posted in the Appendix.
c. A bar chart for each attribute and numeric-discrete data variable was created. Posted in the Appendix.
d. Were x-axis and y-axis labels explanatory for the above charts?

Normal Data
a. Was a normality test made for each numeric variable? Posted in the Appendix.
b. Was the Normality Test p-value correctly stated?
c. Was the normality conclusion (normal when the p-value is greater than .05) correctly stated?

Table 1
a. Was the table correctly populated based on Descriptive Statistics output?
Comment on those measures that are incorrect.
Note. The confidence interval is only used when the data is normally distributed.

B. Interpret Descriptive Statistics

Numeric Variable Name
a. Was the numeric data described in everyday terms? Were there four to five sentences per variable?

Categorical Variable Name
a. Was the categorical data described by proportions (percentages)?

C. Other feedback
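
The checks this worksheet walks through (missing values, extreme-outlier fences, the Shapiro-Wilk test, the confidence interval, and the frequency percentages) can be sketched in base R. This is only a minimal sketch under stated assumptions: the `age` and `origin` vectors are hypothetical placeholder values, not the Auto Sales data, and base R `quantile()`, `IQR()`, and `t.test()` stand in for `Desc()`, so the numbers will not match Appendix D.

```r
# Hypothetical placeholder data (NOT the Auto Sales dataset).
age    <- c(25, 28, 30, 33, 35, 36, 38, 40, 41, 42,
            43, 44, 45, 47, 48, 50, 51, 51, 53, 58)
origin <- c(rep("Domestic", 8), rep("Import", 12))

# Missing values: keep only rows where both variables are present.
keep   <- !is.na(age) & !is.na(origin)
age    <- age[keep]
origin <- origin[keep]

# Extreme-outlier fences: Q1 - 3*IQR and Q3 + 3*IQR.
q       <- quantile(age, c(0.25, 0.75))
iqr     <- IQR(age)
lower   <- q[[1]] - 3 * iqr
upper   <- q[[2]] + 3 * iqr
extreme <- age[age < lower | age > upper]

# Normality: a p-value greater than .05 is read as "sufficiently normal".
sw        <- shapiro.test(age)
is_normal <- sw$p.value > 0.05

# 95% confidence interval for the mean (t-based).
ci <- t.test(age)$conf.int

# Frequency table and percentages for the categorical variable.
freq <- table(origin)
pct  <- round(100 * prop.table(freq), 1)

print(c(lower = lower, upper = upper))
print(extreme)
print(sw)
print(ci)
print(freq)
print(pct)
```

To use it with real data, replace the two placeholder vectors with the columns read from the CSV file, for example via `read.csv()`; the column names will depend on the dataset.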
