Question
Using R code in RStudio

# Objectives: Calculate descriptive statistics, perform graphical analyses,
# and conduct statistical hypothesis tests.

# Several challenges have questions. You do not need to answer these
# questions in your scripts. They are there to help you understand
# what your analyses are telling you.

# Budget your time:
#   30 minutes, read data & descriptive statistics (Challenges 1-6)
#   45 minutes, graphical analysis (Challenges 7-19)
#   45 minutes, ANOVA (Challenges 20-21)
#   45 minutes, Multiple Regression (Challenges 22-25)
#
# If you're running behind, skip ahead. If you don't finish all
# the challenges, that's OK. Just make a good effort and get as
# far as you can.
# Six Sigma has two main methodologies: DMAIC and DMADV.
# DMAIC = Define, Measure, Analyze, Improve, Control.
#   Typically used for improving existing processes.
# DMADV = Define, Measure, Analyze, Design, Verify.
#   Typically used for creating new things, but may also be
#   used to improve existing processes if making major changes.
#
# The Measure phase is used to assess current performance and typically
# includes descriptive statistics and graphical analyses.
# The Analyze phase looks to identify and validate root causes of
# performance and typically involves statistical hypothesis testing
# (e.g., t-tests, ANOVAs, regressions).
# The Improve/Verify phases typically involve validating that
# certain improvements are having their desired operational impacts.
# These are tested using design of experiments (DOE) with its associated
# statistical hypothesis testing methods.
# Challenge 0: Create a new script and add a multi-line comment
# at the top with the name of the workshop, your name, and the date.
# Save the script in your R script folder.

# MEASURE. In the Measure phase, we generate descriptive statistics
# and perform graphical analyses to understand our data better.

# Challenge 1: Read "OR Cases.csv" into a data.frame.

# Challenge 2: Keep only the first 2 columns of the
# data.frame. The remaining columns have only NA values.

# Challenge 3: Eliminate all rows of the data.frame
# where the Date is blank (""). You should now have
# a table with no blank rows or columns.

# Challenge 4: Convert the Date column to R dates.
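A minimal base-R sketch of Challenges 1-4. Because the real "OR Cases.csv" is not reproduced here, the sketch first writes a tiny hypothetical CSV with the shape the challenges describe (two data columns, two all-empty columns, and one row with a blank Date). The column names `Date` and `Cases` and the month/day/year date format are assumptions; check your file with `str()` and adjust them if needed.

```r
# Hypothetical stand-in for "OR Cases.csv" (assumed columns: Date, Cases).
# Replace csv_path with the path to your own file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Date,Cases,X,X.1",
             "1/1/2019,25,,",
             "1/2/2019,31,,",
             ",,,",
             "1/3/2019,28,,"),
           csv_path)

# Challenge 1: read the CSV, keeping text as character (not factors)
or_cases <- read.csv(csv_path, stringsAsFactors = FALSE)

# Challenge 2: keep only the first two columns
or_cases <- or_cases[, 1:2]

# Challenge 3: drop rows whose Date is blank (also guard against NA)
keep <- !is.na(or_cases$Date) & or_cases$Date != ""
or_cases <- or_cases[keep, ]

# Challenge 4: convert text to Date objects
# (the "%m/%d/%Y" format is an assumption about the file)
or_cases$Date <- as.Date(or_cases$Date, format = "%m/%d/%Y")

str(or_cases)
```

The blank-Date test works because `read.csv` leaves empty fields in a character column as `""` rather than `NA`.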
# Challenge 5: For the number of cases, calculate the summary statistics
# of location: mean, median, min, max, 1st quartile, 3rd quartile.
# Q: Is the distribution symmetric around the mean?
# Tutorial: https://www.statmethods.net/stats/descriptives.html
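One way to get the location statistics in base R. The `or_cases` data.frame below holds hypothetical toy values (14 days, low counts on weekends) standing in for your cleaned data; the column name `Cases` is an assumption.

```r
# Hypothetical toy data standing in for the cleaned OR data.frame
or_cases <- data.frame(
  Date  = seq(as.Date("2019-01-01"), by = "day", length.out = 14),
  Cases = c(25, 31, 28, 30, 6, 4, 33, 29, 35, 27, 32, 7, 5, 34)
)

x <- or_cases$Cases
mean(x)
median(x)
min(x)
max(x)
quantile(x, probs = c(0.25, 0.75))  # 1st and 3rd quartiles (R's default method)
summary(x)                          # min, quartiles, median, and mean in one call
```

On these toy values the mean (about 23.3) sits well below the median (28.5), which suggests left skew from the low weekend days. That reading applies to the toy data only.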
# Challenge 6: For the number of cases, calculate the summary statistics
# of spread: standard deviation, variance, range.
# Q: Is the spread wide or narrow?
# WARNING: Make sure the range = max - min. (R's range() function
# returns the pair c(min, max), not their difference.)
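A short sketch of the spread statistics, again on hypothetical toy data with an assumed `Cases` column. The last line shows the max - min calculation the warning asks for.

```r
# Hypothetical toy data standing in for the cleaned OR data.frame
or_cases <- data.frame(
  Date  = seq(as.Date("2019-01-01"), by = "day", length.out = 14),
  Cases = c(25, 31, 28, 30, 6, 4, 33, 29, 35, 27, 32, 7, 5, 34)
)

x <- or_cases$Cases
sd(x)           # sample standard deviation (n - 1 denominator)
var(x)          # sample variance, equal to sd(x)^2
range(x)        # CAUTION: two numbers, c(min, max)
diff(range(x))  # the range as one number: max - min
```

For the toy values, `range(x)` gives `c(4, 35)` while the range as defined in the challenge is 31.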
# Challenge 7: (a) Plot a trend chart of OR cases over time.
# (b) Make sure the x and y axes have nice titles.
# (c) Make sure the y axis has nice, round number endpoints.
# (d) Add a title.
# Q: Do you see a trend? Seasonality? Do you see stratification,
# where the samples seem to be taken from two or more different populations?
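A base-graphics sketch of the trend chart, assuming hypothetical toy data with `Date` and `Cases` columns. Rounding the top of the y axis up to the next multiple of 10 and setting `yaxs = "i"` (no 4% padding) makes the axis start and end on round numbers.

```r
# Hypothetical toy data standing in for the cleaned OR data.frame
or_cases <- data.frame(
  Date  = seq(as.Date("2019-01-01"), by = "day", length.out = 14),
  Cases = c(25, 31, 28, 30, 6, 4, 33, 29, 35, 27, 32, 7, 5, 34)
)

# Round the y-axis maximum up to the next multiple of 10
y_max <- ceiling(max(or_cases$Cases) / 10) * 10

plot(or_cases$Date, or_cases$Cases,
     type = "o",            # lines joined through points
     ylim = c(0, y_max),
     yaxs = "i",            # axis runs exactly from 0 to y_max
     xlab = "Date",
     ylab = "Number of OR cases",
     main = "Daily OR cases over time")
```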
# Challenge 8: (a) Plot a histogram of OR cases by day.
# (b) Make sure the x- and y-axis scales have nice, round numbers.
# (c) Make sure the bins have nice, round number breakpoints.
# Q: Is the distribution symmetric around the mean? Is the spread
# wide or narrow? Do you see stratification, where the samples seem
# to be taken from two or more different populations?
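A sketch of a frequency histogram with round-number bins, on hypothetical toy data (assumed `Cases` column). Passing an explicit `breaks` vector fixes the bin edges; R's bins are right-closed by default, and the first bin also includes its left edge.

```r
# Hypothetical toy data standing in for the cleaned OR data.frame
or_cases <- data.frame(
  Date  = seq(as.Date("2019-01-01"), by = "day", length.out = 14),
  Cases = c(25, 31, 28, 30, 6, 4, 33, 29, 35, 27, 32, 7, 5, 34)
)

h <- hist(or_cases$Cases,
          breaks = seq(0, 40, by = 5),   # round-number bin edges
          xlim = c(0, 40), ylim = c(0, 6),
          xlab = "OR cases per day", ylab = "Number of days",
          main = "Histogram of daily OR cases")
h$counts  # number of days in each bin
```

With the toy values, the two separate clusters (weekend days below 10, weekdays above 20) illustrate what stratification looks like in a histogram.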
# Challenge 9: Plot a histogram of OR cases by day,
# this time as a probability density. Make sure
# the x- and y-axis scales have nice, round numbers,
# and make sure the bins have nice, round number
# breakpoints.
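The density version only needs `freq = FALSE`; the y axis then shows density, and density times bin width sums to 1 across the bins. The data are hypothetical toy values with an assumed `Cases` column.

```r
# Hypothetical toy data standing in for the cleaned OR data.frame
or_cases <- data.frame(
  Date  = seq(as.Date("2019-01-01"), by = "day", length.out = 14),
  Cases = c(25, 31, 28, 30, 6, 4, 33, 29, 35, 27, 32, 7, 5, 34)
)

h <- hist(or_cases$Cases,
          breaks = seq(0, 40, by = 5),
          freq = FALSE,                  # plot densities, not counts
          xlim = c(0, 40), ylim = c(0, 0.1),
          xlab = "OR cases per day", ylab = "Density",
          main = "Density histogram of daily OR cases")

# Total area of the bars is 1
sum(h$density * diff(h$breaks))
```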
# Challenge 10: Supplement your data.frame with three
# new columns: day of week, week of year, month.
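A sketch using base R date helpers on hypothetical toy data. The new column names `DayOfWeek`, `WeekOfYear`, and `Month` are illustrative choices, not names from the assignment. `weekdays()` and `months()` return names in your session's language; the `"%U"` format numbers weeks from Sunday, with days before the year's first Sunday in week 0 (`"%V"` gives ISO 8601 weeks instead).

```r
# Hypothetical toy data standing in for the cleaned OR data.frame
or_cases <- data.frame(
  Date  = seq(as.Date("2019-01-01"), by = "day", length.out = 14),
  Cases = c(25, 31, 28, 30, 6, 4, 33, 29, 35, 27, 32, 7, 5, 34)
)

# Illustrative column names
or_cases$DayOfWeek  <- weekdays(or_cases$Date)                  # e.g. "Tuesday" (locale-dependent)
or_cases$WeekOfYear <- as.integer(format(or_cases$Date, "%U"))  # 0-53, weeks start on Sunday
or_cases$Month      <- months(or_cases$Date)                    # e.g. "January" (locale-dependent)

head(or_cases)
```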
# Challenge 11: Add a new column, "Weekday", to the data.frame.
# Initialize its value to "Weekday", even if the day falls on
# a weekend. Print the first 10 rows.

# Challenge 12: Create a test of whether a day falls on a weekend.
# Hint: https://stackoverflow.com/questions/11612235/select-rows-from-a-data-frame-based-on-values-in-a-vector
# Hint: Use the is.element function.
# Tutorial: Learn how to use R's built-in help: https://www.r-project.org/help.html
# In RStudio, if you type "?is.element" (without the quotes) into the Console,
# you will see the help in the Help pane. (If you don't see the Help pane,
# go to the View menu and select Panes/Show All Panes.)

# Challenge 13: Use the test from the prior challenge to set the values in
# the Weekday column to "Weekend" if the day falls on Sat or Sun.
# Print the first 10 rows to make sure you're getting the correct results.
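A sketch of Challenges 11-13 on hypothetical toy data. The weekend test uses `is.element` on the `wday` field of `as.POSIXlt`, where 0 is Sunday and 6 is Saturday; this avoids depending on the language of day names. Testing `weekdays()` against `c("Saturday", "Sunday")` also works, but only in an English locale.

```r
# Hypothetical toy data standing in for the cleaned OR data.frame
or_cases <- data.frame(
  Date  = seq(as.Date("2019-01-01"), by = "day", length.out = 14),
  Cases = c(25, 31, 28, 30, 6, 4, 33, 29, 35, 27, 32, 7, 5, 34)
)

# Challenge 11: every row starts as "Weekday"
or_cases$Weekday <- "Weekday"
head(or_cases, 10)

# Challenge 12: TRUE for Saturday (wday 6) or Sunday (wday 0)
is_weekend <- is.element(as.POSIXlt(or_cases$Date)$wday, c(0, 6))
# English-locale alternative:
# is.element(weekdays(or_cases$Date), c("Saturday", "Sunday"))

# Challenge 13: relabel weekend rows
or_cases$Weekday[is_weekend] <- "Weekend"
head(or_cases, 10)
```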
# NOTE: If you are running short of time, skip to Challenge 20.
# Challenge 14: Get the subset of data that falls on a weekday.
# Print the first 10 rows.
# Hint: https://stackoverflow.com/questions/7381455/filtering-a-data-frame-by-values-in-a-column
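A sketch of the row filter, assuming the `Weekday` column from Challenges 11-13 (rebuilt here on hypothetical toy data so the block runs on its own). Logical indexing and `subset()` select the same rows.

```r
# Hypothetical toy data standing in for the cleaned OR data.frame
or_cases <- data.frame(
  Date  = seq(as.Date("2019-01-01"), by = "day", length.out = 14),
  Cases = c(25, 31, 28, 30, 6, 4, 33, 29, 35, 27, 32, 7, 5, 34)
)

# Rebuild the Weekday column (POSIXlt wday: 0 = Sunday, 6 = Saturday)
or_cases$Weekday <- ifelse(is.element(as.POSIXlt(or_cases$Date)$wday, c(0, 6)),
                           "Weekend", "Weekday")

# Keep only weekday rows
weekday_cases <- or_cases[or_cases$Weekday == "Weekday", ]
# Same rows: subset(or_cases, Weekday == "Weekday")
head(weekday_cases, 10)
```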
# Challenge 15: Create a histogram of the number of weekday OR cases.
# Q: What distribution does the data seem to follow?
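A sketch of the weekday-only histogram on hypothetical toy data (assumed `Date` and `Cases` columns). Dropping weekend rows removes the low cluster, which leaves a single group whose shape is easier to judge.

```r
# Hypothetical toy data standing in for the cleaned OR data.frame
or_cases <- data.frame(
  Date  = seq(as.Date("2019-01-01"), by = "day", length.out = 14),
  Cases = c(25, 31, 28, 30, 6, 4, 33, 29, 35, 27, 32, 7, 5, 34)
)

# Keep weekdays only (POSIXlt wday: 0 = Sunday, 6 = Saturday)
is_weekend <- is.element(as.POSIXlt(or_cases$Date)$wday, c(0, 6))
weekday_cases <- or_cases[!is_weekend, ]

h <- hist(weekday_cases$Cases,
          breaks = seq(20, 40, by = 5),  # round-number bin edges
          xlab = "OR cases per weekday", ylab = "Number of days",
          main = "Histogram of weekday OR cases")
h$counts
```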
# Challenge 16: Create a histogram of the natural log of the number
# of weekday OR cases.
# Q: Does the transformed data look more "normal" than in the prior challenge?
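A sketch of the log-transformed histogram on hypothetical toy data. In R, `log()` is the natural log by default. The normal Q-Q plot at the end is an optional extra check for normality, not part of the challenge as written: points close to the reference line suggest an approximately normal shape.

```r
# Hypothetical toy data standing in for the cleaned OR data.frame
or_cases <- data.frame(
  Date  = seq(as.Date("2019-01-01"), by = "day", length.out = 14),
  Cases = c(25, 31, 28, 30, 6, 4, 33, 29, 35, 27, 32, 7, 5, 34)
)

# Keep weekdays only (POSIXlt wday: 0 = Sunday, 6 = Saturday)
is_weekend <- is.element(as.POSIXlt(or_cases$Date)$wday, c(0, 6))
weekday_cases <- or_cases[!is_weekend, ]

log_cases <- log(weekday_cases$Cases)  # natural log

h <- hist(log_cases,
          breaks = seq(3.2, 3.6, by = 0.1),
          xlab = "ln(OR cases per weekday)", ylab = "Number of days",
          main = "Histogram of log weekday OR cases")
h$counts

# Optional: normal Q-Q plot of the transformed values
qqnorm(log_cases)
qqline(log_cases)
```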