Question

Posted on Jun 11, 2024

THIRD QUESTION

### Problem 3: Someone left strings in your numeric column!

This exercise will give you practice with two of the most common data cleaning tasks. For this problem we'll use the `survey_untidy.csv` data set posted on the course website. Begin by importing this data into R. The url for the data set is shown below.

url: http://www.andrew.cmu.edu/user/achoulde/94842/data/survey_untidy.csv

In Lecture 4 we look at an example of cleaning up the TVhours column. The TVhours column of `survey_untidy.csv` has been corrupted in a similar way to what you saw in class. Using the techniques you saw in class, make a new version of the untidy survey data where the TVhours column has been cleaned up. (Hint: *you may need to handle some of the observations on a case-by-case basis*)

```{r}
# Edit me
```

### Problem 4: Shouldn't ppm, pPM and PPM all be the same thing?

This exercise picks up from Problem 3, and walks you through two different approaches to cleaning up the Program column.

##### (a) Identifying the problem.

Use the `table` or `levels` command on the Program column to figure out what went wrong with this column. Describe the problem in the space below.

```{r}
# Write your code here
```

**Description of the problem:**

Replace this text with your answer. (do not delete the html tags)

##### (b) `mapvalues` approach

Starting with the cleaned up data you produced in Problem 3, use the `mapvalues` and `mutate` functions to fix the Program column by mapping all of the lowercase and mixed case program names to upper case.

```{r, message = FALSE}
library(plyr)
library(dplyr)

# Edit me
```

##### (c) `toupper` approach

The `toupper` function takes an array of character strings and converts all letters to uppercase. Use `toupper()` and `mutate` to perform the same data cleaning task as in part (b).

```{r}
# Edit me
```

**Tip**: *The `toupper()` and `tolower()` functions are very useful in data cleaning tasks.
You may want to start by running these functions even if you'll have to do some more spot-cleaning later on.*

### Problem 5: Let's apply some functions

##### (a) Writing trimmed mean function

Write a function that calculates the mean of a numeric vector `x`, ignoring the `s` smallest and `l` largest values (this is a *trimmed mean*).

E.g., if `x = c(1, 7, 3, 2, 5, 0.5, 9, 10)`, `s = 1`, and `l = 2`, your function would return the mean of `c(1, 7, 3, 2, 5)` (this is `x` with the 1 smallest value (0.5) and the 2 largest values (9, 10) removed).

Your function should use the `length()` function to check if `x` has at least `s + l + 1` values. If `x` is shorter than `s + l + 1`, your function should use the `message()` function to tell the user that the vector can't be trimmed as requested. If `x` is at least length `s + l + 1`, your function should return the trimmed mean.

```{r}
# Here's a function skeleton to get you started

# Fill me in with an informative comment
# describing what the function does
trimmedMean <- function(x, s = 0, l = 0) {
  # Write your code here
}
```

**Hint:** *For this exercise it will be useful to recall the `sort()` function that you first saw in Lecture 1.*

**Note:** The `s = 0` and `l = 0` specified in the function definition are the default settings. i.e., this syntax ensures that if `s` and `l` are not provided by the user, they are both set to `0`. Thus the default behaviour is that the `trimmedMean` function doesn't trim anything, and hence is the same as the `mean` function.
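
One possible way to fill in the skeleton for part (a) is sketched here. It assumes `s` and `l` are non-negative whole numbers and that `x` has no `NA` values. Returning `NULL` invisibly after the message is a choice made for this sketch; the problem statement only requires the call to `message()`.

```r
# Trimmed mean: the mean of x after dropping the s smallest and l largest values.
# Assumes s and l are non-negative whole numbers and x contains no NAs.
trimmedMean <- function(x, s = 0, l = 0) {
  n <- length(x)
  if (n < s + l + 1) {
    # Not enough values to leave at least one after trimming
    message("Cannot trim ", s + l, " values from a vector of length ", n, ".")
    return(invisible(NULL))  # sketch-specific choice, not required by the problem
  }
  x.sorted <- sort(x)
  # Keep positions s + 1 through n - l of the sorted vector
  mean(x.sorted[(s + 1):(n - l)])
}

x <- c(1, 7, 3, 2, 5, 0.5, 9, 10)
trimmedMean(x, s = 1, l = 2)  # mean of 1, 2, 3, 5, 7, which is 3.6
trimmedMean(x)                # defaults trim nothing, so this equals mean(x)
```

Sorting first and then indexing keeps the trimming logic to a single line, and the length check guarantees that `(s + 1):(n - l)` is never a descending range.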
##### (b) Apply your function with a for loop

```{r, fig.width = 12, fig.height = 4}
set.seed(201802) # Sets seed to make sure everyone's random vectors are generated the same
list.random <- list(x = rnorm(50),
                    y = rexp(65),
                    z = rt(100, df = 1.5))

# Here's a Figure showing histograms of the data
par(mfrow = c(1,3))
hist(list.random$x, breaks = 15, col = 'grey')
hist(list.random$y, breaks = 10, col = 'forestgreen')
hist(list.random$z, breaks = 20, col = 'steelblue')
```

Using a `for loop` and your function from part **(a)**, create a vector whose elements are the trimmed means of the vectors in `list.random`, taking `s = 5` and `l = 5`.

```{r}
# Edit me
```

##### (c) Calculate the un-trimmed means for each of the vectors in the list. How do these compare to the trimmed means you calculated in part (b)? Explain your findings.

```{r}
# Edit me
```

**Explanation:**

Replace this text with your answer. (do not delete the html tags)

##### (d) lapply(), sapply()

Repeat part **(b)**, using the `lapply` and `sapply` functions instead of a for loop. Your `lapply` command should return a list of trimmed means, and your `sapply` command should return a vector of trimmed means.

```{r}
# Edit me
```

**Hint** `lapply` and `sapply` can take arguments that you wish to pass to the `trimmedMean` function. E.g., if you were applying the function `sort`, which has an argument `decreasing`, you could use the syntax `lapply(..., FUN = sort, decreasing = TRUE)`.

Need help with these problems in Jupyter notebook as r s forma
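
For parts (b) through (d) of Problem 5, a sketch along the following lines may help. It repeats a compact `trimmedMean` so the block runs on its own; the helper body and the result names (`trimmed.means`, `untrimmed.means`, `trimmed.list`, `trimmed.vec`) are assumptions made for illustration and are not part of the assignment.

```r
# Compact trimmed mean helper (assumed implementation, repeated for self-containment)
trimmedMean <- function(x, s = 0, l = 0) {
  n <- length(x)
  if (n < s + l + 1) {
    message("Vector is too short to trim as requested.")
    return(invisible(NULL))
  }
  mean(sort(x)[(s + 1):(n - l)])
}

set.seed(201802)
list.random <- list(x = rnorm(50), y = rexp(65), z = rt(100, df = 1.5))

# (b) for loop: pre-allocate a named numeric vector, then fill it by name
trimmed.means <- setNames(numeric(length(list.random)), names(list.random))
for (nm in names(list.random)) {
  trimmed.means[nm] <- trimmedMean(list.random[[nm]], s = 5, l = 5)
}

# (c) untrimmed means for comparison
untrimmed.means <- sapply(list.random, mean)

# (d) lapply returns a list; sapply simplifies the result to a named vector.
# Extra arguments (s, l) are passed through to trimmedMean.
trimmed.list <- lapply(list.random, FUN = trimmedMean, s = 5, l = 5)
trimmed.vec  <- sapply(list.random, FUN = trimmedMean, s = 5, l = 5)

# Side-by-side view of trimmed and untrimmed means
print(rbind(trimmed = trimmed.vec, untrimmed = untrimmed.means))
```

When comparing the two rows for part (c), it is worth keeping in mind that `rt(100, df = 1.5)` draws from a heavy-tailed distribution with infinite variance, so a handful of extreme values can move the untrimmed mean of `z` a lot, while `rexp` is right-skewed, so dropping five values from each end would typically pull the mean of `y` down. The normal sample `x` is symmetric and should usually change the least.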