Question
Spam Email: Using the email dataset from the openintro package, you are going to write a function that calculates the sampling distribution for the mean of the number of line breaks in the email using the variable line_breaks. We will treat the email data as a complete census, so you will be subsampling from the population using the subsample.R function. On the Assignments page of the website, download the file subsample.R and move the file into a folder in your project named R. The function will take the values of the data.frame, the sample size n, and the number of replicates of the experiment B. The function will return the sample mean for each of the B samples. For example, a single sample from the email data.frame is

    library(openintro)
    data("email")
    source(here("R", "subsample.R"))
    ## sample size n
    [...]
    ## [...]          0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ## $ to_multiple  1, 0, 0, 0, 0, 0, 1, 0, 1, 0
    ## $ from         1, 1, 1, 1, 1, 1, 1, 1, 1, 1
    ## $ cc           0, 1, 0, 4, 0, 0, 0, 2, 0, 1
    ## $ sent_email   1, 0, 0, 0, 0, 0, 0, 1, 0, 1
    ## $ time         2012-01-22 14:04:42, 2012-03-23 14:23:55, 2012-03-20 ...
    ## $ image        0, 0, 0, 0, 0, 0, 0, 0, 1, 0
    ## $ attach       0, 0, 0, 2, 0, 0, 0, 0, 2, 0
    ## $ dollar       0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ## $ winner       no, no, no, no, no, no, no, no, no, no
    ## $ inherit      0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ## $ viagra       0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ## $ password     0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ## $ num_char     5.106, 5.355, 0.765, 0.341, 24.317, 38.071, 0.559, 4.9...
    ## $ line_breaks  198, 141, 16, 18, 620, 727, 15, 108, 141, 46
    ## $ format       1, 1, 0, 0, 1, 1, 0, 1, 1, 0
    ## $ re_subj      0, 1, 0, 0, 0, 0, 0, 1, 0, 1
    ## $ exclaim_subj 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ## $ urgent_subj  0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ## $ exclaim_mess 1, 4, 0, 0, 1, 29, 0, 0, 1, 1
    ## $ number       big, big, small, none, small, small, small, big, small...

a) Write a function that returns the sample mean for B samples of size n. Hint: write a for loop first, then put the loop in a function. The function inputs should be a data.frame (e.g. email), the number of replicates B, and the sample size n.

b) Using your function, create three datasets with B = 10000 replicates of size n = 10, n = 50, and n = 200. For each of the three sample sizes, create a histogram of the sample means.

c) Describe what you see. What are the shapes of the histograms? Are there any trends in the shape as n increases?
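Parts (a) and (b) could be approached along the following lines. This is a minimal sketch under stated assumptions, not the course's intended solution. The function name sampling_means and its var argument are hypothetical. Because the question does not show the signature of the course's subsample helper, base R's sample.int() is swapped in to draw rows without replacement. A simulated, right-skewed stand-in population is used so the code runs without the openintro package; replace it with the real email data to answer the question.

```r
# Sketch only: sample.int() stands in for the course's subsample helper,
# whose signature is not shown in the question.
sampling_means <- function(data, B, n, var = "line_breaks") {
  values <- data[[var]]
  stopifnot(is.numeric(values), n <= length(values))
  means <- numeric(B)  # preallocate one slot per replicate
  for (b in seq_len(B)) {
    # draw n observations without replacement from the "census" and average them
    idx <- sample.int(length(values), size = n, replace = FALSE)
    means[b] <- mean(values[idx])
  }
  means
}

# Hypothetical stand-in population (simulated, NOT the real email data).
# With openintro installed, use instead: library(openintro); data("email")
set.seed(1)
population <- data.frame(line_breaks = round(rexp(3000, rate = 1 / 200)))

# Part (b): B = 10000 replicates for each of the three sample sizes
sizes <- c(10, 50, 200)
means_by_n <- lapply(sizes, function(n) sampling_means(population, B = 10000, n = n))
names(means_by_n) <- paste0("n = ", sizes)

# One histogram per sample size, stacked in three panels
op <- par(mfrow = c(3, 1))
for (label in names(means_by_n)) {
  hist(means_by_n[[label]], breaks = 40, main = label,
       xlab = "Sample mean of line_breaks")
}
par(op)
```

For part (c), the central limit theorem suggests what to look for: the histograms should be centered near the population mean, their spread should shrink as n grows (roughly in proportion to 1/sqrt(n), slightly less because sampling is without replacement from a finite population), and any right skew visible at n = 10 should fade toward a bell shape by n = 200. Whether the real line_breaks data follows this pattern is for the histograms themselves to confirm.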