Question
All plots should be made with the ggplot2 package. Graphs made in base R won't be accepted.
(1) The iris data set in R is a famous data set that gives the measurements, in centimeters, of sepal length and width and petal length and width for 50 flowers from each of 3 species of iris. The species are setosa, versicolor, and virginica. The variable names are Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width. You can access the data set by typing iris directly in R.
(a) Make a scatter plot of Sepal.Length (horizontal) versus Sepal.Width (vertical).
(b) Make a scatter plot of Sepal.Length (horizontal) versus Sepal.Width (vertical) with three different colors for three species. Hint: use color in aes.
(c) Make side-by-side box plots of Sepal.Length, one for each Species.
(d) Make the same box plots with horizontal boxes instead of vertical ones.
(e) Make a scatter plot matrix of all four continuous variables. Use ggpairs function in GGally package.
(f) Make a scatter plot matrix of all four continuous variables, distinguished by Species. That is, each species should have one color on each scatter plot.
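One possible way to approach parts (a) through (f) is sketched below. It is a minimal sketch, not a definitive solution: the object names (p_a, p_b, and so on) are made up for illustration, and coord_flip() is only one of several ways to get horizontal boxes.

```r
library(ggplot2)
library(GGally)  # provides ggpairs()

# (a) Sepal.Length on the horizontal axis, Sepal.Width on the vertical axis
p_a <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# (b) Same plot, with color mapped to Species inside aes()
p_b <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()

# (c) Side-by-side box plots of Sepal.Length, one per species
p_c <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()

# (d) Same box plots drawn horizontally by flipping the coordinates
p_d <- p_c + coord_flip()

# (e) Scatter plot matrix of the four continuous variables (columns 1 to 4)
p_e <- ggpairs(iris, columns = 1:4)

# (f) Same matrix, with each species drawn in its own color
p_f <- ggpairs(iris, columns = 1:4, mapping = aes(color = Species))

# Printing an object draws it, e.g.:
print(p_b)
```

Each plot is stored in a variable so it can be printed or saved separately, for example with ggsave().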
(2) Fit a simple linear regression model of Sepal.Length (y) on Petal.Length (x).
(a) Write down the model.
(b) Fit the simple linear regression model using lm.
(c) Report the estimated intercept and slope.
(d) Report the significance of the regression.
(e) Interpret the slope in the context of the problem.
(f) Make a scatter plot of these two variables. For the points, use pink-colored half-transparent squares with size 5 and red border.
(g) Add the fitted regression line in blue on top of the scatter plot in the previous step.
(h) Add another smooth line in green using method = loess in stat_smooth.
(i) What is the 95% confidence interval on the intercept?
(j) What is the 99% confidence interval on the slope?
(k) Predict the sepal length of the first iris flower in the data set whose petal length is 1.4.
(l) Make a confidence interval on the mean sepal length of iris flowers with petal length of 1.65.
(m) Make a prediction interval on the sepal length of a specific iris flower with petal length of 1.65.
(n) Make a residual plot. Do you observe any pattern that suggests the model is not adequate?
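The computational parts of this question could be sketched as follows, assuming the usual model Sepal.Length = β0 + β1 · Petal.Length + ε with independent normal errors. The object names (fit, new_x, p_fit, p_res) are illustrative, and the interpretation parts (d), (e), and (n) still have to be read off the output.

```r
library(ggplot2)

# (b) Fit the model; (c) and (d) can be read from coef() and summary()
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
coef(fit)      # estimated intercept and slope
summary(fit)   # t tests for the coefficients and the overall F test

# (i) 95% confidence interval on the intercept, (j) 99% on the slope
confint(fit, "(Intercept)", level = 0.95)
confint(fit, "Petal.Length", level = 0.99)

# (k) Point prediction at petal length 1.4
predict(fit, newdata = data.frame(Petal.Length = 1.4))

# (l) Confidence interval on the mean response and
# (m) prediction interval for a single flower, both at petal length 1.65
new_x <- data.frame(Petal.Length = 1.65)
ci <- predict(fit, newdata = new_x, interval = "confidence")
pi <- predict(fit, newdata = new_x, interval = "prediction")

# (f)-(h) Shape 22 is a fillable square: fill sets the inside, color the border.
# Note that alpha = 0.5 makes the border half-transparent as well.
p_fit <- ggplot(iris, aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point(shape = 22, size = 5, fill = "pink", color = "red", alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "blue") +
  stat_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "green")

# (n) Residuals against fitted values, with a reference line at zero
res_df <- data.frame(fitted = fitted(fit), resid = resid(fit))
p_res <- ggplot(res_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")
```

The prediction interval in (m) is wider than the confidence interval in (l) because it accounts for the variability of an individual flower in addition to the uncertainty in the estimated mean.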
(3) The data set Nile in R contains measurements of the annual flow of the river Nile at Aswan between 1871 and 1970.
(a) Make a time series plot using the geom_line function. You may need to create a data frame first before plotting.
(b) Change the appearance of the line plot in (a) in any way you want.
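Because Nile is a time series (ts) object rather than a data frame, one way to plot it with geom_line is to extract the years and the flow values first. The sketch below assumes ggplot2 3.4.0 or later for the linewidth argument. The column names (year, flow) and the styling choices are arbitrary illustrations, and the axis labels are generic because the problem text does not state the units.

```r
library(ggplot2)

# Convert the ts object into a data frame with one row per year
nile_df <- data.frame(
  year = as.numeric(time(Nile)),
  flow = as.numeric(Nile)
)

# (a) Basic time series plot
p_nile <- ggplot(nile_df, aes(x = year, y = flow)) +
  geom_line()

# (b) One of many possible restylings: color, thickness, line type, labels
p_nile_styled <- ggplot(nile_df, aes(x = year, y = flow)) +
  geom_line(color = "steelblue", linewidth = 1, linetype = "dashed") +
  labs(title = "Annual flow of the Nile at Aswan", x = "Year", y = "Flow") +
  theme_minimal()
```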
(4) The data set quakes in R gives the locations of 1000 seismic events of MB > 4.0. The variable mag represents the numeric Richter magnitude.
(a) Make a histogram using default settings.
(b) Change the number of bins using Sturges' formula and replot the histogram.
(c) Superimpose a blue-colored density plot to the histogram.
(d) The magnitude is between 4.0 and 6.4 in the data set. Add a new column to the data set which is an ordered factor with three levels: "Low" (4.0 ≤ mag < 5.0), "Medium" (5.0 ≤ mag < 6.0), and "High" (mag ≥ 6.0).
(e) Make side-by-side histograms of mag using facet_grid function, one for each level in (d).
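A possible sketch for the quakes question follows. It assumes the three magnitude bands are closed on the left (4.0 up to but not including 5.0, 5.0 up to but not including 6.0, and 6.0 or more), and the names k, quakes2, and level are illustrative. Sturges' formula gives ceiling(log2(n) + 1) bins, which base R exposes as nclass.Sturges().

```r
library(ggplot2)

# (a) Histogram with default settings (ggplot2 uses 30 bins and says so)
p_hist <- ggplot(quakes, aes(x = mag)) +
  geom_histogram()

# (b) Number of bins from Sturges' formula: ceiling(log2(n) + 1)
k <- nclass.Sturges(quakes$mag)
p_sturges <- ggplot(quakes, aes(x = mag)) +
  geom_histogram(bins = k)

# (c) Put the histogram on the density scale so the blue density curve
# is comparable to the bars
p_dens <- ggplot(quakes, aes(x = mag)) +
  geom_histogram(aes(y = after_stat(density)), bins = k) +
  geom_density(color = "blue")

# (d) Ordered factor with left-closed intervals [4, 5), [5, 6), [6, Inf)
quakes2 <- quakes
quakes2$level <- cut(
  quakes2$mag,
  breaks = c(4, 5, 6, Inf),
  labels = c("Low", "Medium", "High"),
  right = FALSE,
  ordered_result = TRUE
)

# (e) One histogram panel per level, laid out side by side
p_facet <- ggplot(quakes2, aes(x = mag)) +
  geom_histogram(bins = k) +
  facet_grid(. ~ level)
```

Working on a copy (quakes2) leaves the built-in data set unchanged. Adding the column to quakes itself would work the same way.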