1. Start a new R Markdown file (.Rmd) as you learned in the previous module.
Copy the following YAML header and the first code chunk to set up the global environment for knitting. You must submit the HTML report for me to grade your work. I will use your HTML file as the primary document for grading. The YAML header and the first code chunk below allow you to organize your work well and to knit the Rmd file even when there is an error.
---
title: "Final Exam with two data sets"
author: "Jae Jung"
date: "`r Sys.time()`"
output:
  html_document:
    toc: yes
    toc_depth:
    highlight: espresso
    theme: journal
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
                      error = TRUE)
```
Save the file under an appropriate name, starting with your name (e.g., "Jung, JaeFinals.Rmd"), in the "Test" folder.
Implement the following tasks in the R Markdown file.
Do not forget to copy and paste your questions and create code chunks in your R Markdown file for each question. For example, for this assignment, you must have code chunks. You have questions. Insert your code chunk below each question.
Part : Airquality Data
Questions through will be about the same dataset available in the R program: "airquality."
# Question : Get a local copy of the dataset "airquality" and name it df so that you can use it later. Next, show the first rows of it. Pay attention to the names of the variables. Write code that reveals how many variables and observations are in the dataset. Also, write code that gives you some basic descriptive statistics. You will notice that two variables have missing values.
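One possible way to approach this question is sketched below using base R only. The number of rows to display is not stated above, so the sketch relies on the default of `head()`; pass `n` to change it.

```r
# Local copy of the built-in dataset
df <- airquality

head(df)      # first rows (six by default; use head(df, n = ...) for another count)
names(df)     # variable names
dim(df)       # number of observations (rows) and variables (columns)
str(df)       # compact view of the structure and column types
summary(df)   # basic descriptive statistics, including NA counts per column
```

`summary()` reports the count of missing values under each column that has any, which is where the two incomplete variables show up.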
# Question : Write code that tells you where the missing values are located, the number of missing values in the dataset df, the number of missing values in the Solar.R column, and all the rows that include at least one missing value. Lastly, write the code that returns the number of rows that include at least one missing value. Hint: there are rows that have more than one missing value.
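A minimal base R sketch of these checks follows. The object names (`na_positions`, `total_na`, and so on) are illustrative choices, not names required by the assignment.

```r
df <- airquality

# Row and column index of every missing value
na_positions <- which(is.na(df), arr.ind = TRUE)
head(na_positions)

# Number of missing values in the whole dataset and in Solar.R
total_na <- sum(is.na(df))
solar_na <- sum(is.na(df$Solar.R))

# All rows with at least one missing value, and how many there are
rows_with_na   <- df[!complete.cases(df), ]
n_rows_with_na <- nrow(rows_with_na)

total_na
solar_na
n_rows_with_na
```

Because a row can contain more than one missing value, `n_rows_with_na` can be smaller than `total_na`, which is the point of the hint.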
# Question : Replace all the missing values in the Solar.R column with the median of the values in the column. Also, get the standard deviation and average of all columns.
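The replacement and the column statistics could look like the sketch below. Using `na.rm = TRUE` for the standard deviation and the average is an assumption: at this point the Ozone column still contains missing values, and without it those statistics would be `NA` for that column.

```r
df <- airquality

# Replace missing Solar.R values with the median of the observed values
solar_median <- median(df$Solar.R, na.rm = TRUE)
df$Solar.R[is.na(df$Solar.R)] <- solar_median

# Standard deviation and average of every column
col_sd   <- sapply(df, sd, na.rm = TRUE)
col_mean <- sapply(df, mean, na.rm = TRUE)

col_sd
col_mean
```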
# Question : The goal is to create a new column filled with "low", "average", and "high" based on information from Ozone and Solar.R columns.
Create a new column called "newCol," which is full of NA values.
If both values in the first two columns (i.e., Ozone and Solar.R) of the df dataset in each row are less than the averages of the respective columns, put "Low" in the new column; if they are the same as the averages, put "Average"; and if both values are greater than the averages, put "High" in the new column (use the pipe operator).
Hint: You will need to replace the missing values in the Ozone column with the mean of the column before creating the new variable.
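The sketch below shows one way to build the column. It makes several assumptions: it uses the native pipe `|>` (R 4.1 or later) with base R `transform()` rather than a dplyr pipeline, it repeats the median fill of Solar.R from the previous question so the block runs on its own, and it leaves `NA` in rows where Ozone and Solar.R fall on different sides of their averages, since the question does not say what to do with those rows.

```r
df <- airquality

# Thresholds and missing-value fills
oz_avg <- mean(df$Ozone, na.rm = TRUE)
df$Solar.R[is.na(df$Solar.R)] <- median(df$Solar.R, na.rm = TRUE)
df$Ozone[is.na(df$Ozone)]     <- oz_avg   # filling with the mean keeps the mean unchanged
sol_avg <- mean(df$Solar.R)

# New column full of NA values
df$newCol <- NA_character_

# Fill the column row by row from the two comparisons
df <- df |>
  transform(newCol = ifelse(
    Ozone < oz_avg & Solar.R < sol_avg, "Low",
    ifelse(Ozone == oz_avg & Solar.R == sol_avg, "Average",
           ifelse(Ozone > oz_avg & Solar.R > sol_avg, "High", NA_character_))
  ))

table(df$newCol, useNA = "ifany")
```

Storing `oz_avg` before the fill avoids comparing against a recomputed mean that could differ from the filled value by floating-point rounding.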
# Question : Rename the column "newCol" to "AirRate". Find the pairs of variables with the highest and lowest correlation in df and assign them to the variables highestcor and lowestcor.
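A sketch of the rename and the correlation search follows. Two assumptions are made here: "lowest" is read as the most negative correlation (not the one closest to zero), and missing values are handled with `use = "pairwise.complete.obs"`. The placeholder `newCol` is created only so that the block runs on its own.

```r
df <- airquality
df$newCol <- NA_character_   # placeholder standing in for the column built earlier

# Rename newCol to AirRate
names(df)[names(df) == "newCol"] <- "AirRate"

# Correlation matrix of the numeric columns
num_df <- df[sapply(df, is.numeric)]
cm <- cor(num_df, use = "pairwise.complete.obs")

# Keep each pair once and drop the self-correlations on the diagonal
cm[upper.tri(cm, diag = TRUE)] <- NA

hi <- which(cm == max(cm, na.rm = TRUE), arr.ind = TRUE)
lo <- which(cm == min(cm, na.rm = TRUE), arr.ind = TRUE)

highestcor <- c(rownames(cm)[hi[1, "row"]], colnames(cm)[hi[1, "col"]])
lowestcor  <- c(rownames(cm)[lo[1, "row"]], colnames(cm)[lo[1, "col"]])

highestcor
lowestcor
```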
Part : Gapminder Data
Questions through will be about the same dataset available from the "gapminder" package, which you may have to install and load before you can use it.
# Question : From the "gapminder" dataset, select the columns "country", "continent", "year", and "lifeExp" and save the subset of gapminder data as "data." Tidy this data set using the pipe operator such that there is only one country in each row, many years in the columns, and life expectancy as the value for the year columns. Save this new tidy data as "widedata"; it should contain columns in the end.
Hint: Since the data is not built into R or RStudio, you need to install the package called "gapminder" and then load it into the computer's short-term memory.
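One way to do the selection and reshaping is sketched below, assuming the gapminder, dplyr, and tidyr packages are installed. The choice of `pivot_wider()` is an assumption about the intended tool; the question only asks for the pipe operator.

```r
# install.packages(c("gapminder", "dplyr", "tidyr"))  # run once if needed
library(gapminder)
library(dplyr)
library(tidyr)

# Keep only the four requested columns
data <- gapminder %>%
  select(country, continent, year, lifeExp)

# One row per country, one column per year, life expectancy as the cell value
widedata <- data %>%
  pivot_wider(names_from = year, values_from = lifeExp)

dim(widedata)
head(widedata)
```

`dim(widedata)` gives the row and column counts to compare against the number stated in the question.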
# Question : Choose only the cases for the US (Hint: rows). Next, pipe the data into plotting a line chart with the variables "year" and "lifeExp" on the x-axis and y-axis, respectively. Improve the legibility of the chart. First, label the variables on the chart as "Years" for the x-axis and "Life Expectancy" for the y-axis. Next, give the chart the title "Life Expectancy over Years in the United States" and set the line color to red.
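A possible version of this chart is sketched below, assuming dplyr and ggplot2 are available. The object name `p_us` and the use of `theme_minimal()` are illustrative choices that the question does not prescribe.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

p_us <- gapminder %>%
  filter(country == "United States") %>%     # keep only the US rows
  ggplot(aes(x = year, y = lifeExp)) +
  geom_line(color = "red") +
  labs(
    title = "Life Expectancy over Years in the United States",
    x = "Years",
    y = "Life Expectancy"
  ) +
  theme_minimal()

p_us
```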
# Question :
From the data set "gapminder," use the data for the most recent year only. Next, create a chart that shows the life expectancies by continent. Make the plot as beautiful and professional as you can. This includes adding colors to the bars and giving appropriate labels for the title, x-axis, and y-axis. What can you tell about the pattern of life expectancies across the continents? Hint: Due to the large number of countries in the data, using countries as the x-axis or y-axis will make the charts hard to interpret.
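The sketch below is one reading of this question, assuming dplyr and ggplot2 are available. Summarising each continent by its mean life expectancy and drawing a bar chart is an assumption; a box plot per continent would be an equally reasonable choice. The titles and object names are illustrative.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Most recent year only
latest <- gapminder %>%
  filter(year == max(year))

# Mean life expectancy per continent, drawn as colored bars ordered by height
p_continent <- latest %>%
  group_by(continent) %>%
  summarise(mean_lifeExp = mean(lifeExp), .groups = "drop") %>%
  ggplot(aes(x = reorder(continent, mean_lifeExp), y = mean_lifeExp, fill = continent)) +
  geom_col(show.legend = FALSE) +
  labs(
    title = "Average Life Expectancy by Continent, Most Recent Year",
    x = "Continent",
    y = "Average Life Expectancy (years)"
  ) +
  theme_minimal()

p_continent
```

Because the bars are ordered by height, reading them from left to right gives the ranking of continents that the interpretation part of the question asks about.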