
Question


Implement the following tasks in the R Markdown file.
Copy and paste each question into your R Markdown file and insert a code chunk below it. This assignment has 10 questions, so you must have 10 code chunks.
Part 1: Airquality Data
Questions 1 through 5 will be about the same dataset available in the R program: "airquality."
# Question 1: (1) Get a local copy of the dataset "airquality" and name it "df" so that you can use it later. (2) Next, show its first 7 rows. Pay attention to the names of the variables. (3) Write code that reveals how many variables and observations are in the dataset. (4) Also, write code that gives you some basic descriptive statistics. You will notice that two variables have missing values.
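One possible way to approach Question 1 is sketched below. It uses only base R; `head()`, `dim()`, and `summary()` are one reasonable choice of functions, not the only acceptable one.

```r
# (1) Local copy of the built-in dataset
df <- airquality

# (2) First 7 rows, including the variable names
head(df, 7)

# (3) Number of observations (rows) and variables (columns)
dim(df)

# (4) Basic descriptive statistics; columns with missing values report an NA count
summary(df)
```

`str(df)` is an alternative for step (3) that also shows the type of each variable.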
# Question 2: Write code that tells you (1) where the missing values are located, (2) the number of missing values in the dataset (df), (3) the number of missing values in the Solar.R column, and (4) all the rows that include at least one missing value. (5) Lastly, write code that returns the number of rows that include at least one missing value. Hint: some rows have more than one missing value.
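A minimal base R sketch for Question 2 follows. The variable names (`na_positions`, `total_na`, and so on) are illustrative and not part of the assignment.

```r
df <- airquality

# (1) Row and column position of every missing value
na_positions <- which(is.na(df), arr.ind = TRUE)
head(na_positions)

# (2) Total number of missing values in df
total_na <- sum(is.na(df))

# (3) Number of missing values in the Solar.R column
solar_na <- sum(is.na(df$Solar.R))

# (4) All rows with at least one missing value
rows_with_na <- df[!complete.cases(df), ]

# (5) Number of such rows; counting rows, not cells, handles rows with two NAs
n_rows_with_na <- sum(!complete.cases(df))
```

Because a row can contain more than one missing value, the row count in (5) is smaller than the cell count in (2).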
# Question 3: (1) Replace all the missing values in the Solar.R column with the median of the values in the column. (2) Also, get the standard deviation and average of all columns.
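Question 3 could be sketched as follows, assuming the statistics should ignore the missing values that remain in the Ozone column (hence `na.rm = TRUE`). The names `col_sd` and `col_mean` are illustrative.

```r
df <- airquality

# (1) Replace missing Solar.R values with the median of the observed values
df$Solar.R[is.na(df$Solar.R)] <- median(df$Solar.R, na.rm = TRUE)

# (2) Standard deviation and mean of every column, skipping remaining NAs
col_sd <- sapply(df, sd, na.rm = TRUE)
col_mean <- sapply(df, mean, na.rm = TRUE)

col_sd
col_mean
```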
# Question 4: The goal is to create a new column filled with "low", "average", and "high" based on information from the Ozone and Solar.R columns.
(1) Create a new column called "newCol" that is full of NA values.
(2) For each row of df, compare the values in the first two columns (i.e., Ozone and Solar.R) with the averages of their respective columns. If both values are less than the averages, put "low" in the new column; if both are equal to the averages, put "average"; and if both are greater than the averages, put "high" (use the pipe operator).
*Hint*: You will need to replace the missing values in the Ozone column with the mean of the column before creating the new variable.
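The sketch below shows one way to build the column in Question 4. It assumes the dplyr package is installed, repeats the Solar.R imputation from Question 3 so that it runs on its own, and leaves rows that match none of the three rules (for example, Ozone below its mean but Solar.R above) as NA, since the question does not say how to label them.

```r
library(dplyr)

df <- airquality

# Impute missing values first: Ozone with its mean (per the hint),
# Solar.R with its median (as in Question 3)
df$Ozone[is.na(df$Ozone)] <- mean(df$Ozone, na.rm = TRUE)
df$Solar.R[is.na(df$Solar.R)] <- median(df$Solar.R, na.rm = TRUE)

# (1) New column full of NA values
df$newCol <- NA_character_

# (2) Fill the column with the pipe operator; unmatched rows stay NA
df <- df %>%
  mutate(newCol = case_when(
    Ozone < mean(Ozone) & Solar.R < mean(Solar.R) ~ "low",
    Ozone == mean(Ozone) & Solar.R == mean(Solar.R) ~ "average",
    Ozone > mean(Ozone) & Solar.R > mean(Solar.R) ~ "high",
    TRUE ~ NA_character_
  ))

# Count how many rows fall into each label, including NA
table(df$newCol, useNA = "ifany")
```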
# Question 5: Rename the column "newCol" to "Air_Rate". Find the pairs of variables with the highest and lowest correlation in df, and assign them to the variables highest_cor and lowest_cor.
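A base R sketch for Question 5 follows. Two assumptions are made: "lowest" is read as the most negative correlation (it could also mean the one closest to zero), and an all-NA `newCol` stands in for the column built in Question 4 so that the example runs on its own.

```r
df <- airquality
df$newCol <- NA_character_  # stand-in for the column created in Question 4

# Rename newCol to Air_Rate
names(df)[names(df) == "newCol"] <- "Air_Rate"

# Correlation matrix of the numeric columns, using all available pairs
num_df <- df[sapply(df, is.numeric)]
cor_mat <- cor(num_df, use = "pairwise.complete.obs")

# Blank the diagonal and upper triangle so each pair appears once
cor_mat[upper.tri(cor_mat, diag = TRUE)] <- NA

# Long format: one row per variable pair (columns Var1, Var2, Freq)
cor_pairs <- as.data.frame(as.table(cor_mat))
cor_pairs <- cor_pairs[!is.na(cor_pairs$Freq), ]

highest_cor <- cor_pairs[which.max(cor_pairs$Freq), ]
lowest_cor <- cor_pairs[which.min(cor_pairs$Freq), ]

highest_cor
lowest_cor
```

To read "lowest" as weakest instead, replace `which.min(cor_pairs$Freq)` with `which.min(abs(cor_pairs$Freq))`.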
Part 2: Gapminder Data
Questions 6 through 10 will be about the same dataset available from the "gapminder" package that you may have to install and load up to be able to use.
# Question 6: From the "gapminder" dataset, select the columns "country", "continent", "year", and "lifeExp" and save the subset of gapminder data as "data." Tidy this dataset using the pipe operator such that there is only one country in each row, the years are in the columns, and life expectancy is the value in the year columns. Save this new tidy data as "wide_data" (should contain 13 columns in the end).
Hint: Since the data is not built into R or RStudio, you need to install the package called "gapminder" and then load it up to the computer's short-term memory.
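One way to sketch Question 6 is shown below, assuming the gapminder, dplyr, and tidyr packages are installed. Note that keeping both "country" and "continent" alongside the 12 year columns yields 14 columns; dropping "continent" before widening would give 13, so how to reach the stated count is left as a judgment call.

```r
# install.packages("gapminder")  # run once if the package is missing
library(gapminder)
library(dplyr)
library(tidyr)

# Subset of the four requested columns
data <- gapminder %>%
  select(country, continent, year, lifeExp)

# One row per country, one column per year, life expectancy as the value
wide_data <- data %>%
  pivot_wider(names_from = year, values_from = lifeExp)

dim(wide_data)
```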
# Question 7: Choose only the cases for the U.S. (Hint: 12 rows). Next, pipe the data into plotting a line chart with the variables "year" and "lifeExp" on the x-axis and y-axis, respectively. Improve the legibility of the chart. First, label the variables on the chart as "Years" for the x-axis and "Life Expectancy" for the y-axis. Next, provide the chart with the title "Life Expectancy over Years in the United States" and set the line color to red.
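A possible sketch for Question 7, assuming the gapminder, dplyr, and ggplot2 packages are installed. The intermediate names `us` and `p` are illustrative.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Keep only the United States rows (one per survey year)
us <- gapminder %>%
  filter(country == "United States")

# Pipe the data into a red line chart with readable labels
p <- us %>%
  ggplot(aes(x = year, y = lifeExp)) +
  geom_line(color = "red") +
  labs(
    title = "Life Expectancy over Years in the United States",
    x = "Years",
    y = "Life Expectancy"
  )

p
```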
# Question 8:
(1) From the data set "gapminder," use the data for the most recent year only. Next, create a chart that shows the life expectancies by continent. Make the plot as beautiful and professional as you can. This includes adding color(s) to the bars and giving appropriate labels for the title, x-axis, and y-axis. What can you tell about the pattern of life expectancies across the continents? Hint: Due to a large number of countries in the data, using countries as x- or y-axis will be problematic in making the charts interpretable.
(2) This time, use the data with the most recent year and the "lifeExp" greater than 80. That is, create a plot that shows life expectancy for the countries whose life expectancy is greater than 80 for the most recent year available in the dataset. On the chart, show the countries in descending order of life expectancy. Also, apply color by continent such that countries in the same continent have the same color in their bars.
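The two charts in Question 8 could be sketched as follows, assuming the gapminder, dplyr, and ggplot2 packages are installed. Summarising each continent by its mean life expectancy is one design choice among several (a boxplot per continent would also fit the question), and all intermediate names are illustrative.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Most recent year available in the dataset
latest <- gapminder %>%
  filter(year == max(year))

# (1) Mean life expectancy by continent, as colored bars
by_continent <- latest %>%
  group_by(continent) %>%
  summarise(mean_lifeExp = mean(lifeExp), .groups = "drop")

p1 <- ggplot(by_continent,
             aes(x = reorder(continent, -mean_lifeExp),
                 y = mean_lifeExp, fill = continent)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Mean Life Expectancy by Continent",
       x = "Continent", y = "Life Expectancy")

# (2) Countries above 80, in descending order, colored by continent
over_80 <- latest %>%
  filter(lifeExp > 80)

p2 <- ggplot(over_80,
             aes(x = reorder(country, -lifeExp),
                 y = lifeExp, fill = continent)) +
  geom_col() +
  labs(title = "Countries with Life Expectancy Above 80",
       x = "Country", y = "Life Expectancy", fill = "Continent") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p1
p2
```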
# Question 9: Again with the "gapminder" data frame, use a for loop to select the countries whose life expectancies are greater than 80 in the year 2007 and print the names of all the countries if they are in Asia. *Hints*: There should be 3 countries: Hong Kong, Israel, and Japan. Note: The country variable may include territories not recognized as countries by the UN.
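A minimal sketch of the loop in Question 9, assuming the gapminder package is installed. Filtering to 2007 and life expectancy above 80 before looping is a convenience; the vector `asia_over_80` is an illustrative addition that keeps the printed names for later use.

```r
library(gapminder)

# Candidate rows: year 2007 with life expectancy above 80
candidates <- gapminder[gapminder$year == 2007 & gapminder$lifeExp > 80, ]

asia_over_80 <- character(0)

# Loop over the candidates and print those located in Asia
for (i in seq_len(nrow(candidates))) {
  if (candidates$continent[i] == "Asia") {
    name <- as.character(candidates$country[i])
    print(name)
    asia_over_80 <- c(asia_over_80, name)
  }
}
```

The dataset labels Hong Kong as "Hong Kong, China", so that is the name the loop prints.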
# Question 10: With "gapminder" data frame, find one country (or territory) per conti
