Question

Posted on Jul 29, 2024

Implement the following tasks in the R Markdown file.

Do not forget to copy and paste your questions and create code chunks in your R Markdown file for each question. For example, for this assignment, you must have 10 code chunks (you have 10 questions). Insert your code chunk below each question.

Part 1: Airquality Data

Questions 1 through 5 will be about the same dataset available in the R program: "airquality."

# Question 1

(1) Get a local copy of the dataset "airquality" and name it " " so that you can use it later.

(2) Next, show the first 7 rows of it. Pay attention to the names of the variables.

(3) Write code that reveals how many variables and observations are in the dataset.

(4) Also, write code that gives you some basic descriptive statistics. You will notice that two variables have missing values.
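One possible way to approach Question 1 is sketched below. The name of the local copy is missing from the question text, so `df` is an assumption here, chosen only because later questions refer to "the df dataset".

```r
# Assumption: the local copy is called `df` (the name is missing in the question).
df <- airquality

# (2) First 7 rows, which also shows the variable names.
head(df, 7)

# (3) Number of observations (rows) and variables (columns).
dim(df)

# (4) Basic descriptive statistics; summary() also reports NA counts per column.
summary(df)
```

`str(df)` would be another way to see the dimensions together with the column types.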

# Question 2: Write the code that tells you (1) where the missing values are located, (2) the number of missing values in the dataset ( ), (3) the number of missing values in the Solar.R column, and (4) all the rows that include at least one missing value. (5) Lastly, write the code that returns the number of rows that include at least one missing value. Hint: there are rows that have more than one missing value.
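The five parts of Question 2 can be sketched with base R alone. The data frame name `df` is an assumption carried over from the wording of later questions.

```r
# Assumption: `df` is the local copy of airquality.
df <- airquality

# (1) Row and column position of every missing value.
na_positions <- which(is.na(df), arr.ind = TRUE)
na_positions

# (2) Total number of missing values in the dataset.
total_na <- sum(is.na(df))
total_na

# (3) Number of missing values in the Solar.R column.
solar_na <- sum(is.na(df$Solar.R))
solar_na

# (4) All rows that include at least one missing value.
rows_with_na <- df[!complete.cases(df), ]
rows_with_na

# (5) Number of rows with at least one missing value.
n_rows_with_na <- sum(!complete.cases(df))
n_rows_with_na
```

Because some rows have more than one missing value, the count in (5) is smaller than the total in (2).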

# Question 3

(1) Replace all the missing values in the Solar.R column with the median of the values in the column.

(2) Also, get the standard deviation and average of all columns.
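A minimal sketch for Question 3 follows, again assuming the local copy is named `df`. Ozone still contains missing values at this point, so `na.rm = TRUE` is used when summarising; that choice is an assumption, not something the question states.

```r
# Assumption: `df` is the local copy of airquality.
df <- airquality

# (1) Replace missing Solar.R values with the median of the observed values.
solar_median <- median(df$Solar.R, na.rm = TRUE)
df$Solar.R[is.na(df$Solar.R)] <- solar_median

# (2) Standard deviation and mean of every column, ignoring remaining NAs.
col_sd   <- sapply(df, sd, na.rm = TRUE)
col_mean <- sapply(df, mean, na.rm = TRUE)
col_sd
col_mean
```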

# Question 4: The goal is to create a new column filled with "low", "average", and "high" based on information from Ozone and Solar.R columns.

(1) Create a new column called "newCol," which is full of NA values.

(2) If both values in the first two columns (i.e., Ozone and Solar.R) of the df dataset in each row are less than the average of the respective columns, put "Low" in the new column; if they are the same as the averages, put "Average"; and if both values are greater than the averages, put "high" in the new column (use the pipe operator).

*Hint*: You will need to replace the missing values on the Ozone column with the mean of the column before creating the new variable.
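The sketch below shows one way to read Question 4, using base R's native pipe `|>` (this assumes R 4.1 or later; `%>%` from dplyr would be an equally valid choice). Rows where Ozone and Solar.R fall on different sides of their averages match none of the three rules, so this sketch leaves them as NA, which is an interpretation rather than something the question spells out.

```r
# Assumption: `df` is the local copy of airquality.
df <- airquality

# Fill missing values first: Solar.R by median (Question 3), Ozone by mean (hint).
df$Solar.R[is.na(df$Solar.R)] <- median(df$Solar.R, na.rm = TRUE)
df$Ozone[is.na(df$Ozone)]     <- mean(df$Ozone, na.rm = TRUE)

df <- df |>
  # (1) New column full of NA values.
  transform(newCol = NA_character_) |>
  # (2) Label rows where both columns are below, equal to, or above their means.
  transform(newCol = ifelse(
    Ozone < mean(Ozone) & Solar.R < mean(Solar.R), "Low",
    ifelse(
      Ozone > mean(Ozone) & Solar.R > mean(Solar.R), "high",
      ifelse(Ozone == mean(Ozone) & Solar.R == mean(Solar.R), "Average", newCol)
    )
  ))

table(df$newCol, useNA = "ifany")
```

Exact equality with a floating-point mean is fragile, so the "Average" rule may match few or no rows in practice; comparing with a tolerance (for example via `all.equal`) is a possible refinement.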

# Question 5: Rename the column "newCol" to "Air_Rate". Find a pair of variables with the highest and lowest correlation in df, and assign it to the variables highest_cor and lowest_cor.
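A hedged sketch for Question 5 is given below. To stay self-contained it creates a placeholder `newCol` instead of reusing the column from Question 4, and it reads "lowest correlation" as the most negative coefficient; taking the value closest to zero would be another defensible reading. The helper name `cor_pairs` is hypothetical.

```r
# Assumption: `df` is the local copy of airquality with a `newCol` column.
df <- airquality
df$newCol <- NA_character_   # stand-in for the column built in Question 4

# Rename newCol to Air_Rate.
names(df)[names(df) == "newCol"] <- "Air_Rate"

# Correlation matrix of the numeric columns, using all available pairs.
cors <- cor(df[sapply(df, is.numeric)], use = "pairwise.complete.obs")

# Keep each pair once and drop the self-correlations on the diagonal.
cors[upper.tri(cors, diag = TRUE)] <- NA
cor_pairs <- as.data.frame(as.table(cors))
cor_pairs <- cor_pairs[!is.na(cor_pairs$Freq), ]

highest_cor <- cor_pairs[which.max(cor_pairs$Freq), ]
lowest_cor  <- cor_pairs[which.min(cor_pairs$Freq), ]
highest_cor
lowest_cor
```

Each result is a one-row data frame holding the two variable names (`Var1`, `Var2`) and the coefficient (`Freq`).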

Part 2: Gapminder Data

Questions 6 through 10 will be about the same dataset available from the "gapminder" package that you may have to install and load up to be able to use.

# Question 6: From the "gapminder" dataset, select the columns "country", "continent", "year", and "lifeExp" and save the subset of gapminder data as "data." Tidy this data set using the pipe operator such that there is only one country in each row, with the years as columns and life expectancy as the value in the year columns. Save this new tidy data as "wide_data" (should contain 13 columns in the end). Hint: Since the data is not built into R or RStudio, you would need to install the package called "gapminder" and then load it up to the computer's short-term memory.
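One way to sketch Question 6 uses `select()` from dplyr and `pivot_wider()` from tidyr; the question does not name the packages to use, so that choice is an assumption. The packages are assumed to be installed already (for example with `install.packages("gapminder")`).

```r
library(gapminder)
library(dplyr)
library(tidyr)

# Keep only the four requested columns.
data <- gapminder %>%
  select(country, continent, year, lifeExp)

# One row per country, one column per year, life expectancy as the cell value.
wide_data <- data %>%
  pivot_wider(names_from = year, values_from = lifeExp)

dim(wide_data)
```

With `continent` kept as an identifier column, this sketch produces 14 columns (country, continent, and 12 year columns). Dropping `continent` before pivoting would give the 13 columns the question mentions, which is one possible reading of that note.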

# Question 7: Choose only the cases for the U.S. (Hint: 12 rows). Next, pipe the data into plotting a line chart with the variable "year" and "lifeExp" on the x-axis and y-axis, respectively. Improve the legibility of the chart. First, label the variables on the chart as "Years" for the x-axis and "Life Expectancy" for the y-axis. Next, provide the chart with the title "Life Expectancy over Years in the United States" and set the line color to red.
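The steps in Question 7 could be sketched with dplyr and ggplot2 as follows; the plot object name `us_plot` is hypothetical, and the package choice is an assumption.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

us_plot <- gapminder %>%
  # Keep only the United States rows (12 rows, one per survey year).
  filter(country == "United States") %>%
  ggplot(aes(x = year, y = lifeExp)) +
  geom_line(color = "red") +
  labs(
    title = "Life Expectancy over Years in the United States",
    x = "Years",
    y = "Life Expectancy"
  )

us_plot
```

The pipe hands the filtered data to `ggplot()`, and the layers are then added with `+`, since `%>%` binds more tightly than `+`.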

# Question 8

(1) From the data set "gapminder," use the data for the most recent year only. Next, create a chart that shows the life expectancies by continent. Make the plot as beautiful and professional as you can. This includes adding color ( ) to the bars and giving appropriate labels for the title, x-axis, and y-axis. What can you tell about the pattern of life expectancies across the continents? Hint: Due to a large number of countries in the data, using countries as x- or y-axis will be problematic in making the charts interpretable.

(2) This time, use the data with the most recent year and the "lifeExp" greater than 80. That is, create a plot that shows life expectancy for the countries whose life expectancy is greater than 80 for the most recent year available in the dataset. On the chart, show the countries in descending order of life expectancy. Also, apply color by continent such that countries in the same continent have the same color in their bars.
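A possible sketch for both parts of Question 8 is shown below. Summarising each continent by its mean life expectancy is an assumption (a box plot per continent would also fit the question), and the object names `latest`, `continent_plot`, and `top_plot` are hypothetical.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Rows for the most recent year in the data.
latest <- gapminder %>% filter(year == max(year))

# (1) Mean life expectancy per continent as colored bars.
continent_plot <- latest %>%
  group_by(continent) %>%
  summarise(mean_lifeExp = mean(lifeExp)) %>%
  ggplot(aes(x = continent, y = mean_lifeExp, fill = continent)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Mean Life Expectancy by Continent",
       x = "Continent", y = "Mean Life Expectancy (years)") +
  theme_minimal()

# (2) Countries above 80, in descending order, colored by continent.
top_plot <- latest %>%
  filter(lifeExp > 80) %>%
  ggplot(aes(x = reorder(country, -lifeExp), y = lifeExp, fill = continent)) +
  geom_col() +
  labs(title = "Countries with Life Expectancy above 80",
       x = "Country", y = "Life Expectancy (years)", fill = "Continent") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

continent_plot
top_plot
```

`reorder(country, -lifeExp)` sorts the bars from highest to lowest, and rotating the x-axis labels keeps the country names readable.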

# Question 9: Again with the "gapminder" data frame, use a for loop to select the countries whose life expectancies are greater than 80 in the year 2007 and print the names of all the countries if they are in Asia. *Hints*: There should be 3 countries: Hong Kong, Israel, and Japan. Note: The country variable may include territories not recognized as countries by the UN.
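The loop described in Question 9 might look like the sketch below. It walks over every row of `gapminder` and also collects the matches in a vector, `asia_over_80`, which is a hypothetical name added here so the result can be inspected afterwards.

```r
library(gapminder)

asia_over_80 <- character(0)

for (i in seq_len(nrow(gapminder))) {
  row <- gapminder[i, ]
  # Keep rows from 2007 with life expectancy above 80 on the Asian continent.
  if (row$year == 2007 && row$lifeExp > 80 && row$continent == "Asia") {
    name <- as.character(row$country)
    print(name)
    asia_over_80 <- c(asia_over_80, name)
  }
}
```

In the gapminder data the Hong Kong entry is labelled "Hong Kong, China", so the printed name differs slightly from the hint.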

# Question 10: With "gapminder" data frame, find one country (or territory) per conti
