Question

STAT 3280 - Homework 1
1 starwars
The dplyr package includes a dataset called starwars. Using this dataset,
answer the following questions using piped dplyr code:
1. Which column has the most missing values?
2. How many individuals in the dataset are neither male nor female?
3. With a single line of code, find out how many individuals come from the
most frequent species. (This one is more difficult.)
4. Filter the dataset to include only humans, then sort by homeworld
(descending) and then by height. What are the last four rows of the resulting
dataset?
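One possible way to approach the four starwars questions is sketched below. It is not an official solution. It assumes a recent dplyr release in which starwars has a `sex` column with the values "male" and "female"; older releases stored those values in `gender`, so the column name may need to change. The variable names (`na_counts`, `not_male_or_female`, `top_species`, `humans_tail`) are made up for illustration.

```r
library(dplyr)

# 1. Count missing values in every column, then sort the counts so the
#    column with the most NAs comes first.
na_counts <- starwars %>%
  summarise(across(everything(), ~ sum(is.na(.x)))) %>%
  unlist() %>%
  sort(decreasing = TRUE)
na_counts

# 2. Rows whose sex is neither "male" nor "female".
#    Assumption: `sex` is the relevant column. Rows with a missing sex are
#    counted too, because NA %in% c(...) evaluates to FALSE.
not_male_or_female <- starwars %>%
  filter(!sex %in% c("male", "female")) %>%
  count()
not_male_or_female

# 3. One piped line: tally each species, most frequent first, keep the top row.
top_species <- starwars %>% count(species, sort = TRUE) %>% slice(1)
top_species

# 4. Humans only, sorted by homeworld (descending) and then height;
#    tail(4) returns the last four rows.
humans_tail <- starwars %>%
  filter(species == "Human") %>%
  arrange(desc(homeworld), height) %>%
  tail(4)
humans_tail
```

Note that arrange() always places missing values last, even inside desc(), so humans with an unknown homeworld end up at the bottom of the sorted data.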
2 Mammal Sleep
The ggplot2 package includes the msleep dataset. Answer the following using
dplyr code, preferably with a single piped command.
1. Look at the column names, then modify two of these names to something
that you find is more informative. Print out the first row of the dataset.
2. How many rows have at least one missing value? First, get rid of the
last two columns, then remove all rows that still have at least one missing
value.
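A minimal sketch of the msleep tasks follows, offered as one reading of the questions rather than a definitive answer. The new column names (`brain_weight_kg`, `body_weight_kg`) and the result variable names are illustrative choices, and `if_any()` assumes dplyr 1.0.4 or later.

```r
library(dplyr)
library(ggplot2)  # provides the msleep dataset

# 1. Inspect the column names, rename two of them, and print the first row.
names(msleep)

first_row <- msleep %>%
  rename(brain_weight_kg = brainwt, body_weight_kg = bodywt) %>%
  slice(1)
first_row

# 2a. Number of rows with at least one missing value in the full dataset.
rows_with_na <- msleep %>%
  filter(if_any(everything(), is.na)) %>%
  nrow()
rows_with_na

# 2b. Drop the last two columns by position, then keep only the rows
#     that have no missing values left.
msleep_trimmed <- msleep %>%
  select(1:(ncol(msleep) - 2))

msleep_complete <- msleep_trimmed %>%
  filter(!if_any(everything(), is.na))
dim(msleep_complete)
```

Dropping the columns by position keeps the code faithful to the wording "last two columns"; selecting them by name would work equally well.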
3 Orders
For this exercise, use the orders and clients datasets found on Canvas.
1. Perform a left join of clients with orders based on the num client variable.
Look at the resulting dataset and explain what the join did. Report the
size of the joined dataset in rows and columns.
2. Now, perform an inner join instead. What is the size of this joined dataset?
If the size is different, why are these sizes different?
3. Now, perform a semi join on these two datasets. What is the result and
why is it different?
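The Canvas files are not reproduced here, so the sketch below uses two small made-up tables to show how the three joins differ. Everything about the data is hypothetical: the values, the extra columns, and the key name `num_client` (written "num client" in the assignment text). Sizes for the real datasets will be different.

```r
library(dplyr)

# Hypothetical stand-ins for the Canvas datasets.
clients <- tibble(
  num_client = c(1, 2, 3, 4),
  name       = c("Ana", "Ben", "Cho", "Dev")
)
orders <- tibble(
  num_order  = c(101, 102, 103, 104),
  num_client = c(1, 1, 3, 5),   # client 5 does not exist in `clients`
  amount     = c(20, 35, 15, 50)
)

# Left join: every client is kept; clients without orders get NA in the
# order columns, and clients with several orders appear once per order.
left <- clients %>% left_join(orders, by = "num_client")

# Inner join: only client/order pairs that match in both tables.
inner <- clients %>% inner_join(orders, by = "num_client")

# Semi join: clients that have at least one order, each listed once,
# with no columns added from `orders`.
semi <- clients %>% semi_join(orders, by = "num_client")

dim(left)   # 5 rows, 4 columns
dim(inner)  # 3 rows, 4 columns
dim(semi)   # 2 rows, 2 columns
```

In this toy data the left join is larger than the inner join because clients 2 and 4 have no orders, and the semi join is smaller still because it filters clients rather than combining the two tables.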