Question
STAT 3280 - Homework 11
starwars
The dplyr package includes a dataset called starwars. Using this dataset,
answer the following questions using piped dplyr code:
Which column has the most missing values?
How many individuals in the dataset are neither male nor female?
With a single line of code, find out how many individuals come from the
most frequent species. (This one is more difficult.)
Filter the dataset to only include humans, then sort by homeworld
descending and then by height. What are the last four rows of the resulting
dataset?
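The four starwars questions can be approached along the following lines. This is a minimal sketch, not a graded solution. It assumes dplyr 1.0.0 or later, where "male" and "female" are values of the sex column (gender holds "masculine"/"feminine" instead), and it treats rows with a missing sex as "neither"; the variable names are made up for illustration.

```r
library(dplyr)

# Count the NAs in every column, then pick the column with the largest count.
most_missing <- starwars %>%
  summarise(across(everything(), ~ sum(is.na(.x)))) %>%
  unlist() %>%
  which.max() %>%
  names()

# Assumption: "male"/"female" live in the `sex` column (dplyr >= 1.0.0).
# Rows where sex is NA are counted as "neither" here.
n_neither <- starwars %>%
  filter(!sex %in% c("male", "female")) %>%
  nrow()

# Single line: tabulate species, sorted so the most frequent one comes first.
top_species <- starwars %>% count(species, sort = TRUE) %>% slice(1)

# Humans only, homeworld in descending order, then height ascending;
# tail(4) keeps the last four rows of the sorted result.
last_four <- starwars %>%
  filter(species == "Human") %>%
  arrange(desc(homeworld), height) %>%
  tail(4)

print(most_missing)
print(n_neither)
print(top_species)
print(last_four)
```

Note that arrange() places missing homeworld values at the end even with desc(), so the last rows may be humans whose homeworld is unknown.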
Mammal Sleep
The ggplot2 package includes the msleep dataset. Answer the following using
dplyr code, preferably with a single piped command.
Look at the column names, then rename two of them to something that you
find more informative. Print out the first row of the dataset.
How many rows have at least one missing value? First, get rid of the
last two columns, then remove all rows that still have at least one missing
value.
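One possible way to handle the msleep questions is sketched below. It assumes dplyr 1.0.4 or later (for if_any() and if_all()) and that ggplot2 is installed. The two new column names, diet and conservation_status, are only example choices; any names you find more informative work the same way.

```r
library(dplyr)
library(ggplot2)  # msleep ships with ggplot2

# Inspect the current column names first.
names(msleep)

# rename() takes new_name = old_name; the new names are illustrative choices.
first_row <- msleep %>%
  rename(diet = vore, conservation_status = conservation) %>%
  slice(1)
print(first_row)

# Rows with at least one NA in any column.
n_incomplete <- msleep %>%
  filter(if_any(everything(), is.na)) %>%
  nrow()
print(n_incomplete)

# Drop the last two columns by position, then keep only fully observed rows.
complete_rows <- msleep %>%
  select(-last_col(1), -last_col()) %>%
  filter(if_all(everything(), ~ !is.na(.x)))
print(dim(complete_rows))
```

Selecting by position with last_col() avoids hard-coding the names of the final two columns, which keeps the command valid if the earlier rename step is changed.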
Orders
For this exercise, use the orders and clients datasets found on Canvas.
Perform a left join of clients with orders based on the num client variable.
Look at the resulting dataset and explain what the join did. Report the
size of the joined dataset in rows and columns.
Now, perform an inner join instead. What is the size of this joined dataset?
If the size is different, why are these sizes different?
Now, perform a semi join on these two datasets. What is the result and
why is it different?
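The Canvas files are not reproduced here, so the sketch below uses two small made-up data frames to show how the three joins differ. Everything about the toy data is an assumption: the key is written as num_client, and the name and amount columns are invented. Replace them with the real clients and orders data before reporting sizes.

```r
library(dplyr)

# Hypothetical stand-ins for the Canvas datasets (columns are assumptions).
clients <- data.frame(
  num_client = c(1, 2, 3, 4),
  name = c("Ana", "Ben", "Cy", "Di")
)
orders <- data.frame(
  num_client = c(1, 1, 3, 5),
  amount = c(10, 25, 40, 15)
)

# Left join: every client is kept; clients with no orders get NA in amount,
# and a client with several orders appears once per order.
left <- clients %>% left_join(orders, by = "num_client")

# Inner join: only clients that have at least one matching order.
inner <- clients %>% inner_join(orders, by = "num_client")

# Semi join: clients that have a match, one row each, with no columns
# from orders added.
semi <- clients %>% semi_join(orders, by = "num_client")

dim(left)   # 5 rows, 3 columns
dim(inner)  # 3 rows, 3 columns
dim(semi)   # 2 rows, 2 columns
```

In this toy example the left join is larger than the inner join because clients 2 and 4 have no orders, and the semi join is smaller still because it filters clients rather than combining the two tables.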