Question
Complete each of the problems below.
- (1 Point) Use the mpg data set and ggplot to make a single figure that compares histograms of the city mileages of Ford and Toyota vehicles.
- (2 Points) Use the mpg data set and ggplot to make a single figure with boxplots of city mileage for all cylinder types.
- (1 Point) Create a random vector of 1000 observations drawn from a log-normal distribution, use this data to make a data frame, and then make a normality plot for the resulting data.
- (1 Point) Using the random data from the previous problem, make a quantile plot comparing the data to the log-normal distribution.
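For the mpg and log-normal problems above, a minimal sketch in R might look like the following. It assumes the ggplot2 package (which ships the mpg data set) is installed. The seed, the bin width, the log-normal parameters, and object names such as `ford_toyota` and `lognormal_df` are illustrative choices, not part of the assignment.

```r
library(ggplot2)  # provides ggplot() and the mpg data set

# Overlaid histograms of city mileage (cty) for Ford and Toyota.
# Manufacturer names in mpg are stored in lowercase.
ford_toyota <- subset(mpg, manufacturer %in% c("ford", "toyota"))
hist_plot <- ggplot(ford_toyota, aes(x = cty, fill = manufacturer)) +
  geom_histogram(binwidth = 1, position = "identity", alpha = 0.5)

# Boxplots of city mileage, one box per cylinder count.
box_plot <- ggplot(mpg, aes(x = factor(cyl), y = cty)) +
  geom_boxplot()

# 1000 log-normal draws (the seed is an arbitrary choice for reproducibility).
set.seed(1)
lognormal_df <- data.frame(x = rlnorm(1000, meanlog = 0, sdlog = 1))

# Normality (normal Q-Q) plot: stat_qq() compares against qnorm by default.
normal_qq <- ggplot(lognormal_df, aes(sample = x)) +
  stat_qq() +
  stat_qq_line()

# Quantile plot against the log-normal distribution instead.
lognormal_qq <- ggplot(lognormal_df, aes(sample = x)) +
  stat_qq(distribution = qlnorm) +
  stat_qq_line(distribution = qlnorm)
```

Printing any of the plot objects (for example `print(lognormal_qq)`) draws the figure. Points that bend away from the reference line in `normal_qq` but follow it in `lognormal_qq` are what one would expect for log-normal data.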
Projects (10 Points)
Choose one of the projects below and answer all associated questions.
Project 1
In this project we will be using the datasets available in the Lahman package. You should already have this package installed on your version of R; if not, you will need to install it. Be sure to load this package before attempting these problems.
- (1 Point) How many players on the Seattle Mariners have been at bat, but did not get a hit?
- (1 Point) Make a histogram of the hits for players that have been at bat at least once in 2019.
- (2 Points) Make a scatterplot for the number of hits for Hank Aaron for each season of his career.
- (2 Points) The top 5 players with the most career home runs (as of 2019) are Barry Bonds, Hank Aaron, Babe Ruth, Alex Rodriguez, and Willie Mays. Make a single scatter plot of the number of home runs in each season for all of these players.
- (2 Points) Albert Pujols passed Willie Mays in career home runs during the 2020 season. What is the chance that Pujols will get a hit in a game if he is at bat 3 times? [Hint: use the binomial distribution]
- (2 Points) Use the data to construct a plot of your choosing.
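The binomial hint in the Pujols problem can be sketched as follows, reading "get a hit" as "at least one hit in 3 at-bats" and treating each at-bat as an independent trial. The value of `p_hit` below is a hypothetical placeholder, not a figure taken from the Lahman data; in practice it would be estimated as career hits divided by career at-bats.

```r
# Hypothetical per-at-bat hit probability (placeholder, NOT from the data).
p_hit <- 0.3
n_at_bats <- 3

# P(at least one hit) = 1 - P(zero hits) under Binomial(n_at_bats, p_hit).
p_no_hits <- dbinom(0, size = n_at_bats, prob = p_hit)
p_at_least_one <- 1 - p_no_hits

# Equivalent form using the upper tail of the binomial CDF: P(X > 0).
p_upper_tail <- pbinom(0, size = n_at_bats, prob = p_hit, lower.tail = FALSE)

print(p_at_least_one)
```

With the placeholder `p_hit = 0.3`, this works out to 1 - 0.7^3 = 0.657.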
Project 2
New York City has numerous data sets available via https://opendata.cityofnewyork.us/data/. We will be studying vehicle collisions.
- (1 Point) Use the link https://data.cityofnewyork.us/api/views/h9gi-nx95/rows.csv?accessType=DOWNLOAD to upload a CSV of the data directly into your R workspace.
- (1 Point) Clean the data by creating two new data frames: one with valid entries in the BOROUGH column and one with NA in the BOROUGH column.
- (2 Points) Construct a bar chart to compare the number of collisions for each BOROUGH.
- (2 Points) Make a histogram of the number of accidents for every hour in NYC.
- (2 Points) Astoria is a neighborhood in Queens associated with the zip codes 11101, 11102, 11103, 11105, and 11106. What is the probability that there will be at least 4 accidents in Astoria in a day? [Hint: Use Poisson distribution]
- (2 Points) Use the data to construct a plot of your choosing.
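The Poisson hint in the Astoria problem can be sketched as follows. The rate `lambda` below is a hypothetical placeholder, not a value computed from the NYC collision data; in practice it would be estimated as the average number of collisions per day in the listed zip codes.

```r
# Hypothetical mean number of collisions per day (placeholder, NOT from the data).
lambda <- 2.5

# P(X >= 4) = 1 - P(X <= 3) for X ~ Poisson(lambda).
p_at_least_4 <- ppois(3, lambda = lambda, lower.tail = FALSE)

# Same quantity written out as a sum of point probabilities, as a cross-check.
p_check <- 1 - sum(dpois(0:3, lambda = lambda))

print(p_at_least_4)
```

Note that `ppois(3, ..., lower.tail = FALSE)` gives P(X > 3), which equals P(X >= 4) because the Poisson distribution takes only integer values.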
Project 3
The CDC has multiple data sets available to study the spread and impacts of Covid-19. The COVID-19 case surveillance system database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and states. You can find more information about this here: https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf
- (1 Point) Upload the CDC COVID-19 Case Surveillance Public Use Data into your R workspace.
- (1 Point) Clean the data set so that it contains only valid entries in the age group column.
- (2 Points) Make a single bar chart that compares the deaths and recovered for each age group.
- (2 Points) Make a histogram for the number of cases reported to the CDC per day.
- (2 Points) On January 26, 2021, there were 151,616 new cases of Covid-19. Estimate the number of these new cases that will lead to death.
- (2 Points) Use the data to construct a plot of your choosing.
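One way to approach the death-estimate problem is to scale the new case count by an observed proportion of cases that ended in death. The sketch below assumes that approach; `fatality_rate` is a hypothetical placeholder, not a value from the CDC data, and would in practice be computed as deaths divided by cases with a known outcome.

```r
# New cases reported on January 26, 2021 (from the problem statement).
new_cases <- 151616

# Hypothetical proportion of cases that lead to death (placeholder, NOT from the data).
fatality_rate <- 0.02

# Expected number of deaths among the new cases under that proportion.
expected_deaths <- new_cases * fatality_rate

print(expected_deaths)
```

With the placeholder rate of 0.02, the estimate is 3032.32, which would be reported as roughly 3,000 deaths.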
Project 4
A yield curve is a line that plots yields (interest rates) of bonds having equal credit quality but differing maturity dates. The slope of the yield curve gives an idea of future interest rate changes and economic activity. Treasury securities are often viewed as the safest securities available and thus offer a baseline for yield rates of other bonds. You can find information about treasury yield curve rates here: https://www.treasury.gov/resource-center/data-chart-center/interest-rates/pages/TextView.aspx?data=yield
- (1 Point) Upload the CSV, daily_treasury_yield_curve_rates.csv, into your R workspace.
- (1 Point) Make a scatterplot of the 1 month rates throughout 2020. Can you explain the behavior? What do you think is happening?
- (2 Points) Make a single plot with boxplots comparing 10-, 20-, and 30-year yield rates.
- (2 Points) Find the summary of the 1 month yield rates and then remove any outliers from the data set. Make a Q-Q plot of the 1 month yield rates with no outliers.
- (2 Points) Find the mean and standard deviation of the 1 month yield rates with no outliers. What yield rate is greater than 95% of the others? Compare this value to the yield rates for January 2021 found on https://www.treasury.gov/resource-center/data-chart-center/interest-rates/pages/TextView.aspx?data=yield
- (2 Points) Use the data to construct a plot of your choosing.
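The "greater than 95% of the others" question can be read as asking for the 95th percentile of the 1 month yield rates. A minimal sketch, assuming the outlier-free rates are approximately normal (which the Q-Q plot is meant to check), is shown below. The values of `rate_mean` and `rate_sd` are hypothetical placeholders, not figures from the Treasury data.

```r
# Hypothetical mean and standard deviation of the 1 month yield rates
# (placeholders, NOT computed from the Treasury CSV).
rate_mean <- 1.0
rate_sd <- 0.5

# 95th percentile under a normal model: the rate exceeding 95% of the others.
rate_95 <- qnorm(0.95, mean = rate_mean, sd = rate_sd)

# Cross-check: the normal CDF evaluated at that rate should return 0.95.
p_below <- pnorm(rate_95, mean = rate_mean, sd = rate_sd)

print(rate_95)
```

If the normality assumption looks doubtful, the empirical alternative is `quantile(x, 0.95)` applied directly to the vector of outlier-free rates.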