Question
COVID-19 EDA: Perform an Exploratory Data Analysis using R.
Data source R code:
```r
library(dplyr)      # select(), mutate(), %>%
library(lubridate)  # dmy()

data <- read.csv("https://opendata.ecdc.europa.eu/covid19/nationalcasedeath_eueea_daily_ei/csv",
                 na.strings = "", fileEncoding = "UTF-8-BOM")
data <- data %>%
  select(-continentExp) %>%
  mutate(dateRep = dmy(dateRep),
         countriesAndTerritories = as.factor(countriesAndTerritories),
         geoId = as.factor(geoId),
         countryterritoryCode = as.factor(countryterritoryCode))
```
A data dictionary for the dataset is available here: https://www.ecdc.europa.eu/sites/default/files/documents/Description-and-disclaimer_daily_reporting.pdf
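Before working through the numbered tasks, a quick structural check can confirm that the download and type conversions behaved as expected. The sketch below uses base R only; `first_look()` is a hypothetical helper name, and it assumes the `dateRep`, `countriesAndTerritories`, `cases` and `deaths` columns described in the data dictionary.

```r
# Hypothetical helper: summarise the structure of the loaded ECDC data frame.
# Assumes dateRep is a Date and cases/deaths are numeric daily counts.
first_look <- function(df) {
  list(
    rows            = nrow(df),
    countries       = length(unique(df$countriesAndTerritories)),
    date_range      = range(df$dateRep, na.rm = TRUE),
    missing         = colSums(is.na(df)),              # NA count per column
    negative_cases  = sum(df$cases < 0, na.rm = TRUE), # days with negative counts
    negative_deaths = sum(df$deaths < 0, na.rm = TRUE)
  )
}

# str(first_look(data))
```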
Definitions:
* "Incidence rate" is the number of new daily cases per 100K individuals. Country population estimates are given in the 'popData2020' column.
* "Fatality rate" is the number of new daily deaths per 100K individuals. Country population estimates are given in the 'popData2020' column.
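The two definitions translate directly into column arithmetic. A minimal base-R sketch, assuming `cases`, `deaths` and `popData2020` are numeric columns of `data` and that the population value is repeated on every row belonging to a country; `add_rates()` is a hypothetical helper name, and `dplyr::mutate()` would do the same job in the style of the loading code.

```r
# Hypothetical helper: add per-100K rates as new columns.
# incidence_rate = cases  / popData2020 * 100,000
# fatality_rate  = deaths / popData2020 * 100,000
add_rates <- function(df, per = 1e5) {
  df$incidence_rate <- df$cases  / df$popData2020 * per
  df$fatality_rate  <- df$deaths / df$popData2020 * per
  df
}

# data <- add_rates(data)
# Worked check: 500 new cases in a population of 5,000,000
# gives 500 / 5e6 * 1e5 = 10 cases per 100K.
```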
1. Descriptive Statistics
Give example R code for each of the following:
* Creation of a vector, 'incidence_rate', equal to the daily new cases per 100K individuals, per country. Country populations are provided in 'popData2020'. This vector should be added to the 'data' data frame.
* Creation of a vector, 'fatality_rate', equal to the daily new deaths per 100K individuals, per country. Country populations are provided in 'popData2020'. This vector should be added to the 'data' data frame.
* A visualization exploring new cases or incidence rates, per country, over time. Your visualization should include at least five (5) countries and cover the entire time frame of the dataset.
* A visualization exploring new deaths or fatality rates, per country, over time. Again, your visualization should include at least five (5) countries.
* A table or visualization exploring some other aspect of the data. For example, you could explore case fatality rates per country: the number of deaths divided by the number of cases. You will want to look across the entire time frame of the dataset, using the total cases and deaths per country.
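One way to approach the visualization and table items is sketched below in base R, so it needs no packages beyond those already loaded. The helper names `plot_over_time()` and `case_fatality_table()` are hypothetical, the five countries in the usage comments are only an example selection, and the code assumes `dateRep` has already been converted to a `Date`. Passing `"cases"` or `"deaths"` plots raw daily counts; if `incidence_rate` and `fatality_rate` columns have been added to `data`, their names can be passed instead.

```r
# Hypothetical helper: one line per country for a chosen column,
# over the full date range present in the data.
plot_over_time <- function(df, value, countries, ylab = value) {
  sub <- df[df$countriesAndTerritories %in% countries, ]
  sub <- sub[order(sub$dateRep), ]
  cols <- seq_along(countries)
  # Empty frame sized to the whole period and value range
  plot(range(sub$dateRep, na.rm = TRUE), range(sub[[value]], na.rm = TRUE),
       type = "n", xlab = "Date", ylab = ylab)
  for (i in cols) {
    s <- sub[sub$countriesAndTerritories == countries[i], ]
    lines(s$dateRep, s[[value]], col = i)
  }
  legend("topleft", legend = countries, col = cols, lty = 1, cex = 0.8)
}

# Hypothetical helper: total cases and deaths per country over the whole
# period, with case fatality rate (CFR) = deaths / cases, as a percentage.
case_fatality_table <- function(df) {
  totals <- aggregate(df[c("cases", "deaths")],
                      by = list(country = df$countriesAndTerritories),
                      FUN = sum, na.rm = TRUE)
  totals$cfr_percent <- 100 * totals$deaths / totals$cases
  totals[order(-totals$cfr_percent), ]  # highest CFR first
}

# five <- c("Germany", "France", "Italy", "Spain", "Poland")  # example choice
# plot_over_time(data, "cases", five, "New daily cases")
# plot_over_time(data, "deaths", five, "New daily deaths")
# head(case_fatality_table(data), 10)
```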
2. Inferential Statistics
Select two (2) countries of your choosing and compare their incidence or fatality rates using hypothesis testing.
Please give example R code for each of the following:
* Visualization(s) comparing the daily incidence or fatality rates of the selected countries.
* A statement of the null hypothesis.
* A short justification of the statistical test selected.
  + Why is the test you selected an appropriate one for the comparison we're making?
* A brief discussion of any distributional assumptions of that test.
  + Does the statistical test we selected require assumptions about our data?
  + If so, does our data satisfy those assumptions?
* Your selected alpha.
* The test function output, i.e., the R output.
* The relevant confidence interval, if not returned by the R test output.
* A concluding statement on the outcome of the statistical test.
  + That is, based on our selected alpha, do we reject or fail to reject our null hypothesis?
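A sketch of one possible Part 2 workflow, not the only defensible one. It assumes alpha = 0.05 and uses the Wilcoxon rank-sum (Mann-Whitney) test via `wilcox.test()`, because daily rates are often skewed with long right tails and this test does not require normality. It does assume independent observations, which consecutive days of an epidemic only approximate (the values are autocorrelated), so the p-value is best read as indicative. With hundreds of days, Shapiro-Wilk flags even small departures from normality, so the boxplots matter as much as its p-values; if the data look roughly normal, Welch's `t.test(a, b)` is a common alternative. With `conf.int = TRUE`, `wilcox.test()` also reports a confidence interval for the difference in location. `compare_rates()` is a hypothetical helper name and the countries in the usage comments are examples.

```r
# Hypothetical helper: compare two countries' daily rates per 100K.
# H0: the daily rates of both countries have the same distribution
#     (location shift = 0); H1: the location shift is not 0.
# Assumes the ECDC columns countriesAndTerritories, popData2020 and a count
# column ("cases" for incidence, "deaths" for fatality).
compare_rates <- function(df, country_a, country_b, count = "cases",
                          alpha = 0.05) {
  daily_rate <- function(country) {
    rows <- which(df$countriesAndTerritories == country)
    rate <- df[[count]][rows] / df$popData2020[rows] * 1e5
    rate[!is.na(rate)]  # drop days with missing counts
  }
  a <- daily_rate(country_a)
  b <- daily_rate(country_b)

  # Visual comparison of the two daily distributions
  boxplot(list(a, b), names = c(country_a, country_b),
          ylab = paste("Daily", count, "per 100K"))

  # Normality check; shapiro.test() accepts 3 to 5000 values
  for (x in list(a, b)) {
    if (length(x) >= 3 && length(x) <= 5000) print(shapiro.test(x))
  }

  # Two-sided rank-sum test with a CI for the location shift
  test <- wilcox.test(a, b, alternative = "two.sided",
                      conf.int = TRUE, conf.level = 1 - alpha)
  print(test)
  cat(if (test$p.value < alpha) "Reject H0" else "Fail to reject H0",
      "at alpha =", alpha, "\n")
  invisible(test)
}

# compare_rates(data, "Germany", "France")            # incidence rates
# compare_rates(data, "Germany", "France", "deaths")  # fatality rates
```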