Question
Choose a data set of interest to you that has at least 6,000 rows of data. Complete an exploratory data analysis. This means:
- getting a sense of the data
- creating visualizations to represent the data; include histograms, boxplots, scatterplots etc. as needed
- obtaining descriptive statistics of the data
- finding subsets of data and getting descriptive statistics for each subset
- creating visualizations for the subset data
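The steps above can be sketched in base R as follows. This is only a minimal illustration, not a required approach: the data frame `dat` and its columns (`region`, `visits`, `income`, `spend`) are simulated stand-ins invented for the example, and in practice you would load your own file, for instance with `read.csv()`.

```r
# Hypothetical stand-in data (assumption): replace with your own dataset,
# e.g. dat <- read.csv("your_file.csv")
set.seed(42)
n <- 6000
dat <- data.frame(
  region = factor(sample(c("North", "South", "East", "West"), n, replace = TRUE)),
  visits = rpois(n, lambda = 3),                    # discrete
  income = rlnorm(n, meanlog = 10.5, sdlog = 0.5)   # continuous, right-skewed
)
dat$spend <- 0.02 * dat$income + rnorm(n, sd = 200)

# Get a sense of the data: size, column types, first rows
dim(dat)
str(dat)
head(dat)

# Descriptive statistics for every column
summary(dat)
sd(dat$income)

# Visualizations: histogram, boxplot, scatterplot, bar charts
hist(dat$income, breaks = 40, main = "Income", xlab = "Income")
boxplot(income ~ region, data = dat, main = "Income by region")
plot(dat$income, dat$spend, pch = 20, xlab = "Income", ylab = "Spend")
barplot(table(dat$region), main = "Rows per region")    # categorical
barplot(table(dat$visits), main = "Number of visits")   # discrete

# Subsets: descriptive statistics and a plot for one subset,
# plus group means for all subsets at once
north <- subset(dat, region == "North")
summary(north$income)
hist(north$income, breaks = 40, main = "Income (North only)", xlab = "Income")
by_region <- aggregate(cbind(income, spend) ~ region, data = dat, FUN = mean)
by_region
```

When the script is run non-interactively with `Rscript`, the plots are written to `Rplots.pdf` rather than shown on screen.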
As you explore your data and see it visualized, you may find that your dataset has extreme outliers, incomplete data or just wrong data. If this is the case, you will need to clean your data to get a better understanding of the data. Once you have cleaned the data, perform the exploratory analysis again.
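A cleaning pass of this kind might look like the sketch below. It is an assumption-laden example rather than a prescribed method: the data frame `raw` is simulated with deliberately injected problems, the valid age range of 0 to 120 is an arbitrary choice, and the 1.5 x IQR fence is just one common convention for flagging outliers. Whether to drop, cap, or keep flagged rows is a judgment call that should be explained in the report.

```r
# Hypothetical messy data (assumption): simulated with injected problems
set.seed(1)
n <- 6000
raw <- data.frame(
  age    = round(rnorm(n, mean = 40, sd = 12)),
  income = rlnorm(n, meanlog = 10.5, sdlog = 0.5)
)
raw$age[sample(n, 50)]   <- NA     # incomplete data
raw$age[sample(n, 10)]   <- -1     # plainly wrong values
raw$income[sample(n, 5)] <- 1e9    # extreme outliers

# Count missing values per column
colSums(is.na(raw))

# Drop rows with missing values, then rows with impossible ages
clean <- raw[complete.cases(raw), ]
clean <- clean[clean$age >= 0 & clean$age <= 120, ]

# Flag outliers with the 1.5 * IQR rule and keep rows inside the fences
q     <- quantile(clean$income, c(0.25, 0.75))
fence <- 1.5 * (q[2] - q[1])
keep  <- clean$income >= q[1] - fence & clean$income <= q[2] + fence
clean <- clean[keep, ]

# Repeat the exploratory analysis on the cleaned data and compare
summary(raw$income)
summary(clean$income)
boxplot(clean$income, main = "Income after cleaning")
```

Recording how many rows each step removes (for example with `nrow(raw) - nrow(clean)`) makes the cleaning easy to describe later.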
In your report be sure to:
- Describe your dataset.
- What is the purpose of the dataset? What is your data source?
- What kind of data is included? Is it all text data, is it numerical?
- Describe the data fields including the title, the data type, the data description, etc.
- How many rows of data are there? How many fields?
- Describe any data cleaning you did.
- Provide visualizations of the key data and subset data of interest. This should be done for categorical data, discrete data and continuous data.
- Provide descriptive statistical tables for key data fields of interest.
- Provide analysis above and beyond the graphs and tables. Explain what the tables and visualizations tell you about the data.
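For the dataset description, a small field summary table can be built directly in R. The sketch below is one possible layout, using the built-in `airquality` data purely as a stand-in (it has far fewer than 6,000 rows); the column names of `dictionary` are illustrative choices, and the written description of each field still has to be added by hand.

```r
# Stand-in dataset (assumption): swap in your own data frame
dat <- airquality

# Number of rows and fields
n_rows   <- nrow(dat)
n_fields <- ncol(dat)

# One row per field: name, data type, missing values, distinct values
dictionary <- data.frame(
  field    = names(dat),
  type     = vapply(dat, function(x) class(x)[1], character(1)),
  missing  = colSums(is.na(dat)),
  distinct = vapply(dat, function(x) length(unique(x)), integer(1)),
  row.names = NULL
)
dictionary

# Descriptive statistics table for the numeric fields of interest
num_cols <- vapply(dat, is.numeric, logical(1))
stats <- data.frame(
  field  = names(dat)[num_cols],
  mean   = vapply(dat[num_cols], mean,   numeric(1), na.rm = TRUE),
  median = vapply(dat[num_cols], median, numeric(1), na.rm = TRUE),
  sd     = vapply(dat[num_cols], sd,     numeric(1), na.rm = TRUE),
  row.names = NULL
)
stats
```

Tables like these can be pasted into the report or its appendix and then interpreted in the surrounding text.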
Format
Present your Exploratory Data Analysis in a report. Incorporate visualizations and tables into the textual analysis of your report. If appropriate, add an appendix of additional data tables and graphs.
The report should follow this flow:
Introduction: introduce the data, its purpose, the sources, the reason for choosing the data and what you hope to learn from the data. Incorporate a discussion of the data cleaning methods used.
Data Analysis: This is the body of the report, where you provide descriptions of the data, basic statistical measures, graphs, tables and analysis.
Summary: Summarize the report. Identify the key take-aways from your analysis. Describe what you want to explore further about your data. Identify questions you want to answer with the data.
What to Submit
You must submit 3 files:
- The Exploratory Data Analysis report
- Your dataset (if chosen and not provided to you)
- The R code used to analyze the data
These websites are examples of the many sources available for finding a good dataset for your course final project.
Links:
https://middlebury.libguides.com/econstats/large-datasets
https://data.gov/
https://www.cihi.ca/en/access-data-and-reports/data-tables