Question
Data analytics is best appreciated when it is applied to a dataset you are familiar with, and the aim of this project is to let you do exactly that. Do not view this project as a hurdle in the course, but rather as a bridge connecting the topics you learnt to your work or subject domain. There are five main modules in this course:
- Module 1: Normal Distribution (percentiles, distribution of means, and chance of occurrence if we assume a normal distribution)
- Module 2: Confidence Interval Estimation (including sample size determination)
- Module 3: Inferences from data (hypothesis testing, i.e., checking whether a claim made about the data holds; in this module, we dealt with only one sample)
- Module 4: More inferences from data (multiple samples)
- Module 5: Regression analysis (both simple and multiple, apart from basic ANOVA)
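As a rough illustration of what analyses from Modules 1 and 2 might look like in code, here is a minimal Python sketch using only the standard library. The function names are hypothetical and not part of the course material, and the interval uses a z-value, which is a simplifying assumption that is reasonable only for samples of about 30 or more observations.

```python
import math
from statistics import NormalDist, mean, stdev


def normal_percentile(value, mu, sigma):
    """Module 1: fraction of a normal population at or below `value`."""
    return NormalDist(mu, sigma).cdf(value)


def mean_confidence_interval(sample, confidence=0.95):
    """Module 2: z-based confidence interval for the population mean.

    Assumes the sample is large enough (n >= 30) for the normal
    approximation; the sample standard deviation stands in for sigma.
    """
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / math.sqrt(n)
    centre = mean(sample)
    return centre - margin, centre + margin


def required_sample_size(sigma, margin_of_error, confidence=0.95):
    """Module 2: smallest n whose margin of error is at most `margin_of_error`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / margin_of_error) ** 2)
```

With your own data, `sample` would be a list of numeric observations from one group, and `sigma` would be a planning estimate of the population standard deviation.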
Objective
The purpose of the project is for you to apply what you learnt from at least 4 modules to your dataset and make some inferences or estimations. Remember, each Hawkes Learning quiz had 10-15 questions. Here I am asking you to do only 4 tests or analyses. The key is that you bring the data and you come up with the questions, and each question or set of analyses represents something you learnt from Modules 1-5. There should be four different ones. That is the best way to understand the concepts you learnt in this course. If you wish, you can use two data sources (datasets) to achieve this. It is not necessary for all of the analyses to be done on one dataset.
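To make the idea of "one analysis per module" concrete, the following Python sketch shows what analyses from Modules 3, 4, and 5 could look like using only the standard library. All function names are hypothetical. The tests use the normal (z) approximation, which assumes at least 30 observations per group; your course materials may instead call for a t-test, so treat this as an outline rather than the required method.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_z_test(sample, claimed_mean):
    """Module 3: two-sided test of H0: population mean equals claimed_mean."""
    n = len(sample)
    z = (mean(sample) - claimed_mean) / (stdev(sample) / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def two_sample_z_test(sample_a, sample_b):
    """Module 4: two-sided test of H0: the two population means are equal."""
    standard_error = math.sqrt(
        stdev(sample_a) ** 2 / len(sample_a)
        + stdev(sample_b) ** 2 / len(sample_b)
    )
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def simple_linear_regression(x, y):
    """Module 5: least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept
```

A small p-value (for example, below a chosen significance level of 0.05) would lead you to reject the null hypothesis stated in each docstring.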
Data source
There are 3 options, and you can choose any one of them (there are no restrictions on that):
- Bring your own data from work (you can remove any private or confidential information; for example, if you bring sales or cost data for a product or service, its name can be masked)
- Use data from a previous job or company that you have access to (again, you can remove any private or confidential information)
- Use data from the public domain. In today's world, there is no dearth of structured data. Here are some places where you can get data:
  - Any data source you have access to, like the Hawkes Learning Resources
    - Datasets (1) from Hawkes
    - Datasets (2) from Hawkes (look at the additional datasets, not the chapter datasets)
  - U.S. Bureau of Labor Statistics
  - U.S. Government's open data
  - Centers for Medicare & Medicaid Services
  - WHO data repository
  - World Bank Data
  - Google Public Data Explorer
    - It offers amazing visualizations and graphics, but remember that we need the data itself to do the analysis. Google provides the source name at the bottom of each figure, and you can retrieve the data from there.
  - Any sports data (from the appropriate website; getting data in a structured format for several years might be a challenge, but you can do it in a few minutes to an hour)
    - For example, cricket data can be obtained from espncricinfo.
IMPORTANT: Make sure your data includes at least 30 observations for each population/group you're analyzing.
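One quick way to verify the 30-observation requirement before starting an analysis is to count the observations in each group. The Python sketch below is only an illustration; the function name and the idea of passing one group label per observation are assumptions, not something prescribed by the project brief.

```python
from collections import Counter

MIN_OBSERVATIONS = 30  # threshold stated in the project brief


def undersized_groups(group_labels, minimum=MIN_OBSERVATIONS):
    """Return {group: count} for every group with fewer than `minimum` rows.

    `group_labels` holds one label per observation, e.g. the values of a
    "region" or "product" column in your dataset.
    """
    counts = Counter(group_labels)
    return {group: n for group, n in counts.items() if n < minimum}
```

An empty result means every group meets the requirement; any group that appears in the result needs more data or should be dropped from the comparison.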