Question
Please submit a link to the dataset you plan to use, and at least 2 high-level analyses (Section 3 of the project writeup). For each analysis, please submit at minimum:
The question you plan to answer
A visualization you plan to create, if applicable
A statistical test you plan to use, if applicable
You do not have to have performed the analyses yet.
I will leave feedback on your suggested analyses and note whether any other analyses might be appropriate for your dataset.
1. Dataset
Identify a dataset of suitable complexity for an in-depth analysis. Your dataset should have
approximately the following characteristics, with some room for variation depending on interest,
available data, and your plans for the project:
At least six major variables, including:
3 or more continuous variables (price, population, age, dimensions, rating, etc.)
3 or more categorical variables (species, product type, political party, home state, etc.)
Ideally, you should have some domain knowledge about the dataset. If not, you can familiarize
yourself with the domain where necessary to explain any observations or insights.
Dataset sources:
Kaggle
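As a rough way to check a candidate dataset against the variable guideline above, the column types can be counted with pandas. This is a minimal sketch, not part of the assignment: the `listings` table and its column names are hypothetical, and treating every numeric column as continuous is an assumption (integer-coded categories would be miscounted and need a manual check).

```python
import pandas as pd


def variable_types(df: pd.DataFrame) -> dict:
    """Split columns into numeric (treated as continuous) and everything else."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    other_cols = [col for col in df.columns if col not in numeric_cols]
    return {"continuous": numeric_cols, "categorical": other_cols}


# Hypothetical toy data standing in for a real dataset.
listings = pd.DataFrame({
    "price": [250000.0, 310000.0, 199000.0],
    "area_sqft": [1400.0, 2100.0, 950.0],
    "age_years": [12, 3, 40],
    "home_state": ["OH", "TX", "OH"],
    "property_type": ["house", "house", "condo"],
    "listing_agent": ["A", "B", "A"],
})

types = variable_types(listings)
# Guideline from the prompt: 3 or more continuous and 3 or more categorical variables.
meets_guideline = len(types["continuous"]) >= 3 and len(types["categorical"]) >= 3
print(types)
print("Meets guideline:", meets_guideline)
```

For a real dataset, `listings` would be replaced by the result of loading the downloaded file, for example with `pd.read_csv`.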
2. Exploratory Analysis
Conduct an exploratory analysis of the data. The analysis should include:
If your dataset has missing values, identify and explain them. If your analysis requires you to handle the missing values, describe your strategy for doing so.
Numeric variables:
Mean, min, max, median, quartiles
Correlations between all variables
Visualize outliers
Visualize data distribution
Categorical variables:
Value counts with bar charts
3. High-level analysis
Perform at least 6 higher-level analyses of your data. You are free to use any techniques we discussed in class, including but not limited to:
Use Pandas features to answer specific questions about the data
Perform a cluster analysis to identify groups within your data
Identify and motivate a machine learning problem in your data (classification or regression). Create a train/test/validation split and evaluate how well an appropriate model performs
Perform a linear regression to show the relationship between two variables
If applicable to an analysis, you must include:
Appropriate statistical test(s)
An appropriate visualization.
Please take advantage of the check-ins or office hours if you are unsure whether a visualization or statistical test is necessary for an analysis.
4. Final Report
Compile your results into a written report submitted separately as a PDF, Word document, or other appropriate text format. Do not include code in the report unless absolutely necessary. Your report should use the following structure:
1. Introduction: Describe your dataset. What is its purpose and what kind of data does it contain? What do you hope to discover in your analysis?
2. Exploratory analysis. Describe the characteristics of the data you observe, with visualization to support your observations. Use domain knowledge to explain interesting observations, citing external sources if necessary.
3. High-level analysis. Introduce each of your analyses and present them, with relevant visualizations, in their own sections.
4. Conclusions. What did you learn from this project? End with a thoughtful discussion of the data and insights you obtained from your analysis, and draw conclusions.
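The exploratory steps listed in the project description (missing values, summary statistics, correlations, outliers, value counts) can be sketched in pandas as follows. This is an illustration under assumptions, not a required approach: the toy table and its column names are hypothetical, and the 1.5 × IQR rule is only one common convention for flagging outliers.

```python
import pandas as pd

# Hypothetical toy data; a real project would load its own dataset instead.
df = pd.DataFrame({
    "price": [10.0, 12.0, 11.0, 13.0, 95.0, None],
    "rating": [4.1, 4.3, 3.9, 4.5, 2.0, 4.0],
    "category": ["toy", "toy", "book", "book", "toy", "book"],
})

# Missing values: count of NaN entries per column.
missing_per_column = df.isna().sum()

# Numeric summary: count, mean, std, min, quartiles (50% is the median), max.
summary = df.describe()

# Pairwise correlations between the numeric columns only.
correlations = df.corr(numeric_only=True)

# Outliers in one column, flagged with the 1.5 * IQR rule (an assumed convention).
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)]

# Categorical variable: frequency of each value.
category_counts = df["category"].value_counts()

print(missing_per_column, summary, correlations, outliers, category_counts, sep="\n\n")
```

The visualizations the prompt asks for are left out to keep the sketch free of plotting dependencies; with matplotlib installed, `df.boxplot()`, `df.hist()`, and `category_counts.plot.bar()` are the usual pandas starting points for outliers, distributions, and bar charts.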