Question
Choose a dataset from Kaggle.com or any other repository. The dataset must be at least 20 MB in size (uncompressed).
Note: The dataset must contain missing values, noise, and outliers.
Using Python:
1- Create a heat map of the correlation matrix showing the correlation coefficients among all variables in the dataset. What are your observations?
2- Derive some statistical results from the dataset (at least two results, each discussed in detail).
3- Perform a normality test on the data and represent the results graphically. Transform the data if it is not normally distributed.
4- Develop any two classification, clustering, or regression models, depending on your dataset type. Briefly describe the interpretation of each model.
5- Select one of the developed models and perform hyperparameter tuning to find the best combination of model parameters.
6- Compare the optimized model with the initial model and indicate whether the difference in results is statistically significant.
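As a starting point for tasks 1 and 3, the following is a minimal sketch rather than a full solution. It assumes a small synthetic DataFrame in place of a real Kaggle dataset; the column names (`income`, `age`, `spend`), the injected missing values, and the output file name `corr_heatmap.png` are all hypothetical. The Shapiro-Wilk test and the log transform are one reasonable choice among several and may not suit every dataset.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
from scipy import stats

# Synthetic stand-in for a real dataset (assumption: swap in pd.read_csv on your own file).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "income": rng.lognormal(mean=10, sigma=1, size=n),  # right-skewed on purpose
    "age": rng.normal(40, 10, size=n),
})
df["spend"] = 0.3 * df["income"] + rng.normal(0, 1000, size=n)
df.loc[rng.choice(n, size=25, replace=False), "age"] = np.nan  # injected missing values

# Task 1: Pearson correlation matrix (pandas drops NaNs pairwise), drawn as a heat map.
corr = df.corr()
fig, ax = plt.subplots()
im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr)))
ax.set_xticklabels(corr.columns, rotation=45)
ax.set_yticks(range(len(corr)))
ax.set_yticklabels(corr.columns)
for i in range(len(corr)):
    for j in range(len(corr)):
        # Annotate each cell with its coefficient.
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
fig.colorbar(im)
fig.savefig("corr_heatmap.png")

# Task 3: Shapiro-Wilk normality test before and after a log transform.
_, p_raw = stats.shapiro(df["income"])
_, p_log = stats.shapiro(np.log(df["income"]))
print(f"raw p={p_raw:.3g}, log p={p_log:.3g}")
```

A small p-value rejects normality, so comparing the p-value before and after the transform indicates whether the transform helped. On a real dataset, a histogram or Q-Q plot of each column would supply the graphical part of task 3.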