
Question


The goal of this assignment is to understand the logic and methods of exploratory data analysis (EDA), the mode of analysis concerned with discovery, exploration, and empirically detecting phenomena in data. EDA has become a default pre-modeling step in machine learning projects. It is a way to investigate datasets and find preliminary information, insights, or underlying patterns in the data. Instead of making assumptions, data can be processed systematically to gain insights and make informed decisions.

Investigate the data using NumPy, Pandas, any graphing library (Matplotlib, Seaborn, Plotly), and Python's statsmodels module. The analysis should focus on predicting the progression of a disease (diabetes in our case).

Get the data from Stanford U's Machine Learning Repository: https://web.stanford.edu/~hastie/Papers/LARS/diabetes.data

The dataset contains 442 records. For background information on the data, see this seminal paper: Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004), "Least Angle Regression," Annals of Statistics (with discussion), 407-499. https://projecteuclid.org/euclid.aos/1083178935

Load the dataset using NumPy's genfromtxt function (other loaders are allowed): https://numpy.org/devdocs/user/basics.io.genfromtxt.html

NOTE: You do NOT build or select a model; you only perform a deep-dive analysis of the data. Write Python scripts to complete the following tasks and show their output. All work should be done and submitted in a single Jupyter Notebook.

1- Prepare the data so it is ready to be fed to a model. Look for missing, null, and NaN records. Find outliers. Transform the data so that all entries are numeric.
2- List all types of data (numeric, categorical, ...).
3- Perform EDA on the data. Present dependencies and correlations among the various features in the data. List the most important variables (Feature Importance) that will affect the target label.
4- State limitations/issues (if any) with the given dataset.
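
A minimal sketch of how tasks 1 and 3 might be started is shown here. It is not the approved solution. It assumes the file is tab-separated with a header row and that the target column is named `Y`, which should be checked against the actual file. It uses `pandas.read_csv` instead of `genfromtxt`, which the assignment permits. The helper names (`load_table`, `missing_report`, `iqr_outliers`, `target_correlations`) are hypothetical, and plain Pearson correlation with the target is only a rough first proxy for feature importance.

```python
import numpy as np
import pandas as pd

DATA_URL = "https://web.stanford.edu/~hastie/Papers/LARS/diabetes.data"


def load_table(source, sep="\t"):
    """Read a delimited text file (path, URL, or file-like object) into a DataFrame.

    Assumption: the first row is a header and fields are tab-separated.
    """
    return pd.read_csv(source, sep=sep)


def missing_report(df):
    """Return the number of missing (NaN/None) entries in each column."""
    return df.isna().sum()


def iqr_outliers(df, k=1.5):
    """Count, per numeric column, the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    num = df.select_dtypes(include=np.number)
    q1, q3 = num.quantile(0.25), num.quantile(0.75)
    iqr = q3 - q1
    mask = (num < q1 - k * iqr) | (num > q3 + k * iqr)
    return mask.sum()


def target_correlations(df, target="Y"):
    """Pearson correlation of each numeric feature with the target.

    The result is sorted by absolute correlation, strongest first.
    Assumption: the target column is called "Y".
    """
    corr = df.select_dtypes(include=np.number).corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)
```

In a notebook, `df = load_table(DATA_URL)` followed by `missing_report(df)`, `iqr_outliers(df)`, and `target_correlations(df)` would cover the checks in task 1 and a first pass at task 3. `df.dtypes` lists the column types for task 2. The IQR rule with `k=1.5` is one common convention for flagging outliers, not the only reasonable choice.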

