Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 25, 2024

The goal of this assignment is to understand the logic and methods of exploratory data analysis (EDA). The mode of analysis concerned with discovery, exploration,

The goal of this assignment is to understand the logic and methods of exploratory data analysis (EDA). The mode of analysis concerned with discovery, exploration, and empirically detecting phenomena in data. EDA has become the default pre-modeling step for every Machine Learning project engagement. Exploratory Data Analysis (EDA) is a way to investigate datasets and find preliminary information, insights, or uncover underlying patterns in the data. Instead of making assumptions, data can be processed in a systematic method to gain insights and make informed decisions. Investigate the data by utilizing NumPy, Pandas, any graph library (MatPlotlib, Seaborn, Plotly), and Pythons Statsmodel modules. The analysis of the data should be focus on predicting the progression of a disease (diabetes in our case). Get the data from Stanford Us Machine Learning Repository: https://web.stanford.edu/~hastie/Papers/LARS/diabetes.data Here is a sample of the dataset (out of 442 records): For some background information on the data, see this seminal paper: Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499. https://projecteuclid.org/euclid.aos/1083178935 Load the dataset by using NumPys genfromtxt function (you are allowed to use others...) https://numpy.org/devdocs/user/basics.io.genfromtxt.html NOTE: You do NOT build/select a model, you only perform deep-dive analysis on the data. Write Python scripts in order to complete the following tasks along with their output. All work should be done and submitted in a single Jupyter Notebook. 1- Prep the data in order to be ready to be fed to a model. Look for missing, null, NaN records. Find outliers. Transform data all entries should be numeric. 2- List all types of data, numeric, categorical,... 3- Perform EDA on data. Present dependencies and correlations among the various features in the data. List the most variables (Feature Importance) that will affect the target label. 4- State limitations/issues (if any) with the given dataset.

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Practical Issues In Database Management A Refernce For The Thinking Practitioner

Practical Issues In Database Management A Refernce For The Thinking Practitioner

Authors: Fabian Pascal

1st Edition

0201485559, 978-0201485554

More Books

Students also viewed these Databases questions

Question

This assignment requires working in teams of five or six. Half of each team is to assume the role of manage1nent at a firm that is about to undergo major downsizing. The other half of each team is to...

Answered: 1 week ago

Question

★★★★★

Martinez Company has decided to introduce a new product. The new product can be manufactured by either a capital-intensive method or a labor-intensive method. The manufacturing method will not affect...

Answered: 1 week ago

Question

★★★★★

The goal of this assignment is to understand the logic and methods of exploratory data analysis (EDA). The mode of analysis concerned with discovery, exploration, and empirically detecting phenomena...

Answered: 1 week ago

Question

★★★★★

How do the percentages of women in VP positions at Nike compare to the percentage of women in Nike's workforce

Answered: 1 week ago

Question

★★★★★

Lanco Corporation, an accrual-method corporation, reported taxable income of $1,460,000 this year. Included in the computation of taxable income were the following items: - MACRS depreciation of...

Answered: 1 week ago

Question

★★★★★

Compute the interest rate if future value (FV) = $7,198, present value (FV) = $2,694, and number of years (t) = 9. (Do not round intermediate calculations and round your answers to 2 decimal places,...

Answered: 1 week ago

Question

★★★★★

The U.S. dollar has been steadily strengthening with respect to the Hong Kong dollar. A U.S. parent has a Hong Kong subsidiary. The subsidiary has positive net assets and its monetary assets are less...

Answered: 1 week ago

Question

★★★★★

What does RRAM stand for?Where is it used?What are its applications?

Answered: 1 week ago

Question

★★★★★

Which are non projected Teaching aids in advance learning system?

Answered: 1 week ago

Question

★★★★★

6. Many employees are unwilling to relocate geographically because they like their current community and because spouses and children prefer not to move. As a result, it is difficult to develop...

Answered: 1 week ago

Question

★★★★★

5. Your boss is interested in hiring a consultant to help identify potential managers from current employees of a fast-food restaurant. The managers job is to help wait on customers and prepare food...

Answered: 1 week ago

Question

★★★★★

10. How do volunteer assignments and job experiences contribute to employee development?

Answered: 1 week ago

Previous Question Next Question