Question

Write Python code for any given CSV dataset with roughly 20-25 columns and around 2000 rows. Include an explanation and suggest alternatives. Keep it simple to understand and write.
#### 1. Data Understanding (5 marks)
a. Read the dataset (tab, csv, xls, txt, or an inbuilt dataset). How many rows and columns does it have, and what are the types of variables (continuous, categorical, etc.)? (1 mark)
b. Calculate the five-point summary for the numerical variables. (1 mark)
c. Summarize the categorical variables: number of categories and % of observations in each category. (1 mark)
d. Check for defects in the data such as missing values, nulls, outliers, etc. (2 marks)
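
A minimal sketch of steps 1a to 1d with pandas. The question does not name a dataset, so the code builds a small synthetic table as a stand-in; the column names (`area`, `age`, `city`, `price`) are hypothetical, and with a real file the table would instead come from `pd.read_csv`. Outliers are flagged with the common 1.5 × IQR rule, which is one reasonable choice among several (z-scores are an alternative).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real file (hypothetical columns).
# With a real dataset, replace this block with: df = pd.read_csv("your_file.csv")
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "area": rng.normal(1500, 300, n),
    "age": rng.integers(0, 50, n).astype(float),
    "city": rng.choice(["A", "B", "C"], n, p=[0.5, 0.3, 0.2]),
})
df["price"] = 50 + 0.1 * df["area"] - 0.5 * df["age"] + rng.normal(0, 10, n)
df.loc[rng.choice(n, 40, replace=False), "age"] = np.nan  # inject missing values

# 1a. Shape and variable types
n_rows, n_cols = df.shape
num_cols = df.select_dtypes(include="number").columns.tolist()
cat_cols = df.select_dtypes(exclude="number").columns.tolist()

# 1b. Five-point summary: min, Q1, median, Q3, max
five_point = df[num_cols].describe().loc[["min", "25%", "50%", "75%", "max"]]

# 1c. Categories and % of observations in each
cat_summary = {
    c: df[c].value_counts(normalize=True, dropna=False).mul(100).round(2)
    for c in cat_cols
}

# 1d. Missing values per column, and outlier counts by the 1.5 * IQR rule
missing = df.isna().sum()
q1 = df[num_cols].quantile(0.25)
q3 = df[num_cols].quantile(0.75)
iqr = q3 - q1
outliers = ((df[num_cols] < q1 - 1.5 * iqr) | (df[num_cols] > q3 + 1.5 * iqr)).sum()

print(n_rows, n_cols)
print(five_point)
print(cat_summary)
print(missing)
print(outliers)
```

`df.info()` and `df.describe(include="all")` give a quicker, less tailored overview of the same information.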
#### 2. Data Preparation (15 marks)
a. Fix the defects found above and do appropriate treatment if any. (5 marks)
b. Visualize the data using relevant plots. Which variables are highly correlated with the target variable? (5 marks)
c. Do you want to exclude some variables from the model based on this analysis? What other actions will you take? (2 marks)
d. Split dataset into train and test (70:30). Are both train and test representative of the overall data? How would you ascertain this statistically? (3 marks)
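
One possible way to approach steps 2a to 2d, again on synthetic data with hypothetical column names and `price` assumed to be the target. The treatment choices here (median imputation, clipping to the 1.5 × IQR fences) are assumptions rather than the only valid answer. The train/test comparison uses a two-sample Kolmogorov-Smirnov test from SciPy on each column; a t-test on the means would be a simpler alternative. Plots are left out to keep the sketch short; histograms, box plots, and a correlation heatmap (for example with matplotlib or seaborn) would cover the visual part of 2b.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

# Synthetic stand-in for the real data (hypothetical columns, "price" as target)
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "area": rng.normal(1500, 300, n),
    "age": rng.integers(0, 50, n).astype(float),
    "noise": rng.normal(0, 1, n),
})
df["price"] = 50 + 0.1 * df["area"] - 0.5 * df["age"] + rng.normal(0, 10, n)
df.loc[rng.choice(n, 40, replace=False), "age"] = np.nan    # missing values
df.loc[rng.choice(n, 5, replace=False), "area"] = 10000.0   # gross outliers

target = "price"
features = [c for c in df.columns if c != target]

# 2a. Fix defects: median imputation, then clip to the 1.5 * IQR fences
for c in features:
    df[c] = df[c].fillna(df[c].median())
    q1, q3 = df[c].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[c] = df[c].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# 2b. Correlation of each feature with the target, strongest first
corr = df.corr()[target].drop(target)
ranked = corr.reindex(corr.abs().sort_values(ascending=False).index)

# 2c. Features with very weak correlation are candidates for exclusion
#     (the 0.05 threshold is an arbitrary illustration)
weak = ranked[ranked.abs() < 0.05].index.tolist()

# 2d. 70:30 random split, then compare train and test distributions
idx = rng.permutation(n)
cut = int(round(0.7 * n))
train, test = df.iloc[idx[:cut]], df.iloc[idx[cut:]]
ks_p = pd.Series({c: ks_2samp(train[c], test[c]).pvalue for c in df.columns})

print(ranked)
print("weak candidates:", weak)
print(ks_p)  # large p-values give no evidence that train and test differ
```

scikit-learn's `train_test_split(X, y, test_size=0.3, random_state=...)` is the usual alternative to the manual permutation split.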
#### 3. Model Building (20 marks)
a. Fit a base model and observe its overall R-squared, RMSE, and MAPE values. Comment on whether the model is good or not. (5 marks)
b. Check for multi-collinearity and treat the same. (3 marks)
c. How would you improve the model? Write clearly the changes that you will make before re-fitting the model. Fit the final model. (6 marks)
d. Write down a business interpretation of the model: which variables affect the target the most, and what is the relationship? Feel free to use charts or graphs to explain. (4 marks)
e. What changes from the base model had the most effect on model performance? (2 marks)
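
A rough sketch of steps 3a to 3e, assuming the base model is ordinary least squares linear regression (the question does not say which model to use). To stay dependency-light, the fit uses `np.linalg.lstsq` and the VIF is computed by hand; scikit-learn's `LinearRegression` or statsmodels' `OLS` and `variance_inflation_factor` are the usual alternatives. The data is synthetic, the column names are hypothetical, and `area_dup` is a deliberately near-duplicate feature added to create multicollinearity. The VIF cut-off of 10 is a common rule of thumb, not a fixed standard.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in (hypothetical columns); area_dup nearly duplicates area
rng = np.random.default_rng(2)
n = 2000
X = pd.DataFrame({"area": rng.normal(1500, 300, n), "age": rng.uniform(0, 50, n)})
X["area_dup"] = 1.01 * X["area"] + rng.normal(0, 5, n)
y = (50 + 0.1 * X["area"] - 0.5 * X["age"] + rng.normal(0, 10, n)).to_numpy()

idx = rng.permutation(n)
train, test = idx[:1400], idx[1400:]  # 70:30 split

def fit_ols(Xa, ya):
    """Least-squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(Xa)), Xa])
    beta, *_ = np.linalg.lstsq(A, ya, rcond=None)
    return beta

def predict(beta, Xa):
    return np.column_stack([np.ones(len(Xa)), Xa]) @ beta

def metrics(y_true, y_pred):
    resid = y_true - y_pred
    r2 = 1 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    rmse = np.sqrt(np.mean(resid ** 2))
    mape = np.mean(np.abs(resid / y_true)) * 100  # needs y_true != 0
    return {"r2": r2, "rmse": rmse, "mape": mape}

def vif(Xdf):
    """VIF_j = 1 / (1 - R^2_j), regressing feature j on the other features."""
    out = {}
    for c in Xdf.columns:
        others, col = Xdf.drop(columns=c).to_numpy(), Xdf[c].to_numpy()
        r2 = metrics(col, predict(fit_ols(others, col), others))["r2"]
        out[c] = 1 / (1 - r2)
    return pd.Series(out)

def fit_and_score(Xdf):
    Xa = Xdf.to_numpy()
    beta = fit_ols(Xa[train], y[train])
    return beta, metrics(y[test], predict(beta, Xa[test]))

# 3a. Base model on all features, scored on the test set
beta_base, base_scores = fit_and_score(X)

# 3b. Multicollinearity check; drop the feature with the highest VIF if above 10
vif_before = vif(X.iloc[train])
X_final = X.drop(columns=vif_before.idxmax()) if vif_before.max() > 10 else X
vif_after = vif(X_final.iloc[train])

# 3c. Refit the final model on the reduced feature set
beta_final, final_scores = fit_and_score(X_final)

# 3d. Coefficients, plus effect of a one-standard-deviation change per feature
coefs = pd.Series(beta_final[1:], index=X_final.columns)
effect_per_sd = (coefs * X_final.iloc[train].std()).sort_values(key=abs, ascending=False)

# 3e. Compare base and final metrics side by side
comparison = pd.DataFrame({"base": base_scores, "final": final_scores})
print(vif_before, vif_after, coefs, effect_per_sd, comparison, sep="\n\n")
```

Dropping a collinear feature mainly stabilizes the coefficients for interpretation; the test-set R-squared and RMSE often change very little, which is worth stating when answering 3e.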
