Write Python code for any given CSV dataset with about 20-25 columns and around 2000 rows. Include an explanation.
Write Python code for any given CSV dataset with about 20-25 columns and around 2000 rows. Include an explanation, suggest alternatives, and keep it simple to understand and write.
#### Data Understanding (marks)
a) Read the dataset (tab-separated, CSV, XLS, TXT, or an inbuilt dataset). What are the number of rows and columns, and the types of variables (continuous, categorical, etc.)? (mark)
b) Calculate the five-point summary for numerical variables. (mark)
c) Summarize observations for categorical variables: number of categories and observations in each category. (marks)
d) Check for defects in the data such as missing values, nulls, outliers, etc. (marks)
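The Data Understanding questions above can be sketched as follows, assuming pandas and NumPy are available. Because no real file is given, `make_demo_data` builds a hypothetical 2000-row frame: the columns `area`, `rooms`, `age`, `city`, `area_sqm` and the target `price` are all invented, with a few missing values planted on purpose. Pass a real path to `load_data` to use your own CSV. Numeric vs. categorical is decided by dtype only, so integer-coded categories (such as `rooms` here) still need a manual judgement call. `df.describe()` and `df.info()` are common one-line alternatives to these helpers.

```python
from typing import Optional

import numpy as np
import pandas as pd


def make_demo_data(n=2000, seed=0):
    """Hypothetical stand-in for the real CSV; every column name here is invented."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({"area": rng.normal(1500, 400, n).round(0),
                       "rooms": rng.integers(1, 6, n),
                       "age": rng.integers(0, 50, n),
                       "city": rng.choice(["A", "B", "C"], n, p=[0.5, 0.3, 0.2])})
    # Near-copy of "area" so a later multicollinearity check has something to find.
    df["area_sqm"] = df["area"] * 0.0929 + rng.normal(0, 2, n)
    df["price"] = (200_000 + 50 * df["area"] + 10_000 * df["rooms"]
                   - 800 * df["age"] + rng.normal(0, 20_000, n))
    # Plant a few missing values so the defect checks have something to find.
    df.loc[rng.choice(n, 30, replace=False), "area"] = np.nan
    return df


def load_data(path: Optional[str] = None) -> pd.DataFrame:
    # (a) Real file: pd.read_csv(path). Alternatives: pd.read_excel(path) for .xls/.xlsx,
    # or pd.read_csv(path, sep="\t") for tab-separated .txt files. No path -> demo data.
    return pd.read_csv(path) if path else make_demo_data()


def describe_structure(df):
    # (a) Shape, plus a simple numeric-vs-categorical split based on dtype.
    num_cols = df.select_dtypes(include="number").columns.tolist()
    cat_cols = df.select_dtypes(exclude="number").columns.tolist()
    print(f"Rows: {df.shape[0]}, Columns: {df.shape[1]}")
    print("Numeric:", num_cols, "| Categorical:", cat_cols)
    return num_cols, cat_cols


def five_point_summary(df, num_cols):
    # (b) Min, Q1, median, Q3 and max for each numeric column (NaNs are skipped).
    summary = df[num_cols].quantile([0.0, 0.25, 0.5, 0.75, 1.0]).T
    summary.columns = ["min", "Q1", "median", "Q3", "max"]
    return summary


def categorical_summary(df, cat_cols):
    # (c) Number of categories and the count of observations in each one.
    counts = {col: df[col].value_counts(dropna=False) for col in cat_cols}
    for col, vc in counts.items():
        print(f"{col}: {df[col].nunique()} categories\n{vc.to_string()}")
    return counts


def defect_report(df, num_cols):
    # (d) Missing values per column, plus IQR-rule outliers (outside Q1 - 1.5*IQR, Q3 + 1.5*IQR).
    q1, q3 = df[num_cols].quantile(0.25), df[num_cols].quantile(0.75)
    iqr = q3 - q1
    outside = (df[num_cols] < q1 - 1.5 * iqr) | (df[num_cols] > q3 + 1.5 * iqr)
    report = pd.DataFrame({"missing": df.isna().sum()})
    report["outliers_iqr"] = outside.sum()  # categorical columns get NaN, filled with 0 below
    return report.fillna(0).astype(int), int(df.duplicated().sum())


if __name__ == "__main__":
    df = load_data()  # pass your own CSV path here
    num_cols, cat_cols = describe_structure(df)
    print(five_point_summary(df, num_cols))
    categorical_summary(df, cat_cols)
    report, n_duplicates = defect_report(df, num_cols)
    print(report, "\nDuplicate rows:", n_duplicates)
```

The IQR rule is only one way to flag outliers; z-scores or domain limits (e.g. a price can never be negative) are common alternatives.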
#### Data Preparation (marks)
a) Fix the defects found above and apply appropriate treatment, if any. (marks)
b) Visualize the data using relevant plots. Which variables are highly correlated with the target variable? (marks)
c) Would you exclude some variables from the model based on this analysis? What other actions will you take? (marks)
d) Split the dataset into train and test sets. Are both train and test representative of the overall data? How would you ascertain this statistically? (marks)
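The Data Preparation questions above can be sketched as follows, again assuming pandas and NumPy and the same hypothetical demo data (target `price`). The treatments are choices, not the only correct ones: median/mode imputation and capping outliers at the IQR fences keep all rows, whereas dropping rows or model-based imputation are alternatives. The 0.05 correlation cutoff in the hypothetical `encode_and_filter` helper is an arbitrary illustrative value. For representativeness, a two-sample Kolmogorov-Smirnov (KS) statistic compares each numeric column in train vs. test against the large-sample 5% critical value; `scipy.stats.ks_2samp` gives exact p-values, and a chi-square test (`scipy.stats.chi2_contingency`) on train/test category counts is the usual companion for categorical columns. A KS statistic below the critical value means no evidence of a difference, not proof that the samples match. Plotting uses matplotlib, imported only inside `plot_overview`.

```python
import numpy as np
import pandas as pd


def make_demo_data(n=2000, seed=0):
    """Hypothetical stand-in for the real CSV; every column name here is invented."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({"area": rng.normal(1500, 400, n).round(0),
                       "rooms": rng.integers(1, 6, n),
                       "age": rng.integers(0, 50, n),
                       "city": rng.choice(["A", "B", "C"], n, p=[0.5, 0.3, 0.2])})
    df["area_sqm"] = df["area"] * 0.0929 + rng.normal(0, 2, n)  # near-copy of "area"
    df["price"] = (200_000 + 50 * df["area"] + 10_000 * df["rooms"]
                   - 800 * df["age"] + rng.normal(0, 20_000, n))
    df.loc[rng.choice(n, 30, replace=False), "area"] = np.nan  # planted missing values
    return df


def fix_defects(df, target="price"):
    # (a) Drop duplicate rows, impute (median for numeric, mode for categorical),
    # then cap numeric features at the IQR fences instead of deleting rows.
    df = df.drop_duplicates().copy()
    num_cols = df.select_dtypes(include="number").columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())
    for col in df.select_dtypes(exclude="number").columns:
        df[col] = df[col].fillna(df[col].mode().iloc[0])
    feats = [c for c in num_cols if c != target]  # the target itself is left untouched
    q1, q3 = df[feats].quantile(0.25), df[feats].quantile(0.75)
    iqr = q3 - q1
    df[feats] = df[feats].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1)
    return df


def target_correlations(df, target="price"):
    # (b) Pearson correlation of each numeric feature with the target, strongest first.
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


def plot_overview(df):
    # (b) Histograms plus a correlation heatmap of the numeric columns.
    import matplotlib.pyplot as plt
    num = df.select_dtypes(include="number")
    num.hist(bins=30, figsize=(10, 6))
    plt.figure()
    plt.imshow(num.corr(), cmap="coolwarm", vmin=-1, vmax=1)
    plt.xticks(range(num.shape[1]), num.columns, rotation=90)
    plt.yticks(range(num.shape[1]), num.columns)
    plt.colorbar()
    plt.show()


def encode_and_filter(df, target="price", min_abs_corr=0.05):
    # (c) Drop numeric features with |r| below an assumed cutoff, then one-hot encode.
    weak = target_correlations(df, target).loc[lambda s: s.abs() < min_abs_corr].index.tolist()
    X = pd.get_dummies(df.drop(columns=[target] + weak), drop_first=True, dtype=float)
    return X, df[target], weak


def split_train_test(df, test_size=0.3, seed=42):
    # (d) Random split; sklearn.model_selection.train_test_split is the usual alternative.
    idx = np.random.default_rng(seed).permutation(len(df))
    n_test = int(round(len(df) * test_size))
    return df.iloc[idx[n_test:]], df.iloc[idx[:n_test]]


def ks_statistic(a, b):
    # Two-sample KS statistic: the largest gap between the two empirical CDFs.
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def compare_splits(train, test):
    # (d) KS check per numeric column at alpha = 0.05 (large-sample constant 1.358).
    n, m = len(train), len(test)
    d_crit = 1.358 * np.sqrt((n + m) / (n * m))
    rows = [{"column": c, "ks_stat": ks_statistic(train[c].to_numpy(), test[c].to_numpy()),
             "d_crit": d_crit} for c in train.select_dtypes(include="number").columns]
    result = pd.DataFrame(rows)
    result["no_evidence_of_difference"] = result["ks_stat"] < result["d_crit"]
    return result


if __name__ == "__main__":
    clean = fix_defects(make_demo_data())
    print(target_correlations(clean))
    X, y, dropped = encode_and_filter(clean)
    print("Dropped as weakly correlated:", dropped)
    train, test = split_train_test(clean)
    print(compare_splits(train, test))
    # plot_overview(clean)  # uncomment to draw the plots
```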
#### Model Building (marks)
a) Fit a base model and observe the overall R-squared, RMSE and MAPE values of the model. Comment on whether it is good or not. (marks)
b) Check for multicollinearity and treat it. (marks)
c) How would you improve the model? Clearly write down the changes you will make before refitting the model, then fit the final model. (marks)
d) Write a business interpretation/explanation of the model: which variables affect the target the most, and what is the relationship? Feel free to use charts or graphs to explain. (marks)
e) Which changes from the base model had the most effect on model performance? (marks)
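The Model Building questions above can be sketched as follows, on the same hypothetical demo data and with plain NumPy least squares so that no extra library is needed. Usual alternatives are `statsmodels` OLS with `statsmodels.stats.outliers_influence.variance_inflation_factor`, or scikit-learn's `LinearRegression` with `sklearn.metrics.r2_score` and `mean_absolute_percentage_error` (the latter returns a fraction, not a percentage). The VIF threshold of 5 is a common rule of thumb (10 is also used), not a fixed rule. There is no universal cutoff for a "good" R-squared; comparing RMSE with the target's spread and judging whether the MAPE is acceptable for the business use is a more practical check. The demo's `area_sqm` is a deliberate near-copy of `area`, so the VIF step has something to remove; on real data, the improvements in (c) might instead be transformations (e.g. a log target), interaction terms, or a different model family.

```python
import numpy as np
import pandas as pd


def make_demo_data(n=2000, seed=0):
    """Hypothetical stand-in for the real CSV; every column name here is invented."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({"area": rng.normal(1500, 400, n).round(0),
                       "rooms": rng.integers(1, 6, n),
                       "age": rng.integers(0, 50, n),
                       "city": rng.choice(["A", "B", "C"], n, p=[0.5, 0.3, 0.2])})
    df["area_sqm"] = df["area"] * 0.0929 + rng.normal(0, 2, n)  # near-copy of "area"
    df["price"] = (200_000 + 50 * df["area"] + 10_000 * df["rooms"]
                   - 800 * df["age"] + rng.normal(0, 20_000, n))
    df.loc[rng.choice(n, 30, replace=False), "area"] = np.nan  # planted missing values
    return df


def prepare(df, target="price", test_size=0.3, seed=42):
    # Condensed preparation: median-impute, one-hot encode, then a random train/test split.
    df = df.fillna(df.select_dtypes(include="number").median())
    X = pd.get_dummies(df.drop(columns=target), drop_first=True, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(df))
    n_test = int(round(len(df) * test_size))
    tr, te = idx[n_test:], idx[:n_test]
    return X.iloc[tr], X.iloc[te], df[target].iloc[tr], df[target].iloc[te]


def fit_ols(X, y):
    # Ordinary least squares via numpy; returns [intercept, coef_1, ..., coef_k].
    A = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])
    return np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)[0]


def predict(beta, X):
    return beta[0] + X.to_numpy(dtype=float) @ beta[1:]


def r_squared(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    return float(1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2))


def evaluate(X_tr, X_te, y_tr, y_te):
    # (a) Fit on train, score on test. MAPE assumes the target is never zero.
    beta = fit_ols(X_tr, y_tr)
    y_true, y_pred = np.asarray(y_te, dtype=float), predict(beta, X_te)
    return {"R2": r_squared(y_true, y_pred),
            "RMSE": float(np.sqrt(np.mean((y_true - y_pred) ** 2))),
            "MAPE_%": float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)}


def vif(X):
    # (b) VIF_j = 1 / (1 - R^2_j), where feature j is regressed on all other features.
    out = {}
    for col in X.columns:
        others = X.drop(columns=col)
        r2 = r_squared(X[col], predict(fit_ols(others, X[col]), others))
        out[col] = np.inf if r2 >= 1 else 1 / (1 - r2)
    return pd.Series(out).sort_values(ascending=False)


def drop_high_vif(X, threshold=5.0):
    # (b)/(c) Drop the worst feature one at a time until every VIF is below the threshold.
    while X.shape[1] > 1:
        v = vif(X)
        if v.iloc[0] < threshold:
            break
        X = X.drop(columns=v.index[0])
    return X


def standardized_coefficients(X, y):
    # (d) Change in the target per one-standard-deviation change in each feature, largest first.
    coef = pd.Series(fit_ols((X - X.mean()) / X.std(), y)[1:], index=X.columns)
    return coef.reindex(coef.abs().sort_values(ascending=False).index)


if __name__ == "__main__":
    X_tr, X_te, y_tr, y_te = prepare(make_demo_data())
    base = evaluate(X_tr, X_te, y_tr, y_te)
    print("VIF:\n", vif(X_tr).round(1))
    X_tr_f = drop_high_vif(X_tr)
    final = evaluate(X_tr_f, X_te[X_tr_f.columns], y_tr, y_te)
    # (e) Put base and final metrics side by side to see which change mattered most.
    print(pd.DataFrame({"base": base, "final": final}))
    print(standardized_coefficients(X_tr_f, y_tr).round(0))
```

On this demo, dropping one of the two near-identical area columns is expected to leave the test metrics almost unchanged, since the duplicate carried no new information; removing multicollinearity mainly stabilises and clarifies the coefficients used for the business interpretation in (d), rather than raising R-squared.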