
Question



The dataset is large; its name is delaney-processed.csv.
Please provide code for all parts, thank you!
The dataset cannot be attached because it is too large, but I can send a link via another platform.
The dataset name is included. If the dataset cannot be reached, please write the code as if the dataset were provided.
Thank you
C. Perform a simple linear regression of the experimental value of the solubility (column labeled 'measured log solubility in mols per litre') on the predicted value from the paper (column labeled 'ESOL predicted log solubility in mols per litre').
- Determine the regression coefficients and assess the fit using the Residual Standard Error (RSE) and the R² statistic.
- Do the values of RSE and R² indicate that the model fits the measured values well?
- Illustrate the fitted line in a graph along with the data.
- Use the statsmodels ordinary least squares (OLS) regression model to perform the linear regression. Print the statistics using the summary table (use the summary() function in statsmodels). Using these results, explain how good the statistical prediction is.
- Determine the residuals, the standardized (studentized) residuals and the leverages, then plot the residuals versus the fitted values and the standardized residuals versus the leverages. What do these plots tell you?

D. Generate a 5-predictor input dataset by selecting the columns 'Molecular Weight', 'Number of H-Bond Donors', 'Number of Rings', 'Number of Rotatable Bonds' and 'Polar Surface Area', with the experimental value of the solubility ('measured log solubility in mols per litre') as the output variable.
- Use the statsmodels OLS regression model to perform a multiple linear regression. Print the statistics using the summary table (use the summary() function in statsmodels). Using these results, explain how good the statistical prediction is.
- Split this dataset into a training set, comprising a randomly selected 80% of the data, and a test set, comprising the remaining 20%.
- Perform a multiple linear regression of the training-set solubility on the training set of 5 predictors and determine the regression coefficients.
- Assess the fit by obtaining the RSE and the R² statistic for the test set. How do these results compare with those in part C?
- Perform a simple linear regression of the test output variable on the predicted test values, and illustrate the fitted line in a graph along with the scatter plot of the test and predicted test output data.

E. Generate a 4-predictor dataset by removing one of the columns from the set used in part D.
- Use the statsmodels OLS regression model to perform a multiple linear regression. Print the statistics using the summary table (use the summary() function in statsmodels). Using these results, explain how good the statistical prediction is.
- Perform a multiple linear regression on the 4-predictor set using the approach in part D.
- Perform a simple linear regression of the test output variable on the predicted test values, and illustrate the fitted line in a graph along with the scatter plot of the test and predicted test output data.
- Discuss the outcome of the multiple linear regression on the 4-predictor set in comparison with the 5-predictor set.
- On the basis of this comparison, discuss the importance of including in the regression the predictor removed for the calculations in part E.

(Attached image: a preview of delaney-processed.csv; legible column headers include the ESOL predicted log solubility, 'Minimum Degree' and 'Molecular Weight'.)
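The fit statistics and diagnostics requested in parts C to E follow standard definitions, collected here for reference. The notation is an assumption of this sketch: n observations, p predictors (p = 1 in part C, 5 in part D, 4 in part E), a design matrix X that includes a column of ones, fitted values ŷ_i and residuals e_i = y_i - ŷ_i.

```latex
\[
\mathrm{RSS} = \sum_{i=1}^{n} e_i^2, \qquad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
\mathrm{RSE} = \hat{\sigma} = \sqrt{\frac{\mathrm{RSS}}{n - p - 1}}, \qquad
R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}
\]
\[
H = X \left( X^\top X \right)^{-1} X^\top, \qquad
h_{ii} = H_{ii}, \qquad
r_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - h_{ii}}}
\]
```

The leverages h_ii sum to p + 1, and r_i is the quantity statsmodels reports as the internally studentized residual.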


