Question
Create a Python file called linearRegression.py. In this task you will use the diabetes dataset mentioned above to perform linear regression and find the best-fit line through the data. Reserve the last 20 observations for testing and use the rest for training your model.

Instead of using linear_model.LinearRegression() from sklearn, write a function that uses numpy to calculate the gradient and the y-intercept of the best-fit line, which has the equation y = mx + b. The equations below describe how the gradient and the y-intercept can be calculated from the training data and labels.

Note: when you calculate the gradient, you will need to reshape the x array to remove an extra dimension of size 1 from its shape (it has this dimension because the dataset was formatted for use with the sklearn functions, which require it). You can do this by applying .squeeze() to the x array when you pass it as an argument to the function.

Hint: if the line doesn't look like it fits the data well, there is a bug in your code.

$$m = \frac{\mu(x)\,\mu(y) - \mu(x \cdot y)}{\left(\mu(x)\right)^2 - \mu(x^2)}$$

$$b = \mu(y) - m\,\mu(x)$$

where $\mu$ is the mean function.

Use these values to produce a figure with the following: a scatter plot of the training data colored red; a scatter plot of the testing data colored green; a line graph of the best-fit line colored blue; and a legend.
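One possible way to approach the task is sketched below. It is a minimal sketch, not a reference solution. It assumes that "the diabetes dataset mentioned above" is the one returned by sklearn's datasets.load_diabetes() and that a single feature (column 2 here, chosen arbitrarily) is used as x; neither detail is stated in the question. The names best_fit_line and main are illustrative.

```python
import numpy as np


def best_fit_line(x, y):
    """Return (m, b) for y = m*x + b using the mean-based formulas."""
    # Drop the extra dimension of size 1, so shape (n, 1) becomes (n,).
    x = np.asarray(x, dtype=float).squeeze()
    y = np.asarray(y, dtype=float)
    # Gradient: (mean(x)*mean(y) - mean(x*y)) / (mean(x)^2 - mean(x^2))
    m = (x.mean() * y.mean() - (x * y).mean()) / (x.mean() ** 2 - (x ** 2).mean())
    # Intercept: mean(y) - m*mean(x)
    b = y.mean() - m * x.mean()
    return m, b


def main():
    # Imported here so best_fit_line can be used with numpy alone.
    import matplotlib.pyplot as plt
    from sklearn import datasets

    # Assumption: sklearn's diabetes dataset, with one feature (column 2) as x.
    diabetes = datasets.load_diabetes()
    x = diabetes.data[:, np.newaxis, 2]  # shape (n, 1), as sklearn expects
    y = diabetes.target

    # Reserve the last 20 observations for testing.
    x_train, x_test = x[:-20], x[-20:]
    y_train, y_test = y[:-20], y[-20:]

    m, b = best_fit_line(x_train, y_train)

    plt.scatter(x_train, y_train, color="red", label="Training data")
    plt.scatter(x_test, y_test, color="green", label="Testing data")
    # Two points are enough to draw a straight line across the data range.
    x_line = np.array([x.min(), x.max()])
    plt.plot(x_line, m * x_line + b, color="blue", label="Best-fit line")
    plt.legend()
    plt.show()
```

Calling main() at the bottom of linearRegression.py would produce the figure. The fit is computed from the training split only, so the green test points show how well the line generalizes to unseen observations.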