Question
For this problem, we will be performing simple linear regression using the following dataset:
Fish.csv
This data file comes from kaggle.com: https://www.kaggle.com/aungpyaeap/fish-market
As stated on the linked page: "This dataset is a record of 7 common different fish species in fish market sales. With this dataset, a predictive model can be performed using machine friendly data and estimate the weight of fish can be predicted."
Response:
- Weight (in grams)
Features:
- Length1 (vertical length in cm)
- Length2 (diagonal length in cm)
- Length3 (cross length in cm)
- Height (in cm)
- Width (diagonal width in cm)
The species name of the fish is also given.
Part A: Read the data from the CSV file of your choosing into a pandas DataFrame. If you are reading in Fish.csv, I recommend dropping the species column, as it is non-numerical.
Also, make sure to re-order the columns so that the response variable is the last column.
[ ]:
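One way Part A could be approached is sketched below. The helper name load_fish and the column name "Species" are assumptions (the problem only says a species column exists), and the three CSV rows are invented placeholders so the block runs without the real file.

```python
import io
import pandas as pd

# Placeholder CSV text standing in for Fish.csv.
# These rows are invented for illustration, NOT taken from the real dataset.
sample_csv = io.StringIO(
    "Species,Weight,Length1,Length2,Length3,Height,Width\n"
    "A,200.0,20.0,22.0,25.0,8.0,3.5\n"
    "B,350.0,25.0,27.5,31.0,11.0,4.2\n"
    "C,120.0,18.0,19.5,21.0,5.5,3.0\n"
)

def load_fish(source, response="Weight", drop=("Species",)):
    """Hypothetical helper: read the CSV, drop non-numeric columns,
    and move the response column to the end."""
    df = pd.read_csv(source)
    df = df.drop(columns=list(drop))
    features = [c for c in df.columns if c != response]
    return df[features + [response]]

fish = load_fish(sample_csv)  # with the real file: load_fish("Fish.csv")
print(fish.head())
```

Reordering by building the column list explicitly keeps the features in their original order and only moves the response.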
Part B: Make a separate scatter plot for each feature versus the response. From these plots, we will try to infer which features appear to have a relationship with the response variable. Write a brief summary of what you notice in each plot. Do you notice any trends in the data?
[ ]:
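A possible layout for the Part B plots is sketched below, with one panel per feature. The helpers make_synthetic_fish and plot_features_vs_response are hypothetical, and the data is randomly generated stand-in data with arbitrary coefficients, not the real Fish.csv values.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def make_synthetic_fish(n=40, seed=0):
    """Hypothetical helper: random stand-in data, NOT the real Fish.csv values."""
    rng = np.random.default_rng(seed)
    length1 = rng.uniform(10, 40, n)
    length2 = length1 + rng.uniform(1, 3, n)
    length3 = length2 + rng.uniform(1, 4, n)
    height = rng.uniform(2, 15, n)
    width = rng.uniform(1, 7, n)
    # Arbitrary coefficients chosen only to give the plots some structure
    weight = -450 + 25 * length3 + 20 * height + 15 * width + rng.normal(0, 25, n)
    return pd.DataFrame({"Length1": length1, "Length2": length2, "Length3": length3,
                         "Height": height, "Width": width, "Weight": weight})

def plot_features_vs_response(df, response="Weight"):
    """Draw one scatter panel per feature against the response."""
    features = [c for c in df.columns if c != response]
    fig, axes = plt.subplots(1, len(features), squeeze=False, sharey=True,
                             figsize=(4 * len(features), 3.5))
    axes = axes[0]
    for ax, name in zip(axes, features):
        ax.scatter(df[name], df[response], s=12)
        ax.set_xlabel(f"{name} (cm)")
        ax.set_title(f"{name} vs {response}")
    axes[0].set_ylabel(f"{response} (g)")
    fig.tight_layout()
    return fig, axes

fish = make_synthetic_fish()
fig, axes = plot_features_vs_response(fish)
```

In a notebook the figure renders when the cell finishes; in a script, a call to plt.show() would be needed.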
Part C: Use stats.linregress to fit simple linear regression models to the data. Fit a separate SLR model for each feature.
Further documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html
Once you have fit each model, report the following information about each model:
- intercept value
- slope value
- p-value
[ ]:
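The per-feature fits for Part C might look like the sketch below. It uses scipy.stats.linregress as the problem asks; the helper make_synthetic_fish and its data are hypothetical stand-ins, so the numbers printed here say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats

def make_synthetic_fish(n=40, seed=0):
    """Hypothetical helper: random stand-in data, NOT the real Fish.csv values."""
    rng = np.random.default_rng(seed)
    length1 = rng.uniform(10, 40, n)
    length2 = length1 + rng.uniform(1, 3, n)
    length3 = length2 + rng.uniform(1, 4, n)
    height = rng.uniform(2, 15, n)
    width = rng.uniform(1, 7, n)
    weight = -450 + 25 * length3 + 20 * height + 15 * width + rng.normal(0, 25, n)
    return pd.DataFrame({"Length1": length1, "Length2": length2, "Length3": length3,
                         "Height": height, "Width": width, "Weight": weight})

fish = make_synthetic_fish()
response = "Weight"
features = [c for c in fish.columns if c != response]

# Fit one simple linear regression per feature and collect the requested values
results = {}
for name in features:
    fit = stats.linregress(fish[name], fish[response])
    results[name] = {"intercept": fit.intercept,
                     "slope": fit.slope,
                     "p_value": fit.pvalue}

summary = pd.DataFrame(results).T  # one row per feature
print(summary)
```

The p-value reported by linregress is for a two-sided test of the null hypothesis that the slope is zero.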
Part D: Use the SLR model from Part C for Length3 versus Weight to estimate the weight of a fish whose measurement for Length3 = 31 cm.
[ ]:
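For Part D, the estimate is just the fitted line evaluated at Length3 = 31. A minimal sketch follows, again on hypothetical synthetic data (make_synthetic_fish is an invented helper), so the resulting number is not the answer for the real Fish.csv.

```python
import numpy as np
import pandas as pd
from scipy import stats

def make_synthetic_fish(n=40, seed=0):
    """Hypothetical helper: random stand-in data, NOT the real Fish.csv values."""
    rng = np.random.default_rng(seed)
    length1 = rng.uniform(10, 40, n)
    length2 = length1 + rng.uniform(1, 3, n)
    length3 = length2 + rng.uniform(1, 4, n)
    height = rng.uniform(2, 15, n)
    width = rng.uniform(1, 7, n)
    weight = -450 + 25 * length3 + 20 * height + 15 * width + rng.normal(0, 25, n)
    return pd.DataFrame({"Length1": length1, "Length2": length2, "Length3": length3,
                         "Height": height, "Width": width, "Weight": weight})

fish = make_synthetic_fish()

# Fit Weight as a linear function of Length3
fit = stats.linregress(fish["Length3"], fish["Weight"])

# Evaluate the fitted line: weight_hat = intercept + slope * x
length3_value = 31.0  # cm, from the problem statement
predicted_weight = fit.intercept + fit.slope * length3_value
print(f"Estimated weight at Length3 = {length3_value} cm: {predicted_weight:.1f} g")
```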
Part E: Looking at all 5 SLR models from Part C, what do you notice about the p-values? What inferences could you make from this information?
Part F: Now, let's fit a multiple linear regression model! We will use statsmodels for this task. Execute the following cell to import the required package. Use sm.OLS and its fit method to fit the model. Then use model.params to print the regression coefficients to the screen.
Further documentation: https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLS.html
Finally, explicitly write out the MLR model using the coefficients that you found so that you have an answer of the form:
$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$
[ ]:
import statsmodels.api as sm
[ ]:
Part G: Based on your MLR model in Part F, use the full model to predict the fish weight when the following features are observed:
- Length1: 26 cm
- Length2: 28 cm
- Length3: 31 cm
- Height: 9 cm
- Width: 4 cm