Question
Linear Regression
In this part, you will be required to work on Linear Regression for the Linnerud dataset from sklearn.
The Linnerud dataset is a multi-output regression dataset. It consists of three exercise (data) and three physiological (target) variables collected from twenty middle-aged men in a fitness club:
physiological - containing 20 observations on 3 physiological variables: Weight, Waist and Pulse.
exercise - containing 20 observations on 3 exercise variables: Chins, Situps and Jumps.
Part 1A: Load and examine the Linnerud dataset
Instruction 1.1. Write your code to load the Linnerud dataset from sklearn and assign it to a variable called linnerud.
# 1. Write your code to load the Linnerud dataset from sklearn
# and assign it to a variable called linnerud.
X = linnerud['data']
Y = linnerud['target']
feature_names_X = linnerud['feature_names']
label_names_Y = linnerud['target_names']
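A minimal sketch of the loading step follows. It assumes that `load_linnerud` from `sklearn.datasets` is the intended loader; the assignment itself does not name the function.

```python
from sklearn.datasets import load_linnerud

# Load the dataset; the result is a dictionary-like Bunch object
linnerud = load_linnerud()

X = linnerud['data']                          # exercise variables
Y = linnerud['target']                        # physiological variables
feature_names_X = linnerud['feature_names']   # names of the columns of X
label_names_Y = linnerud['target_names']      # names of the columns of Y
```

With `linnerud` defined this way, the four assignments above run unchanged.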
Instruction 1.2. Now you need to examine the size and structure of the dataset.
Your tasks are:
- Write your code to find and print out the number of samples and the number of features in the dataset. (1 mark)
- Print the feature and label names for the dataset. (1 mark)
[Total marks: 2]
# 1. Write your code to find and print out the number of samples
# and the number of features in the dataset.
# Using variable X.
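One possible way to read the size and the names off the loaded object is sketched below. It assumes the dataset was loaded with `load_linnerud`; the variable names `n_samples` and `n_features` are illustrative only.

```python
from sklearn.datasets import load_linnerud

linnerud = load_linnerud()
X = linnerud['data']

# X has one row per sample and one column per feature
n_samples, n_features = X.shape
print('Number of samples:', n_samples)
print('Number of features:', n_features)

# Names of the feature columns (X) and label columns (Y)
print('Feature names:', linnerud['feature_names'])
print('Label names:', linnerud['target_names'])
```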
Instruction 1.3. We also need to get a brief understanding about the task by doing some statistics on the features and labels.
Your tasks are:
- Write your code to print the min, max, median for each of the features. You need to use a loop in this task. (2 marks)
- Construct a box-plot for each of the features. (3 marks)
- Write your code to print the min, max, median for each of the labels. You need to use a loop in this task. (1 mark)
- Construct a box-plot for each of the labels. (1 mark)
#4. Construct a box-plot for each of the features.
# INSERT YOUR CODE HERE
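The sketch below shows one way to compute the per-feature statistics in a loop and to draw one box plot per feature. It assumes NumPy and matplotlib are available; the `stats` dictionary and the one-axis-per-feature layout are illustrative choices, not requirements of the assignment. The same pattern applies to the labels by swapping in `linnerud['target']` and `linnerud['target_names']`.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_linnerud

linnerud = load_linnerud()
X = linnerud['data']
feature_names_X = linnerud['feature_names']

# Loop over the columns of X: one feature per iteration
stats = {}
for j, name in enumerate(feature_names_X):
    column = X[:, j]
    stats[name] = (column.min(), column.max(), np.median(column))
    print(f'{name}: min={stats[name][0]}, max={stats[name][1]}, '
          f'median={stats[name][2]}')

# One box plot per feature, each on its own axis because the scales differ
fig, axes = plt.subplots(1, len(feature_names_X), figsize=(9, 3))
for j, ax in enumerate(axes):
    ax.boxplot(X[:, j])
    ax.set_title(feature_names_X[j])
fig.tight_layout()
```

In a notebook the figure is rendered at the end of the cell; in a script, a call to `plt.show()` would be needed.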
Part 1B. Linear Regression
You are required to apply Linear Regression to train and make predictions on the Linnerud dataset.
Note: To perform a supervised learning task, we need to train the model on a training set consisting of input data and the correct labels, and then use the trained model to make predictions on unseen data. We then use the correct labels of the unseen data to evaluate the performance of the model. The unseen dataset is called the test set.
In this part, we will use one-dimensional linear regression with the Situps feature as input and the Waist label as output.
Instruction 1.4. First you need to split the required feature and label from the Linnerud dataset into a training set and a test set. We will use 70% samples for training and 30% for testing. Print the number of samples in each set.
[Total marks: 5]
In [ ]:
# YOU ARE REQUIRED TO INSERT YOUR CODES IN THIS CELL
X_for_1D_LR = X[:,np.where(np.array([feature_names_X])[0] == 'Situps')[0]]
Y_for_1D_LR = Y[:,np.where(np.array([label_names_Y])[0] == 'Waist')[0]]
# first, compute the number of samples in the training set:
n_train = int(len(Y_for_1D_LR) * 0.7)
# The training set is the first n_train samples in the dataset
X_train = X_for_1D_LR[: n_train]
Y_train = # INSERT YOUR CODE HERE
# The test set is the remaining samples in the dataset
X_test = # INSERT YOUR CODE HERE
Y_test = # INSERT YOUR CODE HERE
# Print the number of samples in the training set
print('The number of samples in the training set:')
# INSERT YOUR CODE HERE
# Print the number of samples in the test set
print('The number of samples in the test set:')
# INSERT YOUR CODE HERE
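A hedged sketch of the 70/30 split follows. It mirrors the template above (the first `n_train` rows form the training set, the rest form the test set) but looks the columns up with `list.index`, which is an alternative to the `np.where` expression in the template, not a replacement the assignment asks for.

```python
from sklearn.datasets import load_linnerud

linnerud = load_linnerud()
X, Y = linnerud['data'], linnerud['target']

# Column positions looked up by name
j_situps = list(linnerud['feature_names']).index('Situps')
j_waist = list(linnerud['target_names']).index('Waist')

# Slicing with j:j+1 keeps the arrays two-dimensional, shape (n_samples, 1)
X_for_1D_LR = X[:, j_situps:j_situps + 1]
Y_for_1D_LR = Y[:, j_waist:j_waist + 1]

# 70% of the samples go to the training set
n_train = int(len(Y_for_1D_LR) * 0.7)

# The training set is the first n_train samples, the test set is the rest
X_train, Y_train = X_for_1D_LR[:n_train], Y_for_1D_LR[:n_train]
X_test, Y_test = X_for_1D_LR[n_train:], Y_for_1D_LR[n_train:]

print('The number of samples in the training set:', len(X_train))
print('The number of samples in the test set:', len(X_test))
```

Because the split is positional rather than random, the result depends on the row order of the dataset.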
Instruction 1.5. Your tasks are:
- Create a Linear Regression model called lr. (5 marks)
- Fit the training data to the model. (5 marks)
[Total marks: 10]
In [ ]:
# YOU ARE REQUIRED TO INSERT YOUR CODES IN THIS CELL
lr = # INSERT YOUR CODE HERE
In [ ]:
# YOU ARE REQUIRED TO INSERT YOUR CODES IN THIS CELL
# INSERT YOUR CODE HERE
Instruction 1.6 Predict the output of the test set.
[Total marks: 5]
In [ ]:
# INSERT YOUR CODE HERE
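Instructions 1.5 and 1.6 can be sketched together as below, assuming `LinearRegression` from `sklearn.linear_model` is the intended model class. The data preparation is repeated so that the block runs on its own; `Y_pred` is the name used later in the assignment for the test-set predictions.

```python
from sklearn.datasets import load_linnerud
from sklearn.linear_model import LinearRegression

linnerud = load_linnerud()
j_situps = list(linnerud['feature_names']).index('Situps')
j_waist = list(linnerud['target_names']).index('Waist')
X_1d = linnerud['data'][:, [j_situps]]     # shape (20, 1)
Y_1d = linnerud['target'][:, [j_waist]]    # shape (20, 1)

n_train = int(len(Y_1d) * 0.7)
X_train, Y_train = X_1d[:n_train], Y_1d[:n_train]
X_test, Y_test = X_1d[n_train:], Y_1d[n_train:]

# Create the model and fit it to the training data
lr = LinearRegression()
lr.fit(X_train, Y_train)

# Predict the output of the test set
Y_pred = lr.predict(X_test)

print('Slope:', lr.coef_)
print('Intercept:', lr.intercept_)
print('Predictions:', Y_pred.ravel())
```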
Instruction 1.7 Construct a plot, where you will show the regression line for Waist vs Situps, the training data (use blue colour), the testing data (use green colour), and the residuals for the testing data.
[Total marks: 5]
In [ ]:
# YOU ARE REQUIRED TO INSERT YOUR CODES IN THIS CELL
# Construct a plot, where you will show the regression line for Waist vs Situps, the training data (use blue colour),
# the testing data (use green colour), and the residuals for the testing data.
# INSERT YOUR CODE HERE
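One way to build the requested plot is sketched below. The blue and green colours follow the instruction; the red line, the grey dashed residual segments and the use of `vlines` are assumptions about presentation that the assignment leaves open.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_linnerud
from sklearn.linear_model import LinearRegression

linnerud = load_linnerud()
j_situps = list(linnerud['feature_names']).index('Situps')
j_waist = list(linnerud['target_names']).index('Waist')
X_1d = linnerud['data'][:, [j_situps]]
Y_1d = linnerud['target'][:, [j_waist]]

n_train = int(len(Y_1d) * 0.7)
X_train, Y_train = X_1d[:n_train], Y_1d[:n_train]
X_test, Y_test = X_1d[n_train:], Y_1d[n_train:]

lr = LinearRegression().fit(X_train, Y_train)
Y_pred = lr.predict(X_test)

fig, ax = plt.subplots()
ax.scatter(X_train, Y_train, color='blue', label='Training data')
ax.scatter(X_test, Y_test, color='green', label='Testing data')

# Regression line evaluated over the full range of Situps values
x_line = np.linspace(X_1d.min(), X_1d.max(), 100).reshape(-1, 1)
ax.plot(x_line, lr.predict(x_line), color='red', label='Regression line')

# Residuals: vertical segments from each prediction to the true test value
ax.vlines(X_test.ravel(), Y_pred.ravel(), Y_test.ravel(),
          colors='grey', linestyles='dashed', label='Test residuals')

ax.set_xlabel('Situps')
ax.set_ylabel('Waist')
ax.legend()
```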
Part 1C. Results
Note: To evaluate the performance of a Linear Regression model, two commonly used measures are mean absolute error and root mean squared error.
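The mean absolute error is defined by:
$$\text{mean\_absolute\_error}(Y_{test}, Y_{pred}) = \frac{1}{n_{samples}} \sum_{i=1}^{n_{samples}} \left| y_{test_i} - y_{pred_i} \right|$$
The root mean squared error is defined by:
$$\text{root\_mean\_squared\_error}(Y_{test}, Y_{pred}) = \sqrt{\frac{1}{n_{samples}} \sum_{i=1}^{n_{samples}} \left( y_{test_i} - y_{pred_i} \right)^2}$$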
Instruction 1.8. Compute mean absolute error and root mean squared error between the correct labels and the predictions of the test set and print these two values.
[Total marks: 8]
Hint: You might need to use Regression metrics from sklearn.
In [ ]:
# YOU ARE REQUIRED TO INSERT YOUR CODES IN THIS CELL
# Compute the mean absolute error between Y_test and Y_pred
# Then, print the value
# INSERT YOUR CODE HERE
# Compute the root mean squared error between Y_test and Y_pred
# Then, print the value
# INSERT YOUR CODE HERE
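A sketch of the two metrics is given below. It uses `mean_absolute_error` and `mean_squared_error` from `sklearn.metrics` and takes the square root with NumPy, which is one of several ways to obtain the root mean squared error and should work regardless of the installed scikit-learn version.

```python
import numpy as np
from sklearn.datasets import load_linnerud
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

linnerud = load_linnerud()
j_situps = list(linnerud['feature_names']).index('Situps')
j_waist = list(linnerud['target_names']).index('Waist')
X_1d = linnerud['data'][:, [j_situps]]
Y_1d = linnerud['target'][:, [j_waist]]

n_train = int(len(Y_1d) * 0.7)
lr = LinearRegression().fit(X_1d[:n_train], Y_1d[:n_train])
Y_test = Y_1d[n_train:]
Y_pred = lr.predict(X_1d[n_train:])

# Mean absolute error between Y_test and Y_pred
mae = mean_absolute_error(Y_test, Y_pred)
print('Mean absolute error:', mae)

# Root mean squared error: square root of the mean squared error
rmse = np.sqrt(mean_squared_error(Y_test, Y_pred))
print('Root mean squared error:', rmse)
```

Both values are in the units of the label (Waist), and the root mean squared error is never smaller than the mean absolute error.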
Part 1D. More advanced modelling
Instruction 1.9 (D, HD) As we can see, the dataset has a multi-dimensional feature vector. Previously, we used only one feature to build a one-dimensional linear regression model. In this task, we want to create a two-dimensional linear regression model for the label Weight. First, using the Pearson correlation, we need to decide which two of the three independent variables are best to use as the input features (X) in the linear regression. Therefore, your tasks are:
- Write your code to find out which two out of three variables you will be using for linear regression. (5 marks)
- Explain your choice. (5 marks)
[Total marks: 10]
In [ ]:
# YOU ARE REQUIRED TO INSERT YOUR CODES IN THIS CELL
# Write your code to find out which two out of three variables you will be using for linear regression.
# INSERT YOUR CODE HERE
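The Pearson coefficients can be computed with `np.corrcoef`, as sketched below. Ranking the features by the absolute value of their correlation with Weight is an assumed selection criterion; the assignment does not prescribe one, and the names `corr`, `ranked` and `best_two` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_linnerud

linnerud = load_linnerud()
X, Y = linnerud['data'], linnerud['target']
feature_names_X = list(linnerud['feature_names'])
weight = Y[:, list(linnerud['target_names']).index('Weight')]

# Pearson correlation of each feature column with Weight
corr = {}
for j, name in enumerate(feature_names_X):
    corr[name] = np.corrcoef(X[:, j], weight)[0, 1]
    print(f'Pearson r between {name} and Weight: {corr[name]:.3f}')

# Assumed criterion: keep the two features with the largest |r|
ranked = sorted(corr, key=lambda name: abs(corr[name]), reverse=True)
best_two = ranked[:2]
print('Selected features:', best_two)
```

A fuller analysis might also look at the correlation between the two selected features, since strongly correlated inputs add little independent information.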
In [ ]:
# YOU ARE REQUIRED TO INSERT YOUR COMMENT IN THIS CELL
# INSERT YOUR COMMENT HERE
Instruction 1.10 (D, HD) After we have decided which features are best to use for linear regression, we need to create training and testing data, where the feature vector is two-dimensional and the label is Weight. We will also need to compare the two-dimensional linear regression (based on the two best independent variables) with a one-dimensional linear regression (based on the single best independent variable). Therefore, we will need to create a dataset with a one-dimensional feature vector for comparison.
Your tasks are:
- Create the training and testing datasets for two-dimensional linear regression. (4 marks)
- Create the one-dimensional training and testing dataset for comparison with 2D linear regression. (1 mark)
[Total marks: 5]
In [ ]:
# YOU ARE REQUIRED TO INSERT YOUR CODES IN THIS CELL
# INSERT YOUR CODE HERE
# INSERT YOUR CODE HERE
# first, compute the number of samples in the training set:
# INSERT YOUR CODE HERE
# The training set is the first n_train samples in the dataset
# INSERT YOUR CODE HERE
# INSERT YOUR CODE HERE
# The test set is the remaining samples in the dataset
# INSERT YOUR CODE HERE
# INSERT YOUR CODE HERE
# The 1D comparison dataset
# INSERT YOUR CODE HERE
# INSERT YOUR CODE HERE
# INSERT YOUR CODE HERE
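The datasets for this step could be built as in the sketch below. It assumes the features are ranked by absolute Pearson correlation with Weight and reuses the positional 70/30 split from Part 1B; all variable names (`X_train_2d`, `X_train_1d`, `Y_train_w` and so on) are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_linnerud

linnerud = load_linnerud()
X, Y = linnerud['data'], linnerud['target']
weight = Y[:, list(linnerud['target_names']).index('Weight')]

# Pearson r of each feature column with Weight, strongest |r| first
r = np.array([np.corrcoef(X[:, j], weight)[0, 1] for j in range(X.shape[1])])
order = np.argsort(-np.abs(r))

X_2d = X[:, order[:2]]            # two best features, shape (20, 2)
X_1d = X[:, order[:1]]            # single best feature, shape (20, 1)
Y_weight = weight.reshape(-1, 1)  # label as a column, shape (20, 1)

# first, compute the number of samples in the training set
n_train = int(len(Y_weight) * 0.7)

# The training set is the first n_train samples in the dataset
X_train_2d, Y_train_w = X_2d[:n_train], Y_weight[:n_train]

# The test set is the remaining samples in the dataset
X_test_2d, Y_test_w = X_2d[n_train:], Y_weight[n_train:]

# The 1D comparison dataset shares the same labels
X_train_1d, X_test_1d = X_1d[:n_train], X_1d[n_train:]
```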
Instruction 1.11 (D, HD) We have the training and testing data now. Therefore, your tasks are:
- Create a linear regression model named lr2d. (1 mark)
- Fit the training data to the model. (1 mark)
- Predict the output on the test set. (1 mark)
- Compute mean absolute error and root mean squared error between the correct labels and the predictions of the test set and print these two values. (1 mark)
[Total marks: 4]
In [ ]:
# YOU ARE REQUIRED TO INSERT YOUR CODES IN THIS CELL
lr2d = # INSERT YOUR CODE HERE
In [ ]:
# YOU ARE REQUIRED TO INSERT YOUR CODES IN THIS CELL
# INSERT YOUR CODE HERE
In [ ]:
# YOU ARE REQUIRED TO INSERT YOUR CODES IN THIS CELL
# INSERT YOUR CODE HERE
In [ ]:
# YOU ARE REQUIRED TO INSERT YOUR CODES IN THIS CELL
# INSERT YOUR CODE HERE
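The four tasks of Instruction 1.11 can be sketched in one block, under the same assumptions as before: features selected by absolute Pearson correlation with Weight and a positional 70/30 split. Only `lr2d` is a name given by the assignment; the others are illustrative.

```python
import numpy as np
from sklearn.datasets import load_linnerud
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

linnerud = load_linnerud()
X, Y = linnerud['data'], linnerud['target']
weight = Y[:, list(linnerud['target_names']).index('Weight')]

# Two features with the largest |Pearson r| against Weight
r = np.array([np.corrcoef(X[:, j], weight)[0, 1] for j in range(X.shape[1])])
order = np.argsort(-np.abs(r))
X_2d = X[:, order[:2]]
Y_weight = weight.reshape(-1, 1)

n_train = int(len(Y_weight) * 0.7)
X_train_2d, X_test_2d = X_2d[:n_train], X_2d[n_train:]
Y_train_w, Y_test_w = Y_weight[:n_train], Y_weight[n_train:]

# Create the model, fit it, and predict on the test set
lr2d = LinearRegression()
lr2d.fit(X_train_2d, Y_train_w)
Y_pred_2d = lr2d.predict(X_test_2d)

# Errors between the correct labels and the predictions
mae_2d = mean_absolute_error(Y_test_w, Y_pred_2d)
rmse_2d = np.sqrt(mean_squared_error(Y_test_w, Y_pred_2d))
print('2D mean absolute error:', mae_2d)
print('2D root mean squared error:', rmse_2d)
```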
Instruction 1.12 (D, HD) Now we can compare the errors for two-dimensional linear regression with one-dimensional linear regression. Your tasks are:
- Create a linear regression model named lr1d_compare. Fit the one-dimensional training data and the same labels to the model. Predict the output on the comparison dataset. (1 mark)
- Compute mean absolute error and root mean squared error between the correct labels and the predictions of the comparison set and print these two values. (1 mark)
- Discuss the findings and explain the result. In discussion, consider the relationships between the size of the dataset, the Pearson correlation coefficients you have calculated above, the Weight values and the values of errors. (4 marks)
[Total marks: 6]
In [ ]:
# YOU ARE REQUIRED TO INSERT YOUR CODES IN THIS CELL
# INSERT YOUR CODE HERE
In [ ]:
# YOU ARE REQUIRED TO INSERT YOUR CODES IN THIS CELL
# INSERT YOUR CODE HERE
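A sketch of the one-dimensional comparison model follows. It again assumes that the best single feature is the one with the largest absolute Pearson correlation with Weight; `lr1d_compare` is the name given by the assignment, the remaining names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_linnerud
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

linnerud = load_linnerud()
X, Y = linnerud['data'], linnerud['target']
weight = Y[:, list(linnerud['target_names']).index('Weight')]

# Single feature with the largest |Pearson r| against Weight
r = np.array([np.corrcoef(X[:, j], weight)[0, 1] for j in range(X.shape[1])])
best = int(np.argmax(np.abs(r)))
X_1d = X[:, [best]]
Y_weight = weight.reshape(-1, 1)

n_train = int(len(Y_weight) * 0.7)
X_train_1d, X_test_1d = X_1d[:n_train], X_1d[n_train:]
Y_train_w, Y_test_w = Y_weight[:n_train], Y_weight[n_train:]

# Fit the comparison model on the same labels and predict
lr1d_compare = LinearRegression()
lr1d_compare.fit(X_train_1d, Y_train_w)
Y_pred_1d = lr1d_compare.predict(X_test_1d)

# Errors on the comparison set
mae_1d = mean_absolute_error(Y_test_w, Y_pred_1d)
rmse_1d = np.sqrt(mean_squared_error(Y_test_w, Y_pred_1d))
print('1D mean absolute error:', mae_1d)
print('1D root mean squared error:', rmse_1d)
```

When comparing these numbers with the two-dimensional errors, note that the test set holds only a handful of samples, so small differences between the two models may reflect noise rather than a real difference in quality.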
In [ ]:
# YOU ARE REQUIRED TO INSERT YOUR COMMENT IN THIS CELL
# INSERT YOUR COMMENT HERE