Question
Logistic Regression Loss Function
Let's review logistic regression from the very beginning. Given the dataset $\mathcal{D} = \{(\mathbf{x}_n, t_n)\}_{n=1}^{N}$, with feature vectors $\mathbf{x}_n \in \mathbb{R}^d$ and labels $t_n \in \{-1, +1\}$:
We use the sigmoid function to model the probability (likelihood) of a data sample belonging to the "positive class":

$$p(t = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$
Since $\sigma(-z) = 1 - \sigma(z)$, we can further unify the representation for both classes (for the details, please refer to the lecture slides):

$$p(t \mid \mathbf{x}) = \sigma\big(t\,(\mathbf{w}^\top \mathbf{x} + b)\big), \qquad t \in \{-1, +1\}$$
The loss function of logistic regression is the negative log-likelihood of the dataset, defined as:

$$L(\mathbf{w}, b) = -\frac{1}{N} \sum_{n=1}^{N} \log p(t_n \mid \mathbf{x}_n)$$
To make the form simpler, we can assume that the weight $\mathbf{w}$ and the input feature vector $\mathbf{x}$ are both augmented, which means:

$$\mathbf{w} \leftarrow \begin{bmatrix} \mathbf{w} \\ b \end{bmatrix} \in \mathbb{R}^{d+1}, \qquad \mathbf{x} \leftarrow \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix} \in \mathbb{R}^{d+1}$$
Therefore the negative log-likelihood becomes:

$$L(\mathbf{w}) = \frac{1}{N} \sum_{n=1}^{N} \log\big(1 + \exp(-t_n \mathbf{w}^\top \mathbf{x}_n)\big)$$
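For instance, the averaged loss $\frac{1}{N}\sum_n \log(1+\exp(-t_n \mathbf{w}^\top \mathbf{x}_n))$ can be computed in a vectorized, numerically stable way. This is an illustrative sketch (hypothetical function name, not the starter code), using `np.logaddexp(0, z)` to evaluate $\log(1+e^{z})$ without overflow:

```python
import numpy as np

def nll_loss(w, X_aug, t):
    """Average negative log-likelihood for labels t in {-1, +1}.

    w:     augmented weight vector, shape [d+1]
    X_aug: augmented feature matrix, shape [N, d+1]
    t:     labels in {-1, +1}, shape [N]
    """
    margins = t * (X_aug @ w)               # t_n * w^T x_n, shape [N]
    # log(1 + exp(-m)) via logaddexp(0, -m) avoids overflow for large -m
    return np.mean(np.logaddexp(0.0, -margins))
```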
Computing the Gradient of the Loss Function
We use the gradient descent method to optimize the weights, starting from a random initialization. The key step is computing the gradient of the loss function. Here is the result:

$$\nabla_{\mathbf{w}} L(\mathbf{w}) = -\frac{1}{N} \sum_{n=1}^{N} \sigma(-t_n \mathbf{w}^\top \mathbf{x}_n)\, t_n \mathbf{x}_n$$
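A hand-derived gradient like this is easy to verify against a finite-difference estimate of the loss. The sketch below uses illustrative names (not the starter code's API); each component of the returned vector is the average of $-\sigma(-t_n \mathbf{w}^\top \mathbf{x}_n)\, t_n \mathbf{x}_n$ over the dataset:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll_grad(w, X_aug, t):
    """Average gradient of the NLL for labels t in {-1, +1}."""
    margins = t * (X_aug @ w)                      # shape [N]
    coeffs = -sigmoid(-margins) * t                # -sigma(-m_n) * t_n
    return (X_aug * coeffs[:, None]).mean(axis=0)  # shape [d+1]
```

Comparing each component against $\big(L(\mathbf{w}+\varepsilon \mathbf{e}_i) - L(\mathbf{w}-\varepsilon \mathbf{e}_i)\big)/2\varepsilon$ is a quick check that the sign and the averaging are right.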
[*optional] Here we also provide a detailed gradient calculation process for those of you who are interested in the theory.
Optimization Procedure
During gradient descent, at each iteration we first compute the current loss on the dataset, then compute the gradient, and finally take a small step in the opposite direction of the gradient. We repeat this until convergence. In the code structure we provide for you, we have implemented the main procedure of gradient descent, as shown in the following:
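As a generic illustration of the loop just described (hypothetical helper callables, not the starter code's API), gradient descent with a loss-difference stopping criterion looks like:

```python
import numpy as np

def gradient_descent(compute_loss, compute_grad, w0, lr=0.001,
                     max_iters=1000, eps=1e-7):
    """Repeat: loss -> gradient -> step against the gradient, until the
    loss stops changing by more than eps or max_iters is reached."""
    w = np.array(w0, dtype=float)
    prev_loss = float("inf")
    for _ in range(max_iters):
        loss = compute_loss(w)
        if abs(prev_loss - loss) < eps:       # converged
            break
        w = w - lr * compute_grad(w)          # step opposite the gradient
        prev_loss = loss
    return w
```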
Instruction
Here is a list of the functions you need to implement:
sigmoid(x): sigmoid function implementation.
LogisticReg.set_param(weights, bias): API for the user to directly set the weights and bias. weights is a numpy array of shape [d, ], and bias is a scalar (number).
LogisticReg.get_param(): API to return the weights and bias, for the purpose of auto-grading.
LogisticReg.compute_loss(X, t): compute the negative log-likelihood loss for logistic regression.
LogisticReg.compute_grad(X, t): compute and return the average gradient.
LogisticReg.update(grad, lr=0.001): update the weights by the gradient descent rule.
LogisticReg.predict_prob(X): return the probability (likelihood) of each sample belonging to the positive class.
LogisticReg.predict(X, threshold=0.5): return the predicted label following the rule: if the probability p > threshold, determine t = 1; otherwise t = -1.
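One implementation note for `sigmoid(x)`: the naive `1 / (1 + np.exp(-x))` overflows for large negative inputs. A common numerically stable variant (an implementation choice, not a requirement of the assignment):

```python
import numpy as np

def sigmoid(x):
    """Numerically stable sigmoid, applied elementwise."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # for x >= 0, exp(-x) <= 1, so the usual form is safe
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # for x < 0, rewrite as exp(x) / (1 + exp(x)) to avoid exp overflow
    expx = np.exp(x[~pos])
    out[~pos] = expx / (1.0 + expx)
    return out
```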
Here is a short example of how this LogisticReg is used in a real-world application:
where we first create a LogisticReg object, then fit it to the training data, and finally calculate the accuracy on the test dataset.
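The example code referenced here is not reproduced on this page. Below is a self-contained sketch of one possible implementation of the API above together with its usage, for illustration only: the class body, the zero-initialized augmented weights, the bias stored as the last weight, and the toy data are all assumptions, not the official starter code or graded solution.

```python
import numpy as np

def sigmoid(z):
    # clip to avoid overflow warnings in exp for large |z|
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

class LogisticReg(object):
    """Compact sketch of the API described above (labels in {-1, +1})."""

    def __init__(self, indim=1):
        self.w = np.zeros(indim + 1)   # augmented weights; bias is w[-1]

    def _augment(self, X):
        # append a column of ones so the bias folds into the weights
        return np.hstack([X, np.ones((X.shape[0], 1))])

    def compute_loss(self, X, t):
        m = t * (self._augment(X) @ self.w)
        return np.mean(np.logaddexp(0.0, -m))   # average NLL

    def compute_grad(self, X, t):
        Xa = self._augment(X)
        m = t * (Xa @ self.w)
        return (Xa * (-sigmoid(-m) * t)[:, None]).mean(axis=0)

    def update(self, grad, lr=0.001):
        self.w -= lr * grad

    def fit(self, X, t, lr=0.001, max_iters=1000, eps=1e-7):
        prev = float("inf")
        for _ in range(max_iters):
            loss = self.compute_loss(X, t)
            self.update(self.compute_grad(X, t), lr=lr)
            if abs(prev - loss) < eps:   # converged
                break
            prev = loss
        return self.w

    def predict_prob(self, X):
        return sigmoid(self._augment(X) @ self.w)

    def predict(self, X, threshold=0.5):
        return np.where(self.predict_prob(X) > threshold, 1, -1)

# usage on hypothetical toy data: two Gaussian blobs
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
                     rng.normal(+2.0, 1.0, size=(50, 2))])
t_train = np.concatenate([-np.ones(50), np.ones(50)])

clf = LogisticReg(indim=2)                # create the model
clf.fit(X_train, t_train, lr=0.1, max_iters=2000)  # gradient descent
accuracy = np.mean(clf.predict(X_train) == t_train)
```

Note that the labels are {-1, +1} throughout, matching the predict rule in the assignment.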
Some tips:
Use the NumPy library for this homework.
Use NumPy slicing and indexing techniques to set and get the weights and bias.
np.hstack() for matrix augmentation.
@, np.matmul(), or .dot() for matrix multiplication in NumPy.
np.linalg.inv() for matrix inverse computation.
* for the elementwise product of same-shape arrays in NumPy.
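For example, the augmentation mentioned in the tips appends a constant-1 column so the bias folds into the weights. Whether the 1 goes first or last is a convention; you just need to keep it consistent with how you stack `w` and `b`:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])            # shape [N, d]
ones = np.ones((X.shape[0], 1))       # shape [N, 1]
X_aug = np.hstack([X, ones])          # shape [N, d+1]; 1s as the last column
```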
Some important notes:
Read through the annotations in the .py file.
Whenever there's a pass statement, you will have to implement the code.
The implementation is auto-graded in the lesson; make sure you mark your implementation (bottom-right corner) before the deadline.
You can try multiple times until you get full credit.
Code:
import numpy as np
def sigmoid(x):
    # the sigmoid function
    pass

class LogisticReg(object):
    def __init__(self, indim=1):
        # initialize the parameters with all zeros
        # w: shape of [d+1, 1]
        pass

    def set_param(self, weights, bias):
        # helper function to set the parameters
        # NOTE: you need to implement this to pass the autograder.
        # weights: vector of shape [d, ]
        # bias: scalar
        pass

    def get_param(self):
        # helper function to return the parameters
        # NOTE: you need to implement this to pass the autograder.
        # returns:
        #   weights: vector of shape [d, ]
        #   bias: scalar
        pass
    def compute_loss(self, X, t):
        # compute the loss
        # X: feature matrix of shape [N, d]
        # t: input label of shape [N, ]
        # NOTE: return the average of the log-likelihood, NOT the sum.

        # extend the input matrix

        # compute the loss and return the loss
        pass
    def compute_grad(self, X, t):
        # X: feature matrix of shape [N, d]
        # grad: shape of [d, 1]
        # NOTE: return the average gradient, NOT the sum.
        pass
    def update(self, grad, lr=0.001):
        # update the weights by the gradient descent rule
        pass
    def fit(self, X, t, lr=0.001, max_iters=1000, eps=1e-7):
        # implement .fit() using the gradient descent method.
        # args:
        #   X: input feature matrix of shape [N, d]
        #   t: input label of shape [N, ]
        #   lr: learning rate
        #   max_iters: maximum number of iterations
        #   eps: tolerance of the loss difference
        # NOTE:
        #   extend the input features before fitting to it.
        #   return the weight matrix of shape [indim+1, 1]
        loss = 1e10
        for epoch in range(max_iters):
            # compute the loss
            new_loss = self.compute_loss(X, t)
            # compute the gradient
            grad = self.compute_grad(X, t)
            # update the weights
            self.update(grad, lr=lr)
            # decide whether to break the loop
            if np.abs(new_loss - loss) < eps:
                break
            # remember the loss for the next convergence check
            loss = new_loss
        return self.w
    def predict_prob(self, X):
        # implement .predict_prob() using the parameters learned by .fit()
        # X: input feature matrix of shape [N, d]
        # NOTE: make sure you extend the feature matrix first,
        #       the same way as what you did in the .fit() method.
        # returns the prediction (likelihood) of shape [N, ]
        pass

    def predict(self, X, threshold=0.5):
        # implement .predict() using the .predict_prob() method
        # X: input feature matrix of shape [N, d]
        # returns the prediction of shape [N, ], where each element is -1 or 1.
        # if the probability p > threshold, we determine t = 1; otherwise t = -1
        pass