Question
```python
# numpy is imported in an earlier notebook cell; repeated here so the
# snippet is self-contained
import numpy as np

# UNIT TEST COMMENT: Candidate for Table Driven Tests
# UNQ_C5 GRADED FUNCTION: gradient_descent
def gradient_descent(data, word2Ind, N, V, num_iters, alpha=0.03,
                     random_seed=282, initialize_model=initialize_model,
                     get_batches=get_batches, forward_prop=forward_prop,
                     softmax=softmax, compute_cost=compute_cost,
                     back_prop=back_prop):
    '''
    This is the gradient_descent function

    Inputs:
        data:             text
        word2Ind:         words to indices
        N:                dimension of hidden vector
        V:                dimension of vocabulary
        num_iters:        number of iterations
        random_seed:      random seed to initialize the model's matrices and vectors
        initialize_model: your implementation of the function to initialize the model
        get_batches:      function to get the data in batches
        forward_prop:     your implementation of the function to perform forward propagation
        softmax:          your implementation of the softmax function
        compute_cost:     cost function (cross entropy)
        back_prop:        your implementation of the function to perform backward propagation
    Outputs:
        W1, W2, b1, b2:   updated matrices and biases after num_iters iterations
    '''
    W1, W2, b1, b2 = initialize_model(N, V, random_seed=random_seed)  # W1=(N,V) and W2=(V,N)
    batch_size = 128
    # batch_size = 512
    iters = 0
    C = 2
    for x, y in get_batches(data, word2Ind, V, C, batch_size):
        ### START CODE HERE (Replace instances of 'None' with your own code) ###
        # get z and h
        z, h = forward_prop(x, W1, W2, b1, b2)
        # get yhat
        yhat = softmax(z)
        # get cost
        cost = compute_cost(y, yhat, batch_size)
        if (iters + 1) % 10 == 0:
            print(f"iters: {iters + 1} cost: {cost:.6f}")
        # get gradients
        grad_W1, grad_W2, grad_b1, grad_b2 = back_prop(x, yhat, y, h,
                                                       W1, W2, b1, b2,
                                                       batch_size)
        # update weights and biases
        W1 -= alpha * np.random.rand(*W1.shape)
        W2 -= alpha * np.random.rand(*W2.shape)
        b1 -= alpha * np.random.rand(*b1.shape)
        b2 -= alpha * np.random.rand(*b2.shape)
        ### END CODE HERE ###
        iters += 1
        if iters == num_iters:
            break
        if iters % 100 == 0:
            alpha *= 0.66
    return W1, W2, b1, b2

# test your function
# UNIT TEST COMMENT: Each time this cell is run, the cost for each iteration
# changes slightly (the change is less dramatic after some iterations).
# To take this into account, accept an answer as correct if the cost of
# iter 15 = 41.6 (without caring about decimal points beyond the first
# decimal); 41.66, 41.69778, 41.63, etc. should all be valid answers.
C = 2
N = 50
word2Ind, Ind2word = get_dict(data)
V = len(word2Ind)
num_iters = 150
print("Call gradient_descent")
W1, W2, b1, b2 = gradient_descent(data, word2Ind, N, V, num_iters)
```
Expected Output
```
iters: 10 cost: 9.686791
iters: 20 cost: 10.297529
iters: 30 cost: 10.051127
iters: 40 cost: 9.685962
iters: 50 cost: 9.369307
iters: 60 cost: 9.400293
iters: 70 cost: 9.060542
iters: 80 cost: 9.054266
iters: 90 cost: 8.765818
iters: 100 cost: 8.516531
iters: 110 cost: 8.708745
iters: 120 cost: 8.660616
iters: 130 cost: 8.544338
iters: 140 cost: 8.454268
iters: 150 cost: 8.475693
```
I don't get the same output when I run this. How can I fix my code?
Step by Step Solution
There are 3 steps involved:
Step 1: Locate the bug. The forward pass, the softmax, the cost, and the back_prop call are all wired up correctly, and back_prop already returns grad_W1, grad_W2, grad_b1, and grad_b2. Those gradients are then never used: the update block subtracts alpha times np.random.rand(...) from each parameter, i.e. random noise instead of the gradient, so the parameters take a random walk and the printed costs cannot match the expected output. The offending block is quoted below.
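For reference, this is the update block exactly as it appears in the code above:

```python
# buggy update: steps each parameter in a random direction,
# ignoring the gradients computed by back_prop
W1 -= alpha * np.random.rand(*W1.shape)
W2 -= alpha * np.random.rand(*W2.shape)
b1 -= alpha * np.random.rand(*b1.shape)
b2 -= alpha * np.random.rand(*b2.shape)
```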
Step 2: Replace the four update lines so that each parameter moves against its own gradient, scaled by the learning rate alpha, as sketched below.
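A minimal sketch of the corrected block, assuming back_prop returns grad_W1, grad_W2, grad_b1, grad_b2 with the same shapes as W1, W2, b1, b2 (as the call in the code above suggests); this is the standard gradient-descent update:

```python
# update weights and biases using the gradients from back_prop
W1 -= alpha * grad_W1
W2 -= alpha * grad_W2
b1 -= alpha * grad_b1
b2 -= alpha * grad_b2
```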
Step 3: Rerun the test cell. With the gradient-based update and the fixed random_seed=282 initialization, the printed costs should track the expected output above; per the notebook's own unit-test comment, small differences beyond the first decimal place are acceptable.
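To verify, a usage sketch that simply reruns the notebook's own test cell (since the unit-test comment in the code warns that costs drift slightly between runs, compare only loosely):

```python
# rerun the test cell with the corrected update step
C = 2
N = 50
word2Ind, Ind2word = get_dict(data)
V = len(word2Ind)
num_iters = 150
print("Call gradient_descent")
W1, W2, b1, b2 = gradient_descent(data, word2Ind, N, V, num_iters)
# the first printed line should now be close to: iters: 10 cost: 9.686791
```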