
Question


Our Expression Class

We repeat here our definition of the Expr class from the last chapter.

```python
class Expr(object):
    """Abstract class representing expressions."""

    def __init__(self, *args):
        """An object is created by passing to the constructor the children."""
        self.children = args
        self.value = None         # The value of the expression.
        self.child_values = None  # The values of the children; useful to have.
        self.gradient = None      # This is where we will accumulate the gradient.

    def eval(self):
        """Evaluates the expression."""
        # First, we evaluate the children.
        self.child_values = [c.eval() if isinstance(c, Expr) else c
                             for c in self.children]
        # Then, we evaluate the expression itself.
        self.value = self.op(*self.child_values)
        return self.value

    def op(self, *args):
        """This operator must be implemented in subclasses; it should compute
        self.value from self.child_values, thus implementing the operator
        at the expression node."""
        raise NotImplementedError()

    def __repr__(self):
        """Represents the expression as the name of the class,
        followed by the children."""
        return "{}({})".format(self.__class__.__name__,
                               ', '.join(repr(c) for c in self.children))

    # Expression constructors

    def __add__(self, other):
        return Plus(self, other)

    def __radd__(self, other):
        return Plus(self, other)

    def __sub__(self, other):
        return Minus(self, other)

    def __rsub__(self, other):
        return Minus(other, self)

    def __mul__(self, other):
        return Multiply(self, other)

    def __rmul__(self, other):
        return Multiply(other, self)

    def __truediv__(self, other):
        return Divide(self, other)

    def __rtruediv__(self, other):
        return Divide(other, self)

    def __neg__(self):
        return Negative(self)
```

```python
import random
import string

class V(Expr):
    """Variable."""

    def __init__(self, value=None):
        super().__init__()
        self.children = []
        self.value = random.gauss(0, 1) if value is None else value
        self.name = ''.join(
            random.choices(string.ascii_letters + string.digits, k=8))

    def eval(self):
        return self.value

    def assign(self, value):
        self.value = value

    def __repr__(self):
        return "V(name={}, value={})".format(self.name, self.value)
```

```python
class Plus(Expr):
    def op(self, x, y):
        return x + y

class Minus(Expr):
    def op(self, x, y):
        return x - y

class Multiply(Expr):
    def op(self, x, y):
        return x * y

class Divide(Expr):
    def op(self, x, y):
        return x / y

class Negative(Expr):
    def op(self, x):
        return -x
```

```python
vx = V(value=3)
vy = V(value=2)
e = vx - vy
assert e.eval() == 1.
```

Implementing autogradient

The next step consists in implementing autogradient. Consider an expression e = E(x0, ..., xn), computed as a function of its children expressions x0, ..., xn. The goal of the autogradient computation is to accumulate, in each node of the expression, the gradient of the loss with respect to the node's value. For instance, if the gradient is 2, we know that if we increase the value of the expression by Δ, then the value of the loss is increased by 2Δ.

We accumulate the gradient in the field self.gradient of the expression. We say accumulate the gradient because we don't simply do self.gradient = ... Rather, we have a method e.zero_gradient() that sets all gradients to 0, and we then add the gradient to this initial value of 0: self.gradient += ... We will explain later in detail why we do so; for the moment, just accept it.

Computation of the gradient

In the computation of the autogradient, the expression will receive as input the value ∂L/∂e, where L is the loss and e the value of the expression. The quantity ∂L/∂e is the gradient of the loss with respect to the expression value.
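Before specifying the method, it may help to see the chain rule worked out by hand at a single node. The following small numeric illustration is ours, not part of the original notebook:

```python
# Chain rule at one node, by hand.  Suppose e = x * y with x = 3, y = 4,
# and suppose the loss L changes at rate dL/de = 2 with respect to e.
# Then dL/dx = dL/de * de/dx = dL/de * y, and dL/dy = dL/de * de/dy = dL/de * x.
x, y = 3, 4
dL_over_de = 2
dL_over_dx = dL_over_de * y  # 2 * 4 = 8
dL_over_dy = dL_over_de * x  # 2 * 3 = 6
assert (dL_over_dx, dL_over_dy) == (8, 6)
```

This per-node computation is exactly what compute_gradient will automate, recursively over the whole expression tree.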
With this input, the method compute_gradient of an Expr e must do the following:

- It must add ∂L/∂e to the gradient self.gradient of the expression.
- It must call the method operator_gradient. If the expression node has children c1, ..., cn, the method operator_gradient must return the list of derivatives of e with respect to its children, ∂e/∂c1, ..., ∂e/∂cn. The method operator_gradient is implemented not directly in the class Expr, but for each specific operator, such as Plus, Multiply, etc.: each operator knows how to compute the partial derivative of its output with respect to its children (its arguments).
- It must propagate to each child ci the gradient ∂L/∂e · ∂e/∂ci, by calling the method compute_gradient of the child with argument ∂L/∂e · ∂e/∂ci.

```python
def expr_operator_gradient(self):
    """This method computes the derivative of the operator at the expression
    node.  It needs to be implemented in derived classes, such as Plus,
    Multiply, etc."""
    raise NotImplementedError()

Expr.operator_gradient = expr_operator_gradient

def expr_zero_gradient(self):
    """Sets the gradient to 0, recursively for this expression
    and all its children."""
    self.gradient = 0
    for e in self.children:
        if isinstance(e, Expr):
            e.zero_gradient()

Expr.zero_gradient = expr_zero_gradient

def expr_compute_gradient(self, de_loss_over_de_e=1):
    """Computes the gradient.
    de_loss_over_de_e is the gradient of the output.
    de_loss_over_de_e will be added to the gradient, and then we call for
    each child the method compute_gradient, with argument
    de_loss_over_de_e * d expression / d child.
    The value d expression / d child is computed by self.operator_gradient."""
    pass  # We will write this later.

Expr.compute_gradient = expr_compute_gradient
```

```python
def plus_operator_gradient(self):
    # If e = x + y, de / dx = 1, and de / dy = 1.
    return 1, 1

Plus.operator_gradient = plus_operator_gradient

def multiply_operator_gradient(self):
    # If e = x * y, de / dx = y, and de / dy = x.
    x, y = self.child_values
    return y, x

Multiply.operator_gradient = multiply_operator_gradient

def variable_operator_gradient(self):
    # This is not really used, but it needs to be here for completeness.
    return None

V.operator_gradient = variable_operator_gradient
```

Question 1

With these clarifications, we ask you to implement the compute_gradient method, which again must:

- add ∂L/∂e to the gradient self.gradient of the expression;
- compute ∂e/∂ci by calling the method operator_gradient of itself;
- propagate to each child ci the gradient ∂L/∂e · ∂e/∂ci, by calling the method compute_gradient of the child ci with argument ∂L/∂e · ∂e/∂ci.

```python
### Exercise: Implementation of compute_gradient

def expr_compute_gradient(self, de_loss_over_de_e=1):
    """Computes the gradient.
    de_loss_over_de_e is the gradient of the output.
    de_loss_over_de_e will be added to the gradient, and then we call for
    each child the method compute_gradient, with argument
    de_loss_over_de_e * d expression / d child.
    The value d expression / d child is computed by self.operator_gradient."""
    # YOUR CODE HERE

Expr.compute_gradient = expr_compute_gradient
```

Below are some tests.

```python
# Tests for compute_gradient

# First, the gradient of a sum.
vx = V(value=3)
vz = V(value=4)
y = vx + vz
assert y.eval() == 7
y.zero_gradient()
y.compute_gradient()
assert vx.gradient == 1

# Second, the gradient of a product.
vx = V(value=3)
vz = V(value=4)
y = vx * vz
assert y.eval() == 12
y.zero_gradient()
y.compute_gradient()
assert vx.gradient == 4
assert vz.gradient == 3

# Finally, the gradient of the product of sums.
vx = V(value=1)
vw = V(value=3)
vz = V(value=4)
y = (vx + vw) * (vz + 3)
assert y.eval() == 28
y.zero_gradient()
y.compute_gradient()
assert vx.gradient == 7
assert vz.gradient == 4
```
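For reference, here is one way to fill in the exercise. This is a sketch of ours that passes the tests above, not necessarily the notebook's official solution; it skips children that are plain numbers rather than Expr nodes, and it handles V nodes, whose operator_gradient returns None:

```python
def expr_compute_gradient(self, de_loss_over_de_e=1):
    """Accumulates de_loss_over_de_e into self.gradient, then propagates
    de_loss_over_de_e * d expression / d child to every Expr child."""
    # Step 1: accumulate the gradient at this node.
    self.gradient += de_loss_over_de_e
    # Step 2: ask the operator for the partial derivatives
    # of this node with respect to each of its children.
    partials = self.operator_gradient()
    # Step 3: chain rule.  Leaf variables return None and have no children;
    # constant (non-Expr) children receive no gradient.
    if partials is not None:
        for child, partial in zip(self.children, partials):
            if isinstance(child, Expr):
                child.compute_gradient(de_loss_over_de_e * partial)

Expr.compute_gradient = expr_compute_gradient
```

Note how the accumulation in Step 1 uses +=, matching the earlier discussion: zero_gradient resets all nodes to 0, and repeated calls then add contributions.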
To translate this into code, we will build an expression e(x, θ) involving the input x and the parameters θ, and an expression L = (e(x, θ) − y)² for the loss, involving x, y, and θ. We will then zero all gradients via zero_gradient. Once this is done, we compute the loss L for each point (xi, yi), and then the gradient ∂L/∂θ via a call to compute_gradient. The gradients for all the points will be added, yielding the gradient for minimizing the total loss over all the points.

Rounding up the implementation

Now that we have implemented autogradient, as well as the operators Plus and Multiply, it is time to implement the remaining operators: Minus, Divide (no need to worry about division by zero), and the unary minus Negative.

Question 2: Implementation of Minus, Divide, and Negative

```python
### Exercise: Implementation of Minus, Divide, and Negative

def minus_operator_gradient(self):
    # If e = x - y, de / dx = ..., and de / dy = ...
    # YOUR CODE HERE

Minus.operator_gradient = minus_operator_gradient

def divide_operator_gradient(self):
    # If e = x / y, de / dx = ..., and de / dy = ...
    # YOUR CODE HERE

Divide.operator_gradient = divide_operator_gradient

def negative_operator_gradient(self):
    # If e = -x, de / dx = ...
    # YOUR CODE HERE

Negative.operator_gradient = negative_operator_gradient
```

```python
### Tests for Minus

vx = V(value=3)
vy = V(value=2)
e = vx - vy
assert e.eval() == 1.
e.zero_gradient()
e.compute_gradient()
assert vx.gradient == 1
assert vy.gradient == -1
```

(The notebook also contains hidden tests for Minus.)

```python
### Tests for Divide

vx = V(6)
vy = V(2)
e = vx / vy
assert e.eval() == 3
e.zero_gradient()
e.compute_gradient()
assert vx.gradient == 0.5
assert vy.gradient == -1.5
```
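One possible set of implementations, consistent with the standard derivative rules and with the tests above (again a sketch of ours, not necessarily the official solution):

```python
def minus_operator_gradient(self):
    # If e = x - y, de / dx = 1, and de / dy = -1.
    return 1, -1

Minus.operator_gradient = minus_operator_gradient

def divide_operator_gradient(self):
    # If e = x / y, de / dx = 1 / y, and de / dy = -x / y**2.
    x, y = self.child_values
    return 1 / y, -x / (y ** 2)

Divide.operator_gradient = divide_operator_gradient

def negative_operator_gradient(self):
    # If e = -x, de / dx = -1.
    return (-1,)

Negative.operator_gradient = negative_operator_gradient
```

As a sanity check against the Divide tests: with vx = V(6) and vy = V(2), we get ∂e/∂x = 1/2 = 0.5 and ∂e/∂y = −6/2² = −1.5, matching the asserted gradients.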


