Question:

Suppose during the construction of a decision tree we wish to specify a constant regional prediction function \(g^{w}\) on the region \(\mathscr{R}_{w}\), based on the training data in \(\mathscr{R}_{w}\), say \(\left\{\left(\boldsymbol{x}_{1}, y_{1}\right), \ldots,\left(\boldsymbol{x}_{k}, y_{k}\right)\right\}\). Show that \(g^{w}(\boldsymbol{x}):=k^{-1} \sum_{i=1}^{k} y_{i}\) minimizes the squared-error loss.
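
A sketch of one possible argument, assuming that "squared-error loss" here means the training loss of a constant prediction \(c\) over the \(k\) points in \(\mathscr{R}_{w}\). The symbols \(c\), \(L\), and \(\bar{y}\) are notation introduced for this sketch and do not appear in the exercise. Write \(\bar{y}:=k^{-1} \sum_{i=1}^{k} y_{i}\) and

\[
L(c):=\sum_{i=1}^{k}\left(y_{i}-c\right)^{2} .
\]

Adding and subtracting \(\bar{y}\) inside each square gives

\[
L(c)=\sum_{i=1}^{k}\left(y_{i}-\bar{y}\right)^{2}+2(\bar{y}-c) \sum_{i=1}^{k}\left(y_{i}-\bar{y}\right)+k(\bar{y}-c)^{2}=\sum_{i=1}^{k}\left(y_{i}-\bar{y}\right)^{2}+k(\bar{y}-c)^{2},
\]

because \(\sum_{i=1}^{k}\left(y_{i}-\bar{y}\right)=0\) by the definition of \(\bar{y}\). The first term does not depend on \(c\), and the second is nonnegative and vanishes only at \(c=\bar{y}\), so \(L\) is minimized uniquely by \(c=\bar{y}\), that is, by \(g^{w}(\boldsymbol{x})=k^{-1} \sum_{i=1}^{k} y_{i}\). The same conclusion follows from calculus: \(L^{\prime}(c)=-2 \sum_{i=1}^{k}\left(y_{i}-c\right)=0\) yields \(c=\bar{y}\), and \(L^{\prime \prime}(c)=2 k>0\) confirms a minimum. Dividing \(L\) by \(k\) (an averaged loss) does not change the minimizer.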
