Questions and Answers of Statistical Techniques in Business
Let \(\mathscr{G}\) be an RKHS with reproducing kernel \(\kappa\). Show that \(\kappa\) is a positive semidefinite function.
Show that a reproducing kernel, if it exists, is unique.
Let \(\mathscr{G}\) be a Hilbert space of functions \(g: \mathscr{X} \rightarrow \mathbb{R}\). Recall that the evaluation functional is the map \(\delta_{x}: g \mapsto g(\boldsymbol{x})\) for a given
Let \(\mathscr{G}_{0}\) be the pre-RKHS \(\mathscr{G}_{0}\) constructed in the proof of Theorem 6.2. Thus, \(g \in \mathscr{G}_{0}\) is of the form \(g=\sum_{i=1}^{n} \alpha_{i} \kappa_{x_{i}}\)
Continuing Exercise 4, let \(\left(f_{n}\right)\) be a Cauchy sequence in \(\mathscr{G}_{0}\) such that \(\left|f_{n}(\boldsymbol{x})\right| \rightarrow 0\) for all \(\boldsymbol{x}\). Show that
Continuing Exercises 5 and 4, to show that the inner product (6.14) is well defined, a number of facts have to be checked.(a) Verify that the limit converges.(b) Verify that the limit is independent
Exercises 4-6 show that \(\mathscr{G}\) defined in the proof of Theorem 6.2 is an inner product space. It remains to prove that \(\mathscr{G}\) is an RKHS. This requires us to prove that the inner
If \(\kappa_{1}\) and \(\kappa_{2}\) are kernels on \(\mathscr{X}\) and \(\mathscr{Y}\), then \(\kappa_{+},\left((\boldsymbol{x}, \boldsymbol{y}),\left(\boldsymbol{x}^{\prime},
An RKHS enjoys the following desirable smoothness property: if \(\left(g_{n}\right)\) is a sequence belonging to RKHS \(\mathscr{G}\) on \(\mathscr{X}\), and \(\left\|g_{n}-g\right\|_{\mathscr{G}}
Let \(\mathbf{X}\) be an \(\mathbb{R}^{d}\)-valued random variable that is symmetric about the origin (that is, \(\boldsymbol{X}\) and \((-\boldsymbol{X})\) are identically distributed). Denote by
Suppose an \(\mathrm{RKHS} \mathscr{G}\) of functions from \(\mathscr{X} \rightarrow \mathbb{R}\) (with kernel \(\kappa\) ) is invariant under a group \(\mathscr{T}\) of transformations \(T:
Given two Hilbert spaces \(\mathscr{H}\) and \(\mathscr{G}\), we call a mapping \(A: \mathscr{H} \rightarrow \mathscr{G}\) a Hilbert space isomorphism if it is (i) a linear map; that is, \(A(a f+b
Let \(\mathbf{X}\) be an \(n \times p\) model matrix. Show that \(\mathbf{X}^{\top} \mathbf{X}+n \gamma \mathbf{I}_{p}\) for \(\gamma>0\) is invertible.
As Example 6.8 clearly illustrates, the pdf of a random variable that is symmetric about the origin is not in general a valid reproducing kernel. Take two such iid random variables \(X\) and
For the smoothing cubic spline of Section 6.6, show that \(\kappa(x, u)=\frac{\max \{x, u\} \min \{x, u\}^{2}}{2}-\frac{\min \{x, u\}^{3}}{6}\).
Let \(\mathbf{X}\) be an \(n \times p\) model matrix and let \(\boldsymbol{u} \in \mathbb{R}^{p}\) be the unit-length vector with \(k\)-th entry equal to one
Use Algorithm 6.8.1 from Exercise 16 to write Python code that computes the ridge regression coefficient \(\boldsymbol{\beta}\) in (6.5) and use it to replicate the results in Figure 6.1. The
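A minimal sketch of the ridge computation, solving \(\left(\mathbf{X}^{\top} \mathbf{X}+n \gamma \mathbf{I}_{p}\right) \boldsymbol{\beta}=\mathbf{X}^{\top} \boldsymbol{y}\) directly rather than via the book's Algorithm 6.8.1 (whose exact steps, data set, and the \(\gamma\) grid behind Figure 6.1 are not reproduced here; the example data below are invented):

```python
import numpy as np

def ridge_coef(X, y, gamma):
    """Ridge coefficient: solves (X^T X + n*gamma*I) beta = X^T y."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * gamma * np.eye(p), X.T @ y)

# Hypothetical example data (not the data behind Figure 6.1).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.standard_normal(100)
print(ridge_coef(X, y, gamma=0.1))
```

Using `np.linalg.solve` rather than an explicit matrix inverse is the standard, numerically safer choice here.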
Consider Example 2.10 with \(\mathbf{D}=\operatorname{diag}\left(\lambda_{1}, \ldots, \lambda_{p}\right)\) for some nonnegative vector \(\lambda \in \mathbb{R}^{p}\), so that twice the negative
(Exercise 18 continued.) Consider again Example 2.10 with \(\mathbf{D}=\operatorname{diag}\left(\lambda_{1}, \ldots, \lambda_{p}\right)\) for some nonnegative model-selection parameter \(\lambda \in
In this exercise we explore how the early stopping of the gradient descent iterations (see Example B.10),\[ \boldsymbol{x}_{t+1}=\boldsymbol{x}_{t}-\alpha \nabla f\left(\boldsymbol{x}_{t}\right),
Following his mentor Francis Galton, the mathematician/statistician Karl Pearson conducted comprehensive studies comparing hereditary traits between members of the same family. Figure 5.10 depicts
For the simple linear regression model, show that the values for \(\widehat{\beta_{1}}\) and \(\widehat{\beta_{0}}\) that solve the equations (5.9) are:\[ \begin{gather*}
Edwin Hubble discovered that the universe is expanding. If \(v\) is a galaxy's recession velocity (relative to any other galaxy) and \(d\) is its distance (from that same galaxy), Hubble's law states
The multiple linear regression model (5.6) can be viewed as a first-order approximation of the general model\[ \begin{equation*} Y=g(\boldsymbol{x})+\varepsilon \tag{5.42} \end{equation*} \]where
Table 5.6 shows data from an agricultural experiment where crop yield was measured for two levels of pesticide and three levels of fertilizer. There are three responses for each combination.(a)
Show that for the birthweight data in Section 5.6.6.2 there is no significant decrease in birthweight for smoking mothers. [Hint: create a new variable nonsmoke \(=1-\) smoke, which reverses the
Prove (5.37) and (5.38).
In the Tobit regression model with normally distributed errors, the response is modeled as:\[ Y_{i}=\left\{\begin{array}{ll} Z_{i}, & \text { if } u_{i}<Z_{i}, \\ u_{i}, & \text { if } Z_{i} \leqslant u_{i},\end{array}\right. \]
Download data set WomenWage.csv from the book's website. This data set is a tidied-up version of the women's wages data set from [91]. The first column of the data (hours) is the response variable
Let \(\mathbf{P}\) be a projection matrix. Show that the diagonal elements of \(\mathbf{P}\) all lie in the interval \([0,1]\). In particular, for \(\mathbf{P}=\mathbf{X} \mathbf{X}^{+}\) in Theorem 5.1, the
Consider the linear model \(\boldsymbol{Y}=\mathbf{X} \boldsymbol{\beta}+\varepsilon\) in (5.8), with \(\mathbf{X}\) being the \(n \times p\) model matrix and \(\boldsymbol{\varepsilon}\) having
Take the linear model \(\boldsymbol{Y}=\mathbf{X} \boldsymbol{\beta}+\boldsymbol{\varepsilon}\), where \(\mathbf{X}\) is an \(n \times p\) model matrix, \(\mathbb{E} \boldsymbol{\varepsilon}=\mathbf{0}\), and \(\operatorname{Cov}(\boldsymbol{\varepsilon})\)
Consider a normal linear model \(\boldsymbol{Y}=\mathbf{X} \boldsymbol{\beta}+\varepsilon\), where \(\mathbf{X}\) is an \(n \times p\) model matrix and \(\varepsilon \sim \mathscr{N}\left(\mathbf{0},
Using the notation from Exercises 11-13, Cook's distance for observation \(i\) is defined as\[ D_{i}:=\frac{\left\|\widehat{\boldsymbol{Y}}-\widehat{\boldsymbol{Y}}^{(i)}\right\|^{2}}{p S^{2}} \]It measures the
Prove that if we add an additional feature to the general linear model, then \(R^{2}\), the coefficient of determination, is necessarily non-decreasing in value and hence cannot be used to compare
Let \(\boldsymbol{X}:=\left[X_{1}, \ldots, X_{n}\right]^{\top}\) and \(\boldsymbol{\mu}:=\left[\mu_{1}, \ldots, \mu_{n}\right]^{\top}\). In the fundamental Theorem C.9, we use the fact that if \(X_{i}
Carry out a logistic regression analysis on a (partial) wine data set classification problem. The data can be loaded using the following code.The model matrix has three features, including the
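The loading code referred to above is not shown in this snippet; a hedged alternative sketch using scikit-learn's built-in wine data (the choice of the first two classes and the first three features is an assumption, not necessarily the book's subset):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression

# Assumption: sklearn's wine data, restricted to two classes and three features,
# stands in for the book's (partial) wine data set.
data = load_wine()
mask = data.target < 2                 # keep classes 0 and 1 only
X = data.data[mask][:, :3]             # first three features
y = data.target[mask]

model = LogisticRegression(max_iter=1000).fit(X, y)
print("intercept:", model.intercept_)
print("coefficients:", model.coef_)
```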
Consider again Example 5.10, where we train the learner via the Newton iteration (5.39). If \(\mathbf{X}^{\top}:=\left[\boldsymbol{x}_{1}, \ldots, \boldsymbol{x}_{n}\right]\) defines the matrix of predictors and
In multi-output linear regression, the response variable is a real-valued vector of dimension, say, \(m\). Similar to (5.8), the model can be written in matrix notation:\[ \mathbf{Y}=\mathbf{X
This exercise is to show that the Fisher information matrix \(\mathbf{F}(\boldsymbol{\theta})\) in (4.8) is equal to the matrix \(\mathbf{H}(\boldsymbol{\theta})\) in (4.9), in the special case where
Plot the mixture of \(\mathscr{N}(0,1), \mathscr{U}(0,1)\), and \(\operatorname{Exp}(1)\) distributions, with weights \(w_{1}=w_{2}=w_{3}=1 / 3\).
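A possible plotting sketch for this equal-weight mixture density (matplotlib and scipy assumed):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(-4, 6, 1000)
# Equal-weight mixture of the N(0,1), U(0,1), and Exp(1) densities.
mix = (stats.norm.pdf(x) + stats.uniform.pdf(x, 0, 1) + stats.expon.pdf(x)) / 3
plt.plot(x, mix)
plt.xlabel("x")
plt.ylabel("mixture density")
plt.show()
```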
Denote the pdfs in Exercise 2 by \(f_{1}, f_{2}, f_{3}\), respectively. Suppose that \(X\) is simulated via the two-step procedure: First, draw \(Z\) from \(\{1,2,3\}\), then \(\operatorname{draw}
Simulate an iid training set of size 100 from the Gamma \((2.3,0.5)\) distribution, and implement the Fisher scoring method in Example 4.1 to find the maximum likelihood estimate. Plot the true and
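A possible Fisher-scoring sketch for the \(\operatorname{Gamma}(\alpha, \lambda)\) maximum likelihood problem (Example 4.1's exact formulation is not reproduced; treating the second parameter as a rate and omitting the pdf plot are assumptions):

```python
import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(1)
alpha_true, lam_true = 2.3, 0.5                  # shape alpha, rate lambda
x = rng.gamma(shape=alpha_true, scale=1 / lam_true, size=100)

# Method-of-moments starting values, then Fisher scoring updates.
n, sum_x, sum_logx = len(x), x.sum(), np.log(x).sum()
xbar, s2 = x.mean(), x.var()
theta = np.array([xbar**2 / s2, xbar / s2])      # (alpha, lambda)

for _ in range(20):
    a, lam = theta
    score = np.array([n * np.log(lam) + sum_logx - n * digamma(a),
                      n * a / lam - sum_x])
    fisher = n * np.array([[polygamma(1, a), -1 / lam],
                           [-1 / lam, a / lam**2]])
    step = np.linalg.solve(fisher, score)
    while np.any(theta + step <= 0):             # keep parameters positive
        step /= 2
    theta = theta + step

print("MLE (alpha, lambda):", theta)
```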
Let \(\mathscr{T}=\left\{\boldsymbol{X}_{1}, \ldots, \boldsymbol{X}_{n}\right\}\) be iid data from a pdf \(g(\boldsymbol{x} \mid \boldsymbol{\theta})\) with Fisher matrix
Figure 4.15 shows a Gaussian KDE with bandwidth \(\sigma=0.2\) on the points \(-0.5,0,0.2,0.9\), and 1.5. Reproduce the plot in Python. Using the same bandwidth, plot also the KDE for the same data,
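A sketch of the Gaussian KDE \(\hat{f}(x)=\frac{1}{n \sigma} \sum_{i=1}^{n} \varphi\left(\frac{x-x_{i}}{\sigma}\right)\) on these five points (the second data set mentioned in the truncated text is omitted):

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.array([-0.5, 0.0, 0.2, 0.9, 1.5])
sigma = 0.2
x = np.linspace(-2, 3, 500)

# Gaussian KDE: average of normal pdfs centred at the data points.
kde = np.exp(-0.5 * ((x[:, None] - data) / sigma) ** 2).sum(axis=1)
kde /= len(data) * sigma * np.sqrt(2 * np.pi)

plt.plot(x, kde)
plt.xlabel("x")
plt.ylabel("KDE")
plt.show()
```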
For fixed \(x^{\prime}\), the Gaussian kernel function is the solution to Fourier's heat equation \[ \frac{\partial}{\partial t} f(x \mid t)=\frac{1}{2} \frac{\partial^{2}}{\partial x^{2}} f(x \mid
Show that the Ward linkage given in (4.41) is equal to\[ d_{\text{Ward}}(I, J)=\frac{|I|\,|J|}{|I|+|J|}\left\|\overline{\boldsymbol{x}}_{I}-\overline{\boldsymbol{x}}_{J}\right\|^{2} \]
Carry out the agglomerative hierarchical clustering of Example 4.8 via the linkage method from scipy.cluster.hierarchy. Show that the linkage matrices are the same. Give a scatterplot of the data,
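A sketch of the scipy.cluster.hierarchy side of this exercise; the data of Example 4.8 are not reproduced here, so placeholder two-cluster data stand in for them:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, fcluster

# Placeholder 2-D data; replace with the data set of Example 4.8.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])

Z = linkage(X, method="ward")           # the linkage matrix
labels = fcluster(Z, t=2, criterion="maxclust")
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.show()
print(Z[:5])                            # first rows of the linkage matrix
```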
Suppose that we have the data \(\tau_{\mathrm{n}}=\left\{x_{1}, \ldots, x_{n}\right\}\) in \(\mathbb{R}\) and decide to train the two-component Gaussian mixture model\[ g(x \mid
A \(d\)-dimensional normal random vector \(X \sim \mathscr{N}\left(\boldsymbol{\mu}, \boldsymbol{\Sigma}\right)\) can be defined via an affine transformation,
A generalization of both the gamma and inverse-gamma distribution is the generalized inverse-gamma distribution, which has density\[\begin{equation*}f(s)=\frac{(a / b)^{p / 2}}{2 K_{p}(\sqrt{a b})}
In Exercise 11 we viewed the multivariate Student \(\mathbf{t}_{\alpha}\) distribution as a scale-mixture of the \(\mathscr{N}\left(\mathbf{0}, \mathbf{I}_{d}\right)\) distribution. In this exercise,
Consider the ellipsoid \(E=\left\{\boldsymbol{x} \in \mathbb{R}^{d}: \boldsymbol{x}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{x}=1\right\}\) in (4.42). Let \(\mathbf{U D}^{2} \mathbf{U}^{\top}\) be an SVD of
Figure 4.13 shows how the centered "surfboard" data are projected onto the first column of the principal component matrix \(\mathbf{U}\). Suppose we project the data instead onto the plane spanned by
Figure 4.14 suggests that we can assign each feature vector \(\boldsymbol{x}\) in the iris data set to one of two clusters, based on the value of \(\boldsymbol{u}_{1}^{\top} \boldsymbol{x}\), where
We can modify the Box-Muller method in Example 3.1 to draw \(X\) and \(Y\) uniformly on the unit disc, \(\left\{(x, y) \in \mathbb{R}^{2}: x^{2}+y^{2} \leqslant 1\right\}\), in the following way:
A simple acceptance-rejection method to simulate a vector \(\boldsymbol{X}\) in the unit \(d\)-ball \(\left\{\boldsymbol{x} \in \mathbb{R}^{d}:\|\boldsymbol{x}\| \leqslant 1\right\}\) is to first generate
Let the random variable \(X\) have pdf\[ f(x)= \begin{cases}\frac{1}{2} x, & 0 \leqslant x
Construct simulation algorithms for the following distributions:(a) The \(\operatorname{Weib}(\alpha, \lambda)\) distribution, with cdf \(F(x)=1-\mathrm{e}^{-(\lambda x)^{\alpha}}, x \geqslant 0\), where \(\lambda>0\)
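For part (a), inverting the cdf gives \(X=\frac{1}{\lambda}(-\ln U)^{1 / \alpha}\) with \(U \sim \mathscr{U}(0,1)\); a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

def rweibull(alpha, lam, size):
    """Weib(alpha, lam) sample via inverse transform: X = (-ln U)^(1/alpha) / lam."""
    u = rng.uniform(size=size)
    return (-np.log(u)) ** (1 / alpha) / lam

print(rweibull(alpha=2.0, lam=1.5, size=5))
```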
We wish to sample from the pdf\[ f(x)=x \mathrm{e}^{-x}, \quad x \geqslant 0 \]using acceptance-rejection with the proposal pdf \(g(x)=e^{-x / 2} / 2, x \geqslant 0\).(a) Find the smallest \(C\)
Let \([X, Y]^{\top}\) be uniformly distributed on the triangle with corners \((0,0),(1,2)\), and \((-1,1)\). Give the distribution of \([U, V]^{\top}\) defined by the linear transformation\[
Explain how to generate a random variable from the extreme value distribution, which has cdf\[ F(x)=1-\mathrm{e}^{-\exp \left(\frac{x-\mu}{\sigma}\right)}, \quad-\infty
Write a program that generates and displays 100 random vectors that are uniformly distributed within the ellipse\[ 5 x^{2}+21 x y+25 y^{2}=9 \][Hint: Consider generating uniformly distributed
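One possible approach (not necessarily the one the truncated hint has in mind): write the ellipse as \(\boldsymbol{x}^{\top} \mathbf{A} \boldsymbol{x} \leqslant 9\), where \(\mathbf{A}\) is the symmetric matrix with diagonal entries 5 and 25 and off-diagonal entries 10.5, draw points uniformly on a disc of radius 3, and map them through the inverse Cholesky factor of \(\mathbf{A}\):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
A = np.array([[5.0, 10.5], [10.5, 25.0]])   # 5x^2 + 21xy + 25y^2 = x^T A x
L = np.linalg.cholesky(A)                   # A = L L^T, so x^T A x = ||L^T x||^2

# Uniform points on a disc of radius 3, mapped through (L^T)^{-1}.
n = 100
r = 3 * np.sqrt(rng.uniform(size=n))
theta = 2 * np.pi * rng.uniform(size=n)
z = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
x = np.linalg.solve(L.T, z.T).T             # now x^T A x <= 9

plt.scatter(x[:, 0], x[:, 1], s=10)
plt.axis("equal")
plt.show()
```

A linear map has constant Jacobian, so uniformity on the disc carries over to uniformity inside the ellipse.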
Suppose that \(X_{i} \sim \operatorname{Exp}\left(\lambda_{i}\right)\), independently, for all \(i=1, \ldots, n\). Let \(\boldsymbol{\Pi}=\left[\Pi_{1}, \ldots, \Pi_{n}\right]^{\top}\) be the random
Consider the Markov chain with transition graph given in Figure 3.17, starting in state 1.(a) Construct a computer program to simulate the Markov chain, and show a realization for \(N=100\) steps.(b)
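The transition probabilities of Figure 3.17 are not reproduced here, so the matrix `P` below is a placeholder; the simulation loop for part (a) is otherwise generic:

```python
import numpy as np

rng = np.random.default_rng(5)
# Placeholder transition matrix; replace with the one from Figure 3.17.
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.3, 0.0, 0.7, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.8, 0.2, 0.0, 0.0]])

N = 100
path = np.empty(N, dtype=int)
path[0] = 0                                  # start in state 1 (index 0)
for t in range(1, N):
    path[t] = rng.choice(len(P), p=P[path[t - 1]])
print(path + 1)                              # report states as 1, 2, 3, 4
```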
As a generalization of Example C.9, consider a random walk on an arbitrary undirected connected graph with a finite vertex set \(\mathscr{V}\). For any vertex \(v \in \mathscr{V}\), let \(d(v)\) be
Let \(U, V \sim_{\text {iid }} \mathscr{U}(0,1)\). The reason why in Example 3.7 the sample mean and sample median behave very differently is that \(\mathbb{E}[U / V]=\infty\), while the median of
Consider the problem of generating samples from \(Y \sim \operatorname{Gamma}(2,10)\).(a) Direct simulation: Let \(U_{1}, U_{2} \sim_{\text {iid }} \mathscr{U}(0,1)\). Show that \(-\ln \left(U_{1}\right) /
Let \(\boldsymbol{X}=[X, Y]^{\top}\) be a random column vector with a bivariate normal distribution with expectation vector \(\boldsymbol{\mu}=[1,2]^{\top}\) and covariance matrix\[
Here the objective is to sample from the 2-dimensional pdf\[ f(x, y)=c \mathrm{e}^{-(x y+x+y)}, \quad x \geqslant 0, \quad y \geqslant 0 \]for some normalization constant \(c\), using a Gibbs
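The conditionals here are exponential, \(f(x \mid y) \propto \mathrm{e}^{-x(y+1)}\), i.e. \(X \mid Y=y \sim \operatorname{Exp}(y+1)\), and symmetrically for \(Y \mid X=x\); a Gibbs sampling sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
N, burn_in = 10_000, 1_000
x, y = 1.0, 1.0
samples = np.empty((N, 2))
for t in range(N):
    x = rng.exponential(scale=1 / (y + 1))   # X | Y=y ~ Exp(y + 1)
    y = rng.exponential(scale=1 / (x + 1))   # Y | X=x ~ Exp(x + 1)
    samples[t] = x, y
samples = samples[burn_in:]
print("E[X] estimate:", samples[:, 0].mean())
```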
We wish to estimate \(\mu=\int_{-2}^{2} \mathrm{e}^{-x^{2} / 2} \mathrm{~d} x=\int H(x) f(x) \mathrm{d} x\) via Monte Carlo simulation using two different approaches: (1) defining \(H(x)=4
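A sketch of approach (1), reading the truncated description as \(H(x)=4 \mathrm{e}^{-x^{2} / 2}\) with \(f\) the \(\mathscr{U}(-2,2)\) density (an assumption); the exact value \(\sqrt{2 \pi}(\Phi(2)-\Phi(-2))\) is printed for comparison:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 100_000
x = rng.uniform(-2, 2, size=n)               # X ~ U(-2, 2), pdf f(x) = 1/4
h = 4 * np.exp(-x**2 / 2)                    # H(x) = 4 exp(-x^2 / 2)
est = h.mean()
se = h.std(ddof=1) / np.sqrt(n)
exact = np.sqrt(2 * np.pi) * (norm.cdf(2) - norm.cdf(-2))
print(f"estimate {est:.4f} +/- {1.96 * se:.4f}, exact {exact:.4f}")
```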
Consider estimation of the tail probability \(\mu=\mathbb{P}[X \geqslant \gamma]\) of some random variable \(X\), where \(\gamma\) is large. The crude Monte Carlo estimator of \(\mu\) is\[
One of the test cases in [70] involves the minimization of the Hougen function. Implement a cross-entropy and a simulated annealing algorithm to carry out this optimization task.
In the binary knapsack problem, the goal is to solve the optimization problem:\[ \max _{\boldsymbol{x} \in\{0,1\}^{n}} \boldsymbol{p}^{\top} \boldsymbol{x} \]subject to the constraints \[
Let \(\left(C_{1}, R_{1}\right),\left(C_{2}, R_{2}\right), \ldots\) be a renewal reward process, with \(\mathbb{E} R_{1}
Prove Theorem 3.3.
Prove that if \(H(\mathbf{x}) \geqslant 0\) the importance sampling pdf \(g^{*}\) in (3.22) gives the zero-variance importance sampling estimator \(\widehat{\mu}=\mu\).
Let \(X\) and \(Y\) be random variables (not necessarily independent) and suppose we wish to estimate the expected difference \(\mu=\mathbb{E}[X-Y]=\mathbb{E} X-\mathbb{E} Y\).(a) Show that if \(X\)
Suppose that the loss function is the piecewise linear function\[ \operatorname{Loss}(y, \hat{y})=\alpha(\hat{y}-y)_{+}+\beta(y-\hat{y})_{+}, \quad \alpha, \beta>0 \]where \(c_{+}\) is equal to
Show that, for the squared-error loss, the approximation error \(\ell\left(g^{\mathscr{G}}\right)-\ell\left(g^{*}\right)\) in (2.16) is equal to
Suppose \(\mathscr{G}\) is the class of linear functions. A linear function evaluated at a feature \(\boldsymbol{x}\) can be described as \(g(\boldsymbol{x})=\boldsymbol{\beta}^{\top}
Show that formula (2.24) holds for the \(0-1\) loss with \(0-1\) response.
Let \(\mathbf{X}\) be an \(n\)-dimensional normal random vector with mean vector \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\), where the determinant of \(\boldsymbol{\Sigma}\)
Let \(\widehat{\boldsymbol{\beta}}=\boldsymbol{A}^{+} \boldsymbol{y}\). Using the defining properties of the pseudo-inverse, show that for any \(\boldsymbol{\beta} \in \mathbb{R}^{p}\)\[
Suppose that in the polynomial regression Example 2.1 we select the linear class of functions \(\mathscr{G}_{p}\) with \(p \geqslant 4\). Then, \(g^{*} \in \mathscr{G}_{p}\) and the approximation
Observe that the learner \(g_{\mathscr{T}}\) can be written as a linear combination of the response variable: \(g_{\mathscr{T}}(\boldsymbol{x})=\boldsymbol{x}^{\top} \mathbf{X}^{+} \boldsymbol{Y}\).
Consider again the polynomial regression Example 2.1. Use the fact that \(\mathbb{E}_{\mathbf{X}} \widehat{\boldsymbol{\beta}}=\mathbf{X}^{+} \boldsymbol{h}^{*}(\boldsymbol{u})\), where
Consider the setting of the polynomial regression in Example 2.2. Use Theorem C.19 to prove that\[ \begin{equation*} \sqrt{n}\left(\widehat{\boldsymbol{\beta}_{n}}-\boldsymbol{\beta}_{p}\right)
In Example 2.2 we saw that the statistical error can be expressed (see (2.20)) as\[ \int_{0}^{1}\left(\left[1, \ldots,
Consider again Example 2.2. The result in (2.53) suggests that \(\mathbb{E} \widehat{\boldsymbol{\beta}} \rightarrow \beta_{p}\) as \(n \rightarrow \infty\), where \(\beta_{p}\) is the solution in
For our running Example 2.2 we can use (2.53) to derive a large-sample approximation of the pointwise variance of the learner \(g_{\mathscr{T}}(\boldsymbol{x})=\boldsymbol{x}^{\top}
Let \(h: \mathbb{R}^{d} \rightarrow \mathbb{R}\) be a convex function and let \(\boldsymbol{X}\) be a random variable. Use the subgradient definition of convexity to prove Jensen's inequality:\[ \mathbb{E} h(\boldsymbol{X}) \geqslant h(\mathbb{E} \boldsymbol{X}) \]
Using Jensen's inequality, show that the Kullback-Leibler divergence between probability densities \(f\) and \(g\) is always positive; that is,\[ \mathbb{E} \ln \frac{f(\boldsymbol{X})}{g(\boldsymbol{X})} \geqslant 0, \]where \(\boldsymbol{X} \sim f\).
The purpose of this exercise is to prove the following Vapnik-Chervonenkis bound: for any finite class \(\mathscr{G}\) (containing only a finite number \(|\mathscr{G}|\) of possible functions) and a
Consider the problem in Exercise 16a above. Show that\[ \left|\ell_{\mathscr{T}}\left(g_{\mathscr{T}}^{\mathscr{G}}\right)-\ell\left(g^{\mathscr{G}}\right)\right| \leqslant 2 \sup _{g \in
Show that for the normal linear model \(\boldsymbol{Y} \sim \mathscr{N}\left(\mathbf{X} \beta, \sigma^{2} \mathbf{I}_{n}\right)\), the maximum likelihood estimator of \(\sigma^{2}\) is identical to
Let \(X \sim \operatorname{Gamma}(\alpha, \lambda)\). Show that the pdf of \(Z=1 / X\) is equal to\[ \frac{\lambda^{\alpha} z^{-\alpha-1} \mathrm{e}^{-\lambda z^{-1}}}{\Gamma(\alpha)}, \quad z>0 \]
Consider the sequence \(w_{0}, w_{1}, \ldots\),where \(w_{0}=g(\boldsymbol{\theta})\) is a non-degenerate initial guess and \(w_{t}(\boldsymbol{\theta}) \propto w_{t-1}(\boldsymbol{\theta}) g(\tau
Consider the Bayesian model for \(\tau=\left\{x_{1}, \ldots, x_{n}\right\}\) with likelihood \(g(\tau \mid \mu)\) such that \(\left(X_{1}, \ldots, X_{n} \mid \mu\right) \sim_{\text {iid }}\)
Consider again Example 2.8, where we have a normal model with improper prior \(g(\boldsymbol{\theta})=g\left(\mu, \sigma^{2}\right) \propto 1 / \sigma^{2}\). Show that the prior predictive pdf