Question:
6.22 Anomaly detection. For this problem, consider a Hilbert space $\mathbb{H}$ with associated feature map $\Phi : \mathcal{X} \to \mathbb{H}$ and kernel $K(x, x') = \langle \Phi(x), \Phi(x') \rangle$.
(a) First, let us consider finding the smallest enclosing sphere for a given sample $S = (x_1, \ldots, x_m)$. Let $c \in \mathbb{H}$ denote the center of the sphere and let $r > 0$ be its radius; then the following optimization problem clearly searches for the smallest enclosing sphere:

$$\min_{r > 0,\, c \in \mathbb{H}} r^2 \quad \text{subject to: } \forall i \in [m],\; \|\Phi(x_i) - c\|^2 \le r^2.$$
Show how to derive the equivalent dual optimization

$$\max_{\alpha} \sum_{i=1}^m \alpha_i K(x_i, x_i) - \sum_{i,j=1}^m \alpha_i \alpha_j K(x_i, x_j) \quad \text{subject to: } \alpha \ge 0 \;\wedge\; \sum_{i=1}^m \alpha_i = 1,$$

and prove that the optimal solution satisfies $c = \sum_i \alpha_i \Phi(x_i)$. In other words, the location of the sphere depends only on the points $x_i$ with non-zero coefficients $\alpha_i$. These points are analogous to the support vectors of SVM.
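The dual in part (a) is a concave quadratic maximization over the probability simplex, so a simple Frank-Wolfe iteration suffices to solve it. The sketch below (the function name, the exact-line-search variant, and the linear-kernel example are my own illustrative choices, not from the text) recovers the sphere for three points in the plane:

```python
import numpy as np

def enclosing_sphere_dual(K, n_iter=20000):
    """Frank-Wolfe with exact line search for the dual
       max_a  sum_i a_i K_ii - a^T K a   s.t.  a >= 0, sum_i a_i = 1.
    Returns the dual variables a and the squared radius r^2."""
    m = K.shape[0]
    a = np.full(m, 1.0 / m)            # start at the barycenter of the simplex
    diag_K = np.diag(K).copy()
    for _ in range(n_iter):
        grad = diag_K - 2.0 * (K @ a)  # gradient of the concave dual objective
        j = int(np.argmax(grad))       # best simplex vertex e_j (linear maximizer)
        d = -a.copy()
        d[j] += 1.0                    # ascent direction e_j - a
        gd = grad @ d                  # always >= 0; zero only at the optimum
        if gd <= 0.0:
            break
        curv = d @ K @ d               # curvature of the quadratic along d
        eta = 1.0 if curv <= 1e-12 else min(gd / (2.0 * curv), 1.0)
        a += eta * d                   # stays on the simplex by construction
    r2 = a @ diag_K - a @ K @ a        # optimal dual value = squared radius
    return a, r2

# Illustrative example (linear kernel): the smallest circle enclosing
# (0,0), (2,0), (1,1) has center (1,0) and radius 1.
X = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]])
alpha, r2 = enclosing_sphere_dual(X @ X.T)
center = alpha @ X                     # c = sum_i alpha_i x_i for a linear kernel
```

With a nonlinear kernel the center $c = \sum_i \alpha_i \Phi(x_i)$ is not available explicitly, but distances to it can still be computed through $K$ alone, which is what part (b) exploits.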
(b) Consider the hypothesis class

$$\mathcal{H} = \{x \mapsto r^2 - \|\Phi(x) - c\|^2 : \|c\| \le \Lambda,\; 0 < r \le R\}.$$

A hypothesis $h \in \mathcal{H}$ can be used to detect anomalies in data: $h(x) \ge 0$ indicates a non-anomalous point and $h(x) < 0$ indicates an anomaly.
Show that if $\sup_x \|\Phi(x)\| \le M$, then the solution to the optimization problem in part (a) is found in the hypothesis set $\mathcal{H}$ with $\Lambda \le M$ and $R \le 2M$.
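Evaluating $h(x)$ never requires the feature map explicitly: expanding $\|\Phi(x) - c\|^2$ with $c = \sum_i \alpha_i \Phi(x_i)$ gives $K(x,x) - 2\sum_i \alpha_i K(x_i, x) + \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)$. A minimal sketch (the helper name and the example data are illustrative assumptions):

```python
import numpy as np

def anomaly_score(x, X, alpha, r2, kernel):
    """h(x) = r^2 - ||Phi(x) - c||^2, evaluated via the kernel expansion
       ||Phi(x) - c||^2 = K(x,x) - 2 sum_i a_i K(x_i,x) + sum_ij a_i a_j K(x_i,x_j).
    h(x) >= 0 flags x as non-anomalous, h(x) < 0 as an anomaly."""
    k_x = np.array([kernel(xi, x) for xi in X])
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    dist2 = kernel(x, x) - 2.0 * (alpha @ k_x) + alpha @ K @ alpha
    return r2 - dist2

# Illustrative check with a linear kernel: sphere of center (1,0), radius 1,
# obtained from alpha = (1/2, 1/2) on the support points (0,0) and (2,0).
lin = lambda u, v: float(np.dot(u, v))
X = np.array([[0.0, 0.0], [2.0, 0.0]])
alpha = np.array([0.5, 0.5])
inside = anomaly_score(np.array([1.0, 0.0]), X, alpha, 1.0, lin)   # center: h = 1
outside = anomaly_score(np.array([3.0, 0.0]), X, alpha, 1.0, lin)  # far point: h = -3
```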
(c) Let $D$ denote the distribution of non-outlier points and define the associated expected loss $R(h) = \mathbb{E}_{x \sim D}[1_{h(x) < 0}]$ and empirical margin loss

$$\widehat{R}_{S,\gamma}(h) = \frac{1}{m}\sum_{i=1}^m \Phi_\gamma(h(x_i)) \le \frac{1}{m}\sum_{i=1}^m 1_{h(x_i) < \gamma},$$

where $\Phi_\gamma$ denotes the $\gamma$-margin loss function. These losses measure errors caused by false-positive predictions, i.e., errors caused by incorrectly labeling a point anomalous.
i. Show that the empirical Rademacher complexity of the hypothesis class $\mathcal{H}$ from part (b) can be upper bounded as follows:

$$\widehat{\mathfrak{R}}_S(\mathcal{H}) \le \frac{R^2 + \Lambda^2}{\sqrt{m}} + \frac{2\Lambda}{m}\sqrt{\mathrm{Tr}[K]},$$

where $K$ is the kernel matrix constructed from the sample.
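One way to obtain this bound (a sketch under the problem's assumptions, not necessarily the book's intended derivation) is to expand each hypothesis into three pieces and use the subadditivity of empirical Rademacher complexity:

```latex
h(x) = r^2 - \|\Phi(x) - c\|^2
     = \underbrace{\bigl(r^2 - \|c\|^2\bigr)}_{|\,\cdot\,| \,\le\, R^2 + \Lambda^2}
       \;+\; 2\langle c, \Phi(x)\rangle
       \;-\; \underbrace{K(x, x)}_{\text{fixed function of } x}
```

The class of constants bounded by $B = R^2 + \Lambda^2$ has complexity at most $\frac{B}{m}\,\mathbb{E}_\sigma\bigl|\sum_i \sigma_i\bigr| \le \frac{B}{\sqrt{m}}$ by Jensen's inequality; the linear class $\{x \mapsto 2\langle c, \Phi(x)\rangle : \|c\| \le \Lambda\}$ has complexity at most $\frac{2\Lambda}{m}\sqrt{\mathrm{Tr}[K]}$ by the standard kernel argument (Cauchy-Schwarz plus Jensen); and the singleton $\{x \mapsto -K(x,x)\}$ contributes $0$ since $\mathbb{E}_\sigma[\sigma_i] = 0$.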
ii. Prove that with probability at least $1 - \delta$, the following holds for all $h \in \mathcal{H}$ and $\gamma \in (0, 1]$:

$$R(h) \le \widehat{R}_{S,\gamma}(h) + \frac{4}{\gamma}\left(\frac{R^2 + \Lambda^2}{\sqrt{m}} + \frac{2\Lambda}{m}\sqrt{\mathrm{Tr}[K]}\right) + \sqrt{\frac{\log\log_2 \frac{2}{\gamma}}{m}} + 3\sqrt{\frac{\log \frac{4}{\delta}}{2m}}.$$
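A plausible proof route (a sketch; the decomposition into steps is my framing, not quoted from the book): first establish a fixed-margin bound, then make it uniform in $\gamma$.

```latex
% Fixed-margin step: since \Phi_\gamma is (1/\gamma)-Lipschitz, Talagrand's
% contraction lemma and the standard Rademacher generalization bound (with
% McDiarmid's inequality to pass to the empirical complexity) give, for a
% fixed \gamma, with probability at least 1 - \delta, for all h \in \mathcal{H}:
R(h) \;\le\; \widehat{R}_{S,\gamma}(h)
   \;+\; \frac{2}{\gamma}\,\widehat{\mathfrak{R}}_S(\mathcal{H})
   \;+\; 3\sqrt{\frac{\log\frac{2}{\delta}}{2m}}
% Uniform-margin step: apply this at the grid \gamma_k = 2^{-k} with a union
% bound; for an arbitrary \gamma pick k with \gamma/2 < \gamma_k \le \gamma,
% so that 2/\gamma_k \le 4/\gamma, which produces the 4/\gamma factor and the
% \sqrt{(\log\log_2(2/\gamma))/m} term.
```

Plugging the complexity bound from part i into the resulting uniform bound yields the stated inequality.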
(d) Just as in the case of the soft-margin SVM, we can also define a soft-margin objective for the smallest enclosing sphere that allows us to tune the sensitivity to outliers in the training set by adjusting a regularization parameter $C$:

$$\min_{r > 0,\, c \in \mathbb{H},\, \xi} r^2 + C\sum_{i=1}^m \xi_i \quad \text{subject to: } \forall i \in [m],\; \|\Phi(x_i) - c\|^2 \le r^2 + \xi_i \;\wedge\; \xi_i \ge 0.$$
Show that the equivalent dual formulation of this problem is

$$\max_{\alpha} \sum_{i=1}^m \alpha_i K(x_i, x_i) - \sum_{i,j=1}^m \alpha_i \alpha_j K(x_i, x_j) \quad \text{subject to: } 0 \le \alpha \le C\mathbf{1} \;\wedge\; \sum_{i=1}^m \alpha_i = 1,$$

and that at the optimum we have $c = \sum_{i=1}^m \alpha_i \Phi(x_i)$.
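The soft-margin dual differs from part (a) only in the box constraint $\alpha \le C\mathbf{1}$, so any off-the-shelf QP solver handles it. The sketch below uses SciPy's SLSQP; the data, the linear kernel, and $C = 0.3$ are illustrative assumptions, not taken from the text:

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0], [10.0, 0.0]])  # last point: outlier
K = X @ X.T                      # linear-kernel Gram matrix
C = 0.3                          # box constraint: 0 <= alpha_i <= C
m = len(X)

def neg_dual(a):
    """Negated soft-margin dual objective (SLSQP minimizes)."""
    return -(a @ np.diag(K) - a @ K @ a)

res = minimize(
    neg_dual,
    np.full(m, 1.0 / m),                                   # feasible start
    method="SLSQP",
    bounds=[(0.0, C)] * m,                                 # 0 <= alpha <= C 1
    constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1.0}],
)
alpha = res.x
center = alpha @ X               # c = sum_i alpha_i x_i (linear kernel)
```

Shrinking $C$ tightens the cap on each $\alpha_i$, which limits how much any single point can pull the center; in this example the outlier's coefficient saturates at $C$, so the sphere is no longer forced to enclose it.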
Source: Foundations of Machine Learning, 2nd Edition, by Mehryar Mohri and Afshin Rostamizadeh (ISBN 9780262351362).