
Question:

6.22 Anomaly detection. For this problem, consider a Hilbert space $\mathbb{H}$ with associated feature map $\Phi : \mathcal{X} \to \mathbb{H}$ and kernel $K(x, x') = \Phi(x) \cdot \Phi(x')$.

(a) First, let us consider finding the smallest enclosing sphere for a given sample $S = (x_1, \ldots, x_m)$. Let $c \in \mathbb{H}$ denote the center of the sphere and let $r > 0$ be its radius; then clearly the following optimization problem searches for the smallest enclosing sphere:

$$\min_{r > 0,\, c \in \mathbb{H}} \; r^2 \quad \text{subject to: } \forall i \in [m],\ \|\Phi(x_i) - c\|^2 \le r^2.$$

Show how to derive the equivalent dual optimization

$$\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i K(x_i, x_i) - \sum_{i,j=1}^{m} \alpha_i \alpha_j K(x_i, x_j)$$

$$\text{subject to: } \alpha \ge 0 \ \wedge\ \sum_{i=1}^{m} \alpha_i = 1,$$

and prove that the optimal solution satisfies $c = \sum_i \alpha_i \Phi(x_i)$. In other words, the location of the sphere depends only on the points $x_i$ with non-zero coefficients $\alpha_i$. These points are analogous to the support vectors of SVM.
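
For readers who want to experiment, here is a minimal numerical sketch (not part of the exercise) of this dual. Since the objective is concave in $\alpha$ (the kernel matrix is positive semidefinite) and the feasible set is the probability simplex, a simple Frank-Wolfe loop suffices. The kernel choice, toy sample, and names (smallest_enclosing_sphere_dual, etc.) are illustrative assumptions, not the book's notation.

import numpy as np

def smallest_enclosing_sphere_dual(K, n_iters=2000):
    """Frank-Wolfe on the dual: max_a a@diag(K) - a@K@a over the simplex."""
    m = K.shape[0]
    d = np.diag(K)
    alpha = np.full(m, 1.0 / m)              # start at the simplex barycenter
    for t in range(n_iters):
        grad = d - 2.0 * K @ alpha           # gradient of the concave objective
        j = int(np.argmax(grad))             # best simplex vertex e_j
        step = 2.0 / (t + 2.0)               # standard Frank-Wolfe step size
        alpha *= (1.0 - step)
        alpha[j] += step
    return alpha

# Toy example with a linear kernel, so the center c can be checked in input space.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
K = X @ X.T                                  # linear kernel => Phi(x) = x
alpha = smallest_enclosing_sphere_dual(K)
c = alpha @ X                                # c = sum_i alpha_i Phi(x_i)
r2 = np.max(np.sum((X - c) ** 2, axis=1))    # squared radius of the enclosing sphere
print(c, r2, np.isclose(alpha.sum(), 1.0))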

(b) Consider the hypothesis class

$$\mathcal{H} = \{\, x \mapsto r^2 - \|\Phi(x) - c\|^2 \;:\; \|c\| \le \Lambda,\ 0 < r \le R \,\}.$$

A hypothesis $h \in \mathcal{H}$ can be used to detect anomalies in data, where $h(x) \ge 0$ indicates a non-anomalous point and $h(x) < 0$ indicates an anomaly.
Show that if $\sup_x \|\Phi(x)\| \le M$, then the solution to the optimization problem in part (a) is found in the hypothesis set $\mathcal{H}$ with $\Lambda \le M$ and $R \le 2M$.
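
One possible line of argument, sketched here under the assumption that the dual characterization $c = \sum_i \alpha_i \Phi(x_i)$ from part (a) holds and that at the optimum at least one constraint is active (otherwise $r$ could be decreased):

$$\|c\| = \Big\|\sum_i \alpha_i \Phi(x_i)\Big\| \le \sum_i \alpha_i \|\Phi(x_i)\| \le M, \qquad r = \max_i \|\Phi(x_i) - c\| \le \max_i \|\Phi(x_i)\| + \|c\| \le 2M,$$

so the optimal $(c, r)$ indeed lies in $\mathcal{H}$ with $\Lambda \le M$ and $R \le 2M$.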

(c) Let $\mathcal{D}$ denote the distribution of non-outlier points and define the associated expected loss $R(h) = \mathbb{E}_{x \sim \mathcal{D}}[1_{h(x) < 0}]$ and empirical margin loss $\widehat{R}_{S,\rho}(h) = \frac{1}{m}\sum_{i=1}^{m} \Phi_\rho(h(x_i)) \le \frac{1}{m}\sum_{i=1}^{m} 1_{h(x_i) < \rho}$. These losses measure errors caused by false-positive predictions, i.e., errors caused by incorrectly labeling a point anomalous.
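
As an illustration, assuming $\Phi_\rho$ is the standard margin loss function $\Phi_\rho(t) = \min(1, \max(0, 1 - t/\rho))$ used in the book's margin-bound framework, the empirical margin loss can be computed from the scores $h(x_i)$ as in the following sketch (the function name is illustrative):

import numpy as np

def empirical_margin_loss(scores, rho):
    """(1/m) sum_i Phi_rho(h(x_i)), with Phi_rho(t) = min(1, max(0, 1 - t/rho))."""
    return float(np.mean(np.clip(1.0 - np.asarray(scores) / rho, 0.0, 1.0)))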
i. Show that the empirical Rademacher complexity of the hypothesis class $\mathcal{H}$ from part (b) can be upper bounded as follows:

$$\widehat{\mathfrak{R}}_S(\mathcal{H}) \le \frac{R^2 + \Lambda^2}{\sqrt{m}} + \frac{2\Lambda}{m}\sqrt{\mathrm{Tr}[K]},$$

where $K$ is the kernel matrix constructed from the sample.
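
A small numerical sketch of the right-hand side of this bound, with $R$ and $\Lambda$ passed in as placeholder values (the function name rademacher_bound is illustrative):

import numpy as np

def rademacher_bound(K, R, Lam):
    """Upper bound on the empirical Rademacher complexity of H (part c.i)."""
    m = K.shape[0]
    return (R**2 + Lam**2) / np.sqrt(m) + (2.0 * Lam / m) * np.sqrt(np.trace(K))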
ii. Prove that with probability at least $1 - \delta$, the following holds for all $h \in \mathcal{H}$ and $\rho \in (0, 1]$:

$$R(h) \le \widehat{R}_{S,\rho}(h) + \frac{4}{\rho}\left(\frac{R^2 + \Lambda^2}{\sqrt{m}} + \frac{2\Lambda}{m}\sqrt{\mathrm{Tr}[K]}\right) + \sqrt{\frac{\log\log_2 \frac{2}{\rho}}{m}} + 3\sqrt{\frac{\log\frac{4}{\delta}}{2m}}.$$
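
Continuing the previous sketch (and reusing numpy and the rademacher_bound helper from it), the full right-hand side of this bound could be evaluated numerically as follows; the helper name and its arguments are illustrative:

def margin_generalization_bound(emp_margin_loss, K, R, Lam, rho, delta):
    """Right-hand side of the uniform margin bound in part (c)ii."""
    m = K.shape[0]
    return (emp_margin_loss
            + (4.0 / rho) * rademacher_bound(K, R, Lam)
            + np.sqrt(np.log(np.log2(2.0 / rho)) / m)
            + 3.0 * np.sqrt(np.log(4.0 / delta) / (2.0 * m)))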

(d) Just as in the case of soft-margin SVM, we can also define a soft-margin objective for the smallest enclosing sphere, which allows us to tune the sensitivity to outliers in the training set by adjusting a regularization parameter $C$:

$$\min_{r > 0,\, c \in \mathbb{H},\, \xi} \; r^2 + C\sum_{i=1}^{m} \xi_i \quad \text{subject to: } \forall i \in [m],\ \|\Phi(x_i) - c\|^2 \le r^2 + \xi_i \ \wedge\ \xi_i \ge 0.$$

Show that the equivalent dual formulation of this problem is

$$\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i K(x_i, x_i) - \sum_{i,j=1}^{m} \alpha_i \alpha_j K(x_i, x_j)$$

$$\text{subject to: } 0 \le \alpha \le C\mathbf{1} \ \wedge\ \sum_{i=1}^{m} \alpha_i = 1,$$

and that at the optimum we have $c = \sum_{i=1}^{m} \alpha_i \Phi(x_i)$.
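
A minimal numerical sketch of this box-constrained dual, assuming SciPy is available and that $C \ge 1/m$ so the feasible set is non-empty; the names here are illustrative, not the book's:

import numpy as np
from scipy.optimize import minimize

def soft_sphere_dual(K, C):
    """Solve the soft-margin dual: max a@diag(K) - a@K@a, 0 <= a_i <= C, sum(a) = 1."""
    m = K.shape[0]
    d = np.diag(K)
    neg_obj = lambda a: a @ K @ a - a @ d      # minimize the negated (concave) objective
    neg_grad = lambda a: 2.0 * K @ a - d
    res = minimize(neg_obj, np.full(m, 1.0 / m), jac=neg_grad,
                   bounds=[(0.0, C)] * m,
                   constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1.0}],
                   method="SLSQP")
    return res.x                               # alpha; c = sum_i alpha_i Phi(x_i)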


Source: Foundations of Machine Learning, 2nd Edition, by Mehryar Mohri and Afshin Rostamizadeh, ISBN 9780262351362.
