Question


3 Learning DNFs with kernel perceptron
Suppose that we have $S = \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$ with $x^{(i)} \in \{0,1\}^d$ and $y^{(i)} \in \{-1, 1\}$. Let $c : \{0,1\}^d \to \{0,1\}$ be a "target function" which "labels" the points. Additionally, assume that $c$ is a DNF formula (i.e. $c$ is a disjunction of conjunctions, or a boolean "or" of a bunch of boolean "and"s). The fact that it "labels" the points simply means that $\mathbb{1}[y^{(i)} = 1] = c(x^{(i)})$.

For example, let $c(x) = (x_1 \wedge x_2) \vee (x_1 \wedge \bar{x}_2 \wedge x_3)$ (where $x_i$ denotes the $i$-th entry of $x$), $x^{(i)} = [1, 0, 1]^\top$, and $x^{(j)} = [1, 0, 0]^\top$. Then we would have $c(x^{(i)}) = 1$ and $c(x^{(j)}) = 0$, and thus $y^{(i)} = 1$ and $y^{(j)} = -1$.
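
For concreteness, here is a minimal Python sketch (assuming the clause structure of $c$ reconstructed above) that reproduces the two stated evaluations:

```python
def c(x):
    # c(x) = (x1 AND x2) OR (x1 AND NOT x2 AND x3); note x is 0-indexed here
    return int(bool((x[0] and x[1]) or (x[0] and not x[1] and x[2])))

x_i, x_j = [1, 0, 1], [1, 0, 0]
print(c(x_i), c(x_j))  # 1 0, hence y^(i) = +1 and y^(j) = -1
```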
(i) Give an example target function $c$ (make sure it's a DNF formula) and set $S$ such that the data is not linearly separable.
Part (i) shows that running the perceptron algorithm on $S$ cannot work in general, since the data need not be linearly separable. However, we can try to use a feature transformation and the kernel trick to linearize the data, and thus run the kernelized version of the perceptron algorithm on these datasets.
Consider the feature transformation $\phi : \{0,1\}^d \to \{0,1\}^{3^d}$ which maps a vector $x$ to the vector of all the conjunctions of its entries or of their negations (each of the $d$ entries is either absent, present, or negated in a conjunction, hence $3^d$ coordinates). So, for example, if $d = 2$ then
$$\phi(x) = \left[\, 1,\; x_1,\; x_2,\; \bar{x}_1,\; \bar{x}_2,\; x_1 \wedge x_2,\; x_1 \wedge \bar{x}_2,\; \bar{x}_1 \wedge x_2,\; \bar{x}_1 \wedge \bar{x}_2 \,\right]^\top$$
(note that $1$ can be viewed as the empty conjunction, i.e. the conjunction of zero literals).
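
As an illustrative sketch, the following Python enumerates one 0/1 feature per conjunction. The enumeration order produced by `itertools.product` differs from the display above, but any fixed ordering of the $3^d$ conjunctions works:

```python
from itertools import product

def phi(x):
    # Map x in {0,1}^d to its 3^d conjunction features.
    # Pattern codes per variable: 0 = absent, 1 = x_i, 2 = NOT x_i.
    feats = []
    for pat in product((0, 1, 2), repeat=len(x)):
        v = 1
        for xi, t in zip(x, pat):
            v &= xi if t == 1 else (1 - xi) if t == 2 else 1
        feats.append(v)
    return feats

print(phi([1, 0]))  # 9 = 3^2 features; the all-absent pattern is the constant 1
```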
Let $K : \{0,1\}^d \times \{0,1\}^d \to \mathbb{R}$ be the kernel function associated with $\phi$ (i.e. for $a, b \in \{0,1\}^d$: $K(a, b) = \phi(a) \cdot \phi(b)$). Note that the naive approach of calculating $K(a, b)$ (simply calculating $\phi(a)$ and $\phi(b)$ and taking the dot product) takes time $\Theta(3^d)$.
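
To see the $\Theta(3^d)$ cost concretely, here is a naive-kernel sketch built on the `phi` above (illustrative only; part (ii) asks for something much faster):

```python
def K_naive(a, b):
    # Theta(3^d): materializes both 3^d-dimensional feature vectors
    return sum(fa * fb for fa, fb in zip(phi(a), phi(b)))

print(K_naive([1, 0, 1], [1, 0, 0]))  # 4: the number of conjunctions true on both inputs
```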
Also let $w^* \in \mathbb{R}^{3^d}$ be such that $w^*_1 = -0.5$ (this is the entry which corresponds to the empty conjunction, i.e. $\forall x \in \{0,1\}^d : \phi(x)_1 = 1$), and for all $i > 1$: $w^*_i = 1$ iff the $i$-th conjunction is one of the conjunctions of $c$. So, for example, letting $c(x) = (x_1 \wedge x_2) \vee (\bar{x}_1)$ with $d = 2$, we would have
$$w^* = [-0.5,\; 0,\; 0,\; 1,\; 0,\; 1,\; 0,\; 0,\; 0]^\top.$$
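
A quick sanity check of this construction (a sketch, with the nine $d = 2$ conjunctions hard-coded in the ordering of the display above):

```python
from itertools import product

# Conjunction patterns in the order used above; 0 = absent, 1 = x_i, 2 = NOT x_i.
patterns = [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2),
            (1, 1), (1, 2), (2, 1), (2, 2)]
w_star = [-0.5, 0, 0, 1, 0, 1, 0, 0, 0]  # as in the example above

def conj(x, pat):
    v = 1
    for xi, t in zip(x, pat):
        v &= xi if t == 1 else (1 - xi) if t == 2 else 1
    return v

for x in product((0, 1), repeat=2):
    score = sum(w * conj(x, p) for w, p in zip(w_star, patterns))
    c_x = int(bool((x[0] and x[1]) or (1 - x[0])))  # c(x) = (x1 AND x2) OR (NOT x1)
    print(x, score, c_x)  # score is +0.5 exactly when c(x) = 1, and -0.5 otherwise
```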
(ii) Find a way to compute $K(a, b)$ in $O(d)$ time.
(iii) Show that $w^*$ linearly separates $\phi(S)$ (which is just a shorthand for $\{(\phi(x^{(i)}), y^{(i)})\}_{i=1}^{n}$), and find a lower bound for the margin $\gamma$ with which it separates the data. Remember that
$$\gamma = \min_{(\phi(x^{(i)}),\, y^{(i)}) \in \phi(S)} \; y^{(i)} \left( \frac{w^*}{\|w^*\|} \cdot \phi(x^{(i)}) \right).$$
Your lower bound should depend on $s$, the number of conjunctions in $c$.
(iv) Find an upper bound on the radius $R$ of the dataset $\phi(S)$. Remember that
$$R = \max_{(\phi(x^{(i)}),\, y^{(i)}) \in \phi(S)} \|\phi(x^{(i)})\|.$$
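
To make the two definitions concrete, here is a brute-force computation of $\gamma$ and $R$ on a toy $d = 2$ dataset labeled by the $c$ from the $w^*$ example above (this only illustrates the definitions, not the general bounds the exercise asks for):

```python
import math
from itertools import product

patterns = [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2),
            (1, 1), (1, 2), (2, 1), (2, 2)]
w_star = [-0.5, 0, 0, 1, 0, 1, 0, 0, 0]

def conj(x, pat):
    v = 1
    for xi, t in zip(x, pat):
        v &= xi if t == 1 else (1 - xi) if t == 2 else 1
    return v

def phi(x):
    return [conj(x, p) for p in patterns]

# All four points of {0,1}^2, labeled by c(x) = (x1 AND x2) OR (NOT x1)
S = [(x, 1 if (x[0] and x[1]) or (1 - x[0]) else -1)
     for x in product((0, 1), repeat=2)]

w_norm = math.sqrt(sum(w * w for w in w_star))
gamma = min(y * sum(w * f for w, f in zip(w_star, phi(x))) / w_norm for x, y in S)
R = max(math.sqrt(sum(f * f for f in phi(x))) for x, y in S)
print(gamma, R, (R / gamma) ** 2)  # here: 1/3, 2, and perceptron bound 36
```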
(v) Use parts (ii), (iii), and (iv) to show that we can run the kernel perceptron efficiently on this transformed space, in which our data is linearly separable (show that each iteration only takes $O(nd)$ time per point), but that unfortunately the mistake bound is very bad (show that it is $O(s \, 2^d)$).
There are ways to get a better mistake bound in this same kernel space, but the running time then becomes very bad (exponential). It is open whether there is a way to achieve both a polynomial mistake bound and a polynomial running time.
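
For reference, here is a minimal sketch of the standard kernelized perceptron (the classical algorithm, with an illustrative `epochs` cap); it treats the kernel $K$ as a black box, so plugging in an $O(d)$-time kernel gives the per-point cost discussed in part (v):

```python
def kernel_perceptron(S, K, epochs=100):
    # Instead of an explicit (here 3^d-dimensional) weight vector, store one
    # mistake count alpha[j] per training point: w = sum_j alpha[j] y_j phi(x_j).
    alpha = [0] * len(S)
    for _ in range(epochs):
        mistakes = 0
        for i, (x_i, y_i) in enumerate(S):
            # At most n kernel evaluations per prediction; with an O(d)-time
            # kernel this is the O(nd) per-point cost from part (v).
            score = sum(a * y_j * K(x_j, x_i)
                        for a, (x_j, y_j) in zip(alpha, S) if a)
            if y_i * score <= 0:  # treat a score of 0 as a mistake
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha
```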

