Question


3 Learning DNFs with kernel perceptron
Suppose that we have $S = \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$ with $x^{(i)} \in \{0,1\}^d$ and $y^{(i)} \in \{-1, 1\}$. Let $c : \{0,1\}^d \to \{0,1\}$ be a "target function" which "labels" the points. Additionally, assume that $c$ is a DNF formula (i.e. $c$ is a disjunction of conjunctions, or a boolean "or" of a bunch of boolean "and"s). The fact that it "labels" the points simply means that $\mathbb{1}[y^{(i)} = 1] = c(x^{(i)})$.

For example, let $c(x) = (x_1 \wedge x_2) \vee (x_1 \wedge \bar{x}_2 \wedge x_3)$ (where $x_i$ denotes the $i$-th entry of $x$), $x^{(i)} = [1, 0, 1]^\top$, and $x^{(j)} = [1, 0, 0]^\top$. Then we would have $c(x^{(i)}) = 1$ and $c(x^{(j)}) = 0$, and thus $y^{(i)} = 1$ and $y^{(j)} = -1$.
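
For concreteness, here is a minimal Python sketch (assuming the clause structure of $c$ reconstructed above) that reproduces the two stated evaluations:

```python
def c(x):
    # c(x) = (x1 AND x2) OR (x1 AND NOT x2 AND x3); note x is 0-indexed here
    return int(bool((x[0] and x[1]) or (x[0] and not x[1] and x[2])))

x_i, x_j = [1, 0, 1], [1, 0, 0]
print(c(x_i), c(x_j))  # 1 0, hence y^(i) = +1 and y^(j) = -1
```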
(i) Give an example target function $c$ (make sure it's a DNF formula) and set $S$ such that the data is not linearly separable.
Part (i) shows that running the perceptron algorithm on $S$ cannot work in general, since the data need not be linearly separable. However, we can try to use a feature transformation and the kernel trick to linearize the data, and thus run the kernelized version of the perceptron algorithm on these datasets.
Consider the feature transformation $\phi : \{0,1\}^d \to \{0,1\}^{3^d}$ which maps a vector $x$ to the vector of all the conjunctions of its entries or of their negations (each of the $d$ entries is either absent, present, or negated in a conjunction, hence $3^d$ coordinates). So, for example, if $d = 2$ then
$$\phi(x) = \left[\, 1,\; x_1,\; x_2,\; \bar{x}_1,\; \bar{x}_2,\; x_1 \wedge x_2,\; x_1 \wedge \bar{x}_2,\; \bar{x}_1 \wedge x_2,\; \bar{x}_1 \wedge \bar{x}_2 \,\right]^\top$$
(note that $1$ can be viewed as the empty conjunction, i.e. the conjunction of zero literals).
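
As an illustrative sketch, the following Python enumerates one 0/1 feature per conjunction. The enumeration order produced by `itertools.product` differs from the display above, but any fixed ordering of the $3^d$ conjunctions works:

```python
from itertools import product

def phi(x):
    # Map x in {0,1}^d to its 3^d conjunction features.
    # Pattern codes per variable: 0 = absent, 1 = x_i, 2 = NOT x_i.
    feats = []
    for pat in product((0, 1, 2), repeat=len(x)):
        v = 1
        for xi, t in zip(x, pat):
            v &= xi if t == 1 else (1 - xi) if t == 2 else 1
        feats.append(v)
    return feats

print(phi([1, 0]))  # 9 = 3^2 features; the all-absent pattern is the constant 1
```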
Let $K : \{0,1\}^d \times \{0,1\}^d \to \mathbb{R}$ be the kernel function associated with $\phi$ (i.e. for $a, b \in \{0,1\}^d$: $K(a, b) = \phi(a) \cdot \phi(b)$). Note that the naive approach of calculating $K(a, b)$ (simply calculating $\phi(a)$ and $\phi(b)$ and taking the dot product) takes time $\Theta(3^d)$.
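
To see the $\Theta(3^d)$ cost concretely, here is a naive-kernel sketch built on the `phi` above (illustrative only; part (ii) asks for something much faster):

```python
def K_naive(a, b):
    # Theta(3^d): materializes both 3^d-dimensional feature vectors
    return sum(fa * fb for fa, fb in zip(phi(a), phi(b)))

print(K_naive([1, 0, 1], [1, 0, 0]))  # 4: the number of conjunctions true on both inputs
```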
Also let $w^* \in \mathbb{R}^{3^d}$ be such that $w^*_1 = -0.5$ (this is the entry which corresponds to the empty conjunction, i.e. $\forall x \in \{0,1\}^d : \phi(x)_1 = 1$), and for all $i > 1$: $w^*_i = 1$ iff the $i$-th conjunction is one of the conjunctions of $c$. So, for example, letting $c(x) = (x_1 \wedge x_2) \vee (\bar{x}_1)$ with $d = 2$, we would have
$$w^* = [-0.5,\; 0,\; 0,\; 1,\; 0,\; 1,\; 0,\; 0,\; 0]^\top.$$
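
A quick sanity check of this construction (a sketch, with the nine $d = 2$ conjunctions hard-coded in the ordering of the display above):

```python
from itertools import product

# Conjunction patterns in the order used above; 0 = absent, 1 = x_i, 2 = NOT x_i.
patterns = [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2),
            (1, 1), (1, 2), (2, 1), (2, 2)]
w_star = [-0.5, 0, 0, 1, 0, 1, 0, 0, 0]  # as in the example above

def conj(x, pat):
    v = 1
    for xi, t in zip(x, pat):
        v &= xi if t == 1 else (1 - xi) if t == 2 else 1
    return v

for x in product((0, 1), repeat=2):
    score = sum(w * conj(x, p) for w, p in zip(w_star, patterns))
    c_x = int(bool((x[0] and x[1]) or (1 - x[0])))  # c(x) = (x1 AND x2) OR (NOT x1)
    print(x, score, c_x)  # score is +0.5 exactly when c(x) = 1, and -0.5 otherwise
```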
(ii) Find a way to compute $K(a, b)$ in $O(d)$ time.
(iii) Show that $w^*$ linearly separates $\phi(S)$ (which is just a shorthand for $\{(\phi(x^{(i)}), y^{(i)})\}_{i=1}^{n}$), and find a lower bound for the margin $\gamma$ with which it separates the data. Remember that
$$\gamma = \min_{(\phi(x^{(i)}),\, y^{(i)}) \in \phi(S)} \; y^{(i)} \left( \frac{w^*}{\|w^*\|} \cdot \phi(x^{(i)}) \right).$$
Your lower bound should depend on $s$, the number of conjunctions in $c$.
(iv) Find an upper bound on the radius $R$ of the dataset $\phi(S)$. Remember that
$$R = \max_{(\phi(x^{(i)}),\, y^{(i)}) \in \phi(S)} \|\phi(x^{(i)})\|.$$
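
To make the two definitions concrete, here is a brute-force computation of $\gamma$ and $R$ on a toy $d = 2$ dataset labeled by the $c$ from the $w^*$ example above (this only illustrates the definitions, not the general bounds the exercise asks for):

```python
import math
from itertools import product

patterns = [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2),
            (1, 1), (1, 2), (2, 1), (2, 2)]
w_star = [-0.5, 0, 0, 1, 0, 1, 0, 0, 0]

def conj(x, pat):
    v = 1
    for xi, t in zip(x, pat):
        v &= xi if t == 1 else (1 - xi) if t == 2 else 1
    return v

def phi(x):
    return [conj(x, p) for p in patterns]

# All four points of {0,1}^2, labeled by c(x) = (x1 AND x2) OR (NOT x1)
S = [(x, 1 if (x[0] and x[1]) or (1 - x[0]) else -1)
     for x in product((0, 1), repeat=2)]

w_norm = math.sqrt(sum(w * w for w in w_star))
gamma = min(y * sum(w * f for w, f in zip(w_star, phi(x))) / w_norm for x, y in S)
R = max(math.sqrt(sum(f * f for f in phi(x))) for x, y in S)
print(gamma, R, (R / gamma) ** 2)  # here: 1/3, 2, and perceptron bound 36
```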
(v) Use parts (ii), (iii), and (iv) to show that we can run the kernel perceptron efficiently on this transformed space, in which our data is linearly separable (show that each iteration only takes $O(nd)$ time per point), but that unfortunately the mistake bound is very bad (show that it is $O(s \, 2^d)$).
There are ways to get a better mistake bound in this same kernel space, but the running time then becomes very bad (exponential). It is open whether there is a way to achieve both a polynomial mistake bound and a polynomial running time.
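
For reference, here is a minimal sketch of the standard kernelized perceptron (the classical algorithm, with an illustrative `epochs` cap); it treats the kernel $K$ as a black box, so plugging in an $O(d)$-time kernel gives the per-point cost discussed in part (v):

```python
def kernel_perceptron(S, K, epochs=100):
    # Instead of an explicit (here 3^d-dimensional) weight vector, store one
    # mistake count alpha[j] per training point: w = sum_j alpha[j] y_j phi(x_j).
    alpha = [0] * len(S)
    for _ in range(epochs):
        mistakes = 0
        for i, (x_i, y_i) in enumerate(S):
            # At most n kernel evaluations per prediction; with an O(d)-time
            # kernel this is the O(nd) per-point cost from part (v).
            score = sum(a * y_j * K(x_j, x_i)
                        for a, (x_j, y_j) in zip(alpha, S) if a)
            if y_i * score <= 0:  # treat a score of 0 as a mistake
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha
```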

