Question
In this problem we will show that the existence of an efficient mistake-bounded learner for a class C implies an efficient PAC learner for C.
Concretely, let C be a function class with domain X ⊆ {−1, 1}^n and binary labels Y = {−1, 1}. Assume that C can be learned by algorithm/learner A with some mistake bound t. You may assume you know the value t. You may also assume that at each iteration, A runs in time polynomial in n and that A only updates its state when it gets an example wrong. The concrete goal of this problem is to create a PAC-learning algorithm, B, that can PAC-learn concept class C with respect to an arbitrary distribution D over {−1, 1}^n using algorithm A as a sub-routine.
In order to prove that learner B can PAC-learn concept class C, we must show that there exists a finite number of examples, m, that we can draw from D such that B produces a hypothesis whose true error is more than ε with probability at most δ. First, fix some distribution D on X, and we will assume that the examples are labeled by an unknown c ∈ C. Additionally, for a hypothesis (i.e. function) h : X → Y, let err(h) = P_{x∼D}[h(x) ≠ c(x)]. Formally, we will need to bound m such that the following condition holds:
∀ δ, ε ∈ [0, 1], ∃ m ∈ ℕ such that P_{x∼D}[err(B({x}^m)) > ε] ≤ δ        (1)

where B({x}^m) denotes a hypothesis produced by B from m random draws of x from an arbitrary distribution D. To find this m, we will first decompose it into blocks of examples of size k and make use of results based on a single block to find the bound necessary for m that satisfies condition (1).
Note: Using the identity P[err(h) > ε] + P[err(h) ≤ ε] = 1, we can see that P[err(h) > ε] ≤ δ is equivalent to P[err(h) ≤ ε] ≥ 1 − δ, which makes the connection to the definition of PAC-learning discussed in lecture explicit.
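For concreteness, the online protocol assumed for A above can be pictured through a minimal interface like the Python sketch below. This is purely an illustrative assumption: the class name MistakeBoundedLearner and the methods predict and update are hypothetical, and the problem does not prescribe any particular API for A.

    from typing import Protocol, Sequence

    class MistakeBoundedLearner(Protocol):
        """Hypothetical interface for the online learner A: it labels one example
        at a time and changes its internal state only when its prediction was
        wrong, making at most t mistakes over any sequence of examples."""

        def predict(self, x: Sequence[int]) -> int:
            """Return the current hypothesis's label in {-1, +1} for x in {-1, 1}^n."""
            ...

        def update(self, x: Sequence[int], y: int) -> None:
            """Revise the internal state after a mistake on the labeled example (x, y)."""
            ...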
-
(a) Fix a single arbitrary hypothesis h′ : X → Y produced by A and determine a lower bound on the number of examples, k, such that P[err(h′) > ε] ≤ δ′. (The contrapositive view would be: with probability at least 1 − δ′, it must be the case that err(h′) ≤ ε. Make sure this makes sense.)
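For orientation only, and not necessarily the derivation intended by the problem, the standard calculation for a question of this shape, assuming the k examples in a block are drawn i.i.d. from D, runs as follows:

    \[
    \mathrm{err}(h') > \epsilon \;\Longrightarrow\;
    \Pr\big[h' \text{ labels all } k \text{ examples correctly}\big]
    < (1-\epsilon)^k \le e^{-\epsilon k},
    \]

so requiring e^{−εk} ≤ δ′ yields k ≥ (1/ε) ln(1/δ′).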
-
(b) From part (a) we know that as long as a block is at least of size k, then if that block is classified correctly by a fixed arbitrary hypothesis h′ we can effectively upper bound the probability of the bad event (i.e. A outputs h′ such that err(h′) > ε) by δ′. However, our bound must apply to every h that our algorithm B could output for an arbitrary distribution D over examples. With this in mind, how large should m be so that we can bound all hypotheses that could be output? Recall that algorithm B will not know the mistake bound during its execution.
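Again purely as a hedged reference (and keeping in mind the caveat that B may not get to use t during its execution), the most common way bounds of this kind are combined is a union bound over the at most t + 1 distinct hypotheses that a learner which only updates on mistakes can ever hold:

    \[
    \Pr\Big[\exists\, i \le t+1:\ \text{block } i \text{ is classified perfectly by } h_i
    \text{ yet } \mathrm{err}(h_i) > \epsilon\Big] \;\le\; (t+1)\,\delta',
    \]

which is at most δ once δ′ = δ/(t + 1), suggesting a sample size on the order of m = (t + 1)·k.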
-
(c) Put everything together and fully describe (with proof) a PAC learner that is able to output a hypothesis with true error at most ε with probability at least 1 − δ, given a mistake-bounded learner A. To do this you should first describe your pseudocode for algorithm B, which will use A as a sub-routine (no need for minute details or code, broad strokes will suffice). Then, prove there exists a finite number of m examples for B to PAC-learn C for all values of ε and δ by lower bounding m by a function of ε, δ, and t (i.e. finding a finite lower bound for m such that the PAC-learning requirements in condition (1) are satisfied).
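To make the block-based construction more tangible, here is a minimal, purely illustrative Python sketch of one way B might drive A. It assumes the hypothetical MistakeBoundedLearner interface sketched earlier, a hypothetical draw_labeled_example() callable that returns one pair (x, c(x)) with x drawn from D, and the specific choices of δ′ and k from the reference calculations above; it also assumes B may use t, whereas part (b) hints the intended construction might avoid that. None of this is offered as the required answer.

    import math

    def pac_learner_B(A, draw_labeled_example, epsilon, delta, t):
        """Illustrative sketch of B: feed A blocks of k i.i.d. examples and return
        the first hypothesis that classifies an entire block correctly."""
        # Assumed (not mandated) parameter choices from the reference calculations.
        delta_prime = delta / (t + 1)
        k = math.ceil((1.0 / epsilon) * math.log(1.0 / delta_prime))

        for _ in range(t + 1):            # at most t blocks can contain a mistake
            block_clean = True
            for _ in range(k):
                x, y = draw_labeled_example()
                if A.predict(x) != y:
                    A.update(x, y)        # A changes state only on a mistake
                    block_clean = False
            if block_clean:
                return A.predict          # this hypothesis survived a clean block
        return A.predict                  # fallback: A has exhausted its mistake budget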