2.13 Learning with an unknown parameter. In example 2.9, we showed that the concept class of k-CNF...
Question:
2.13 Learning with an unknown parameter. In Example 2.9, we showed that the concept class of k-CNF is PAC-learnable. Note, however, that the learning algorithm is given k as input. Is PAC-learning possible even when k is not provided? More generally, consider a family of concept classes {C_s}_s, where C_s is the set of concepts in C with size at most s. Suppose we have a PAC-learning algorithm A that can be used for learning any concept class C_s when s is given.
Can we convert A into a PAC-learning algorithm B that does not require the knowledge of s? This is the main objective of this problem.
To do this, we first introduce a method for testing a hypothesis h, with high probability. Fix ε > 0, δ > 0, and i ≥ 1, and define the sample size n by n = (32/ε)[i log 2 + log(2/δ)]. Suppose we draw an i.i.d. sample S of size n according to some unknown distribution D. We will say that a hypothesis h is accepted if it makes at most a 3ε/4 fraction of errors on S, and that it is rejected otherwise. Thus, h is accepted iff R̂_S(h) ≤ 3ε/4.
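As an illustration, here is a minimal Python sketch of this testing procedure. The predictor interface h(x) and the representation of the sample as (x, y) pairs are assumptions made for the sake of the example; they are not specified in the exercise.

import math

def test_sample_size(eps, delta, i):
    # n = (32/eps) * (i*log 2 + log(2/delta)), rounded up to an integer.
    return math.ceil((32.0 / eps) * (i * math.log(2.0) + math.log(2.0 / delta)))

def is_accepted(h, sample, eps):
    # Accept h iff its empirical error on the test sample is at most 3*eps/4.
    errors = sum(1 for x, y in sample if h(x) != y)
    return errors / len(sample) <= 0.75 * eps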
(a) Assume that R(h) ≥ ε. Use the (multiplicative) Chernoff bound to show that in that case P_{S∼D^n}[h is accepted] ≤ δ/2^(i+1).
(b) Assume that R(h) ≤ ε/2. Use the (multiplicative) Chernoff bounds to show that in that case P_{S∼D^n}[h is rejected] ≤ δ/2^(i+1).
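For reference, one common statement of the multiplicative Chernoff bounds (the constants in the exponent vary slightly across references) is the following, where p̂ denotes the empirical average of n i.i.d. Bernoulli(p) variables:

\Pr[\hat p \ge (1+\gamma)p] \le e^{-np\gamma^2/3}, \qquad \Pr[\hat p \le (1-\gamma)p] \le e^{-np\gamma^2/2}, \qquad 0 < \gamma \le 1.

As a hint rather than a full argument: part (a) corresponds to a relative deviation of γ = 1/4 below the mean, and with the choice of n above, e^{-nε/32} = e^{-i log 2 - log(2/δ)} = δ/2^(i+1); part (b) can be handled similarly with a relative deviation of at least 1/2 above the mean.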
(c) Algorithm B is defined as follows: we start with i = 1 and, at each round i ≥ 1, we guess the parameter size s to be s̃ = ⌊2^((i−1)/log(2/δ))⌋. We draw a sample S of size n (which depends on i) to test the hypothesis h_i returned by A when it is trained with a sample of size S_A(ε/2, 1/2, s̃), that is, the sample complexity of A for a required precision ε/2, confidence 1/2, and size s̃ (we ignore the size of the representation of each example here). If h_i is accepted, the algorithm stops and returns h_i; otherwise, it proceeds to the next iteration. Show that if at iteration i the estimate s̃ is larger than or equal to s, then P[h_i is accepted] ≥ 3/8.
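The round structure of B can be summarized by the following Python sketch, reusing the helpers above. Here train_A, sample_complexity_A, and draw_sample are hypothetical stand-ins for algorithm A, its sample complexity S_A, and sampling from D; none of these interfaces are specified by the exercise.

import math

def algorithm_B(eps, delta, train_A, sample_complexity_A, draw_sample):
    # Sketch of B: grow the guess of s from round to round and test each returned hypothesis.
    i = 1
    while True:
        # Guess of the size parameter at round i: floor(2^((i-1)/log(2/delta))).
        s_guess = int(2 ** ((i - 1) / math.log(2.0 / delta)))
        # Run A with precision eps/2, confidence 1/2, and size guess s_guess.
        m = sample_complexity_A(eps / 2.0, 0.5, s_guess)
        h_i = train_A(draw_sample(m), s_guess)
        # Test h_i on a fresh sample whose size n grows with i.
        n = test_sample_size(eps, delta, i)
        if is_accepted(h_i, draw_sample(n), eps):
            return h_i
        i += 1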
(d) Show that the probability that B does not halt after j = ⌈log(2/δ)/log(8/5)⌉ iterations with s̃ ≥ s is at most δ/2.
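One way to see where this value of j comes from, taking the result of part (c) as given: each round played with s̃ ≥ s fails to accept with probability at most 1 − 3/8 = 5/8, and the test samples are independent across rounds, so it suffices that

\left(\tfrac{5}{8}\right)^{j} \le \frac{\delta}{2} \quad\Longleftrightarrow\quad j \ge \frac{\log(2/\delta)}{\log(8/5)}.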
(e) Show that for i ≥ ⌈1 + (log₂ s) log(2/δ)⌉, the inequality s̃ ≥ s holds.
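As a sketch of one possible route: since s̃ = ⌊2^((i−1)/log(2/δ))⌋ and s is an integer, s̃ ≥ s as soon as 2^((i−1)/log(2/δ)) ≥ s, which can be rewritten as

\frac{i-1}{\log(2/\delta)} \ge \log_2 s \quad\Longleftrightarrow\quad i \ge 1 + (\log_2 s)\log(2/\delta).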
(f) Show that with probability at least 1 − δ, algorithm B halts after at most j′ = ⌈1 + (log₂ s) log(2/δ)⌉ + j iterations and returns a hypothesis with error at most ε.
Source: Mehryar Mohri, Afshin Rostamizadeh, Foundations of Machine Learning, 2nd edition (ISBN 9780262351362).