6.19 Sequence kernels. Let X = {a, c, g, t}. To classify DNA sequences using SVMs, we...
Question:
6.19 Sequence kernels. Let $X = \{a, c, g, t\}$. To classify DNA sequences using SVMs, we wish to define a kernel between sequences defined over $X$. We are given a finite set $I \subset X^*$ of non-coding regions (introns). For $x \in X^*$, denote by $|x|$ the length of $x$ and by $F(x)$ the set of factors of $x$, i.e., the set of subsequences of $x$ with contiguous symbols. For any two strings $x, y \in X^*$ define $K(x, y)$ by
$$K(x, y) = \sum_{z \in (F(x) \cap F(y)) - I} \rho^{|z|}, \tag{6.32}$$
where $\rho \geq 1$ is a real number.
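To make definition (6.32) concrete, here is a minimal brute-force sketch in Python. It is not the automaton-based computation that part (b) asks about; it simply enumerates factors explicitly. The function names `factors` and `sequence_kernel`, the parameter name `rho` for the base of the exponential weight, and the choice to leave out the empty string as a factor are all assumptions made for illustration.

```python
def factors(x):
    """Return the set of non-empty contiguous substrings (factors) of x.

    Assumption: the empty string is left out. Including it would add a
    constant 1 to every kernel value unless it belongs to the intron set.
    """
    return {x[i:j] for i in range(len(x)) for j in range(i + 1, len(x) + 1)}


def sequence_kernel(x, y, introns, rho=2.0):
    """Brute-force evaluation of (6.32).

    Sums rho ** len(z) over every factor z shared by x and y that is
    not in the finite intron set.
    """
    common = (factors(x) & factors(y)) - set(introns)
    return sum(rho ** len(z) for z in common)


if __name__ == "__main__":
    # Shared factors of "acgt" and "cgta": a, c, g, t, cg, gt, cgt.
    # Removing the intron "cg" leaves a, c, g, t, gt, cgt.
    print(sequence_kernel("acgt", "cgta", {"cg"}, rho=2.0))  # 4*2 + 4 + 8 = 20.0
```

This enumeration is quadratic in the string lengths just to list the factors, so it is useful only for checking small cases by hand.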
(a) Show that $K$ is a rational kernel and that it is positive definite symmetric.
(b) Give the time and space complexity of the computation of $K(x, y)$ with respect to the size $s$ of a minimal automaton representing $X^* - I$.
(c) Long common factors between $x$ and $y$ of length greater than or equal to $n$ are likely to be important coding regions (exons). Modify the kernel $K$ to assign weight $\rho_2^{|z|}$ to $z$ when $|z| \geq n$, $\rho_1^{|z|}$ otherwise, where $1 \leq \rho_1 \ll \rho_2$.
Show that the resulting kernel is still positive definite symmetric.
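The modification in part (c) can be checked on small inputs with a brute-force sketch like the one below. It is only an illustration of the weighting rule, not a proof and not an efficient algorithm. The names `factors`, `two_rate_kernel`, `rho1`, and `rho2` are hypothetical, and the empty string is assumed not to count as a factor.

```python
def factors(x):
    """Return the set of non-empty contiguous substrings (factors) of x."""
    return {x[i:j] for i in range(len(x)) for j in range(i + 1, len(x) + 1)}


def two_rate_kernel(x, y, introns, n, rho1=1.0, rho2=4.0):
    """Brute-force evaluation of the modified kernel of part (c).

    A shared non-intron factor z gets weight rho2 ** len(z) when
    len(z) >= n, and rho1 ** len(z) otherwise.
    """
    common = (factors(x) & factors(y)) - set(introns)
    return sum((rho2 if len(z) >= n else rho1) ** len(z) for z in common)


if __name__ == "__main__":
    # Shared non-intron factors: a, c, g, t, gt (short) and cgt (long, n = 3).
    # Short factors contribute 1 each, cgt contributes 4 ** 3 = 64.
    print(two_rate_kernel("acgt", "cgta", {"cg"}, n=3))  # 5 + 64 = 69.0
```

Since the weight of each factor depends only on the factor itself, the value is still a sum over shared factors of a non-negative per-factor weight, which is the structure the positive definite symmetric argument relies on.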
Step by Step Answer:
Foundations Of Machine Learning
ISBN: 9780262351362
2nd Edition
Authors: Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar