Question:
Compared to a transformer, the feed-forward sequential memory network (FSMN) [262] is a more efficient model to convert a context-independent sequence into a context-dependent one. An FSMN uses the tapped delay line shown in Figure 8.17 to convert a sequence $y_1, y_2, \ldots, y_T$ (each $y_i \in \mathbb{R}^n$) into $\hat{z}_1, \hat{z}_2, \ldots, \hat{z}_T$ (each $\hat{z}_i \in \mathbb{R}^o$) through a set of bidirectional parameters $a_i$, $i = -L, -L+1, \ldots, L-1, L$.
a. If each $a_i$ is a vector (i.e., $a_i \in \mathbb{R}^n$), estimate the computational complexity of an FSMN layer. (Note that $o = n$ in this case.)
b. If each $a_i$ is a matrix (i.e., $a_i \in \mathbb{R}^{o \times n}$), estimate the computational complexity of an FSMN layer.
c. Assume $n = 512$, $o = 64$, $T = 128$, $J = 8$, $L = 16$; compare the total number of operations in the forward pass of one layer of such a matrix-parameterized FSMN with that of one multihead transformer in the box on page 174. How about using a vector-parameterized FSMN (assume $o = 512$ in this case)?
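For orientation, a hedged sketch of where the counts in (a) and (b) come from, assuming the layer computes $\hat{z}_t = \sum_{i=-L}^{L} a_i \odot y_{t+i}$ (elementwise products in the vector case) and $\hat{z}_t = \sum_{i=-L}^{L} a_i\, y_{t+i}$ (matrix-vector products in the matrix case):

$$\text{vector case: } O\big(T \cdot (2L+1) \cdot n\big), \qquad \text{matrix case: } O\big(T \cdot (2L+1) \cdot o \cdot n\big).$$

Each output $\hat{z}_t$ sums $2L+1$ tapped terms; every term costs $n$ multiply-adds in the vector case or $o \cdot n$ in the matrix case, so both variants scale linearly in $T$, unlike the $O(T^2)$ self-attention cost of a transformer.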
Step by Step Answer:
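For part (c), a minimal numeric sketch, assuming the FSMN layer form above and the standard multi-head self-attention accounting (Q/K/V and output projections plus attention scores and weighted sums); the exact count in the box on page 174 may group terms differently:

```python
# Hedged sketch (not the book's official solution): count multiply-add
# operations for part (c) under the assumptions stated above.

n, o, T, J, L = 512, 64, 128, 8, 16
taps = 2 * L + 1            # number of bidirectional coefficients a_i

# a) vector-parameterized FSMN: one elementwise multiply per tap per position
#    (o = n = 512 in this variant)
fsmn_vec = T * taps * n

# b) matrix-parameterized FSMN: one (o x n) matrix-vector product per tap
fsmn_mat = T * taps * o * n

# Standard multi-head self-attention with model width n and J heads of
# dimension d = n / J (= o here):
d = n // J
proj = 4 * T * n * n        # Q, K, V and output projections
attn = 2 * J * T * T * d    # QK^T scores plus attention-weighted values
transformer = proj + attn

print(f"vector FSMN : {fsmn_vec:>12,} multiply-adds")
print(f"matrix FSMN : {fsmn_mat:>12,} multiply-adds")
print(f"transformer : {transformer:>12,} multiply-adds")
print(f"transformer / vector FSMN ≈ {transformer / fsmn_vec:.0f}x")
```

Under these assumptions, the matrix-parameterized FSMN (about $1.4 \times 10^8$ multiply-adds) is comparable to one transformer layer (about $1.5 \times 10^8$), while the vector-parameterized FSMN (about $2.2 \times 10^6$) is roughly 70 times cheaper.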
Machine Learning Fundamentals: A Concise Introduction
ISBN: 9781108940023
1st Edition
Author: Hui Jiang