Question:

Compared to a transformer, the feed-forward sequential memory network (FSMN) [262] is a more efficient model to convert a context-independent sequence into a context-dependent one. An FSMN uses the tapped delay line shown in Figure 8.17 to convert a sequence $y_1, y_2, \ldots, y_T$ ($y_i \in \mathbb{R}^n$) into $\hat{z}_1, \hat{z}_2, \ldots, \hat{z}_T$ ($\hat{z}_i \in \mathbb{R}^o$) through a set of bidirectional parameters $\{a_i \mid i = -L+1, \ldots, L-1, L\}$.
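
Figure 8.17 is not reproduced on this page. For reference, a common way to write the bidirectional tapped-delay-line operation of an FSMN memory block is sketched below; the exact indexing convention in the book's figure may differ, so treat this as an assumed formulation rather than the book's definition:

$$
\hat{z}_t \;=\; \sum_{i=-L+1}^{L} a_i \odot y_{t+i} \quad (\text{vector case, } a_i \in \mathbb{R}^n),
\qquad
\hat{z}_t \;=\; \sum_{i=-L+1}^{L} a_i\, y_{t+i} \quad (\text{matrix case, } a_i \in \mathbb{R}^{o \times n}),
$$

with $y_{t+i}$ taken to be the zero vector whenever $t+i$ falls outside $1, \ldots, T$.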

a. If each $a_i$ is a vector (i.e., $a_i \in \mathbb{R}^n$), estimate the computational complexity of an FSMN layer. (Note that $o = n$ in this case.)

b. If each $a_i$ is a matrix (i.e., $a_i \in \mathbb{R}^{o \times n}$), estimate the computational complexity of an FSMN layer.

c. Assume $n = 512$, $o = 64$, $T = 128$, $J = 8$, $L = 16$; compare the total number of operations in the forward pass of one layer of such a matrix-parameterized FSMN with that of one multihead transformer in the box on page 174. How about using a vector-parameterized FSMN (assume $o = 512$ in this case)?
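
Below is a minimal NumPy sketch of how one might implement a single FSMN layer and count its operations under the assumed tapped-delay-line form above. The function names, the zero-padding convention, and the multiply-add counting convention are assumptions introduced here, not the book's reference solution; the transformer operation count from the box on page 174 is not reproduced, so the transformer side of part c is left to the reader.

```python
import numpy as np

def fsmn_matrix(y, A):
    """Matrix-parameterized FSMN layer: y is (T, n), A is (2L, o, n),
    with A[k] the tap for offset i = k - L + 1 (i.e., i = -L+1 .. L)."""
    T, n = y.shape
    num_taps, o, _ = A.shape
    L = num_taps // 2
    z = np.zeros((T, o))
    for t in range(T):
        for k, i in enumerate(range(-L + 1, L + 1)):
            if 0 <= t + i < T:              # out-of-range taps treated as zeros
                z[t] += A[k] @ y[t + i]     # about o*n multiply-adds per tap
    return z

def fsmn_vector(y, a):
    """Vector-parameterized FSMN layer (o = n): y is (T, n), a is (2L, n)."""
    T, n = y.shape
    num_taps, _ = a.shape
    L = num_taps // 2
    z = np.zeros((T, n))
    for t in range(T):
        for k, i in enumerate(range(-L + 1, L + 1)):
            if 0 <= t + i < T:
                z[t] += a[k] * y[t + i]     # about n multiply-adds per tap
    return z

if __name__ == "__main__":
    n, o, T, L = 512, 64, 128, 16
    y = np.random.randn(T, n)

    # Matrix-parameterized: roughly T * 2L * o * n multiply-adds per layer.
    print("matrix FSMN mult-adds ~", T * 2 * L * o * n)   # 134,217,728 (~1.3e8)

    # Vector-parameterized (o = n): roughly T * 2L * n multiply-adds per layer.
    print("vector FSMN mult-adds ~", T * 2 * L * n)       # 2,097,152 (~2.1e6)

    z_mat = fsmn_matrix(y, np.random.randn(2 * L, o, n))
    z_vec = fsmn_vector(y, np.random.randn(2 * L, n))
    print(z_mat.shape, z_vec.shape)   # (128, 64) (128, 512)
```

Under these conventions the vector- and matrix-parameterized layers scale as $O(TLn)$ and $O(TLon)$ respectively, which is the kind of estimate parts a and b ask for; part c additionally requires the per-layer multihead-transformer count from the box on page 174.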
