Question:
Compared to a transformer, the feed-forward sequential memory network (FSMN) [262] is a more efficient model to convert a context-independent sequence into a context-dependent one. An FSMN uses the tapped delay line shown in Figure 8.17 to convert a sequence $y_1, y_2, \ldots, y_T$ (each $y_i \in \mathbb{R}^n$) into $\hat{z}_1, \hat{z}_2, \ldots, \hat{z}_T$ (each $\hat{z}_i \in \mathbb{R}^o$) through a set of bidirectional parameters $a_i$, $i = -L, -L+1, \ldots, L-1, L$.
a. If each $a_i$ is a vector (i.e., $a_i \in \mathbb{R}^n$), estimate the computational complexity of an FSMN layer. (Note that $o = n$ in this case.)
b. If each $a_i$ is a matrix (i.e., $a_i \in \mathbb{R}^{o \times n}$), estimate the computational complexity of an FSMN layer.
c. Assume $n = 512$, $o = 64$, $T = 128$, $J = 8$, $L = 16$; compare the total number of operations in the forward pass of one layer of such a matrix-parameterized FSMN with that of one multihead transformer in the box on page 174. How about using a vector-parameterized FSMN (assume $o = 512$ in this case)?
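For orientation, a hedged sketch of where the counts in (a) and (b) come from, assuming the layer computes $\hat{z}_t = \sum_{i=-L}^{L} a_i \odot y_{t+i}$ (elementwise products in the vector case) and $\hat{z}_t = \sum_{i=-L}^{L} a_i\, y_{t+i}$ (matrix-vector products in the matrix case):

$$\text{vector case: } O\big(T \cdot (2L+1) \cdot n\big), \qquad \text{matrix case: } O\big(T \cdot (2L+1) \cdot o \cdot n\big).$$

Each output $\hat{z}_t$ sums $2L+1$ tapped terms; every term costs $n$ multiply-adds in the vector case or $o \cdot n$ in the matrix case, so both variants scale linearly in $T$, unlike the $O(T^2)$ self-attention cost of a transformer.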
Step by Step Answer:
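For part (c), a minimal numeric sketch, assuming the FSMN layer form above and the standard multi-head self-attention accounting (Q/K/V and output projections plus attention scores and weighted sums); the exact count in the box on page 174 may group terms differently:

```python
# Hedged sketch (not the book's official solution): count multiply-add
# operations for part (c) under the assumptions stated above.

n, o, T, J, L = 512, 64, 128, 8, 16
taps = 2 * L + 1            # number of bidirectional coefficients a_i

# a) vector-parameterized FSMN: one elementwise multiply per tap per position
#    (o = n = 512 in this variant)
fsmn_vec = T * taps * n

# b) matrix-parameterized FSMN: one (o x n) matrix-vector product per tap
fsmn_mat = T * taps * o * n

# Standard multi-head self-attention with model width n and J heads of
# dimension d = n / J (= o here):
d = n // J
proj = 4 * T * n * n        # Q, K, V and output projections
attn = 2 * J * T * T * d    # QK^T scores plus attention-weighted values
transformer = proj + attn

print(f"vector FSMN : {fsmn_vec:>12,} multiply-adds")
print(f"matrix FSMN : {fsmn_mat:>12,} multiply-adds")
print(f"transformer : {transformer:>12,} multiply-adds")
print(f"transformer / vector FSMN ≈ {transformer / fsmn_vec:.0f}x")
```

Under these assumptions, the matrix-parameterized FSMN (about $1.4 \times 10^8$ multiply-adds) is comparable to one transformer layer (about $1.5 \times 10^8$), while the vector-parameterized FSMN (about $2.2 \times 10^6$) is roughly 70 times cheaper.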
Machine Learning Fundamentals: A Concise Introduction
ISBN: 9781108940023
1st Edition
Author: Hui Jiang