D. What does 'Multi-Head Attention' in the Transformer model refer to?
1. The model's ability to perform multiple tasks at once.
2. The model's ability to read data word by word.
3. The model's ability to understand context better than its predecessors.
4. The model's ability to scale up by adding more layers or attention heads.
Step by Step Solution
This solution has 3 steps.

Step 1: Rule out the distractors. Options 1 and 2 describe multitasking and sequential, word-by-word reading; neither is what "multi-head" means. Option 3 describes a general benefit of attention, not the mechanism itself.

Step 2: Recall the definition. In "Attention Is All You Need" (Vaswani et al., 2017), multi-head attention runs several scaled dot-product attention heads in parallel, each over its own learned linear projection of the queries, keys, and values, so the model can jointly attend to information from different representation subspaces.

Step 3: Match the definition to the options. Option 4 is the only choice that mentions attention heads, so it is the intended answer: adding heads (and stacking layers) is how the Transformer scales its attention capacity.
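
Since the question turns on what multi-head attention actually computes, here is a minimal NumPy sketch of the mechanism described in Step 2: the input is projected into several query/key/value subspaces, scaled dot-product attention runs in every head in parallel, and the head outputs are concatenated and projected back. The weight matrices here are random stand-ins for learned parameters, and the function names are illustrative, not from any particular library.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads

    # Learned projections in a trained model; random here for illustration.
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        for _ in range(4)
    )
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    # Split d_model into num_heads subspaces: (heads, seq_len, d_head).
    def split(t):
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)

    # Scaled dot-product attention, computed for all heads in parallel.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                  # (heads, seq, d_head)

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 64))   # 5 tokens, d_model = 64
out = multi_head_attention(x, num_heads=8, rng=rng)
print(out.shape)                   # (5, 64)
```

Note how each head attends over the full sequence independently; the "multi-head" part is this parallel split, not multitasking or depth scaling on its own.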
