Question

1 Approved Answer

Posted on Sep 25, 2024

Consider the reservation table in Figure 8.4. Suppose that the processor includes forwarding logic that is able to tell that instruction A is writing to

Consider the reservation table in Figure 8.4. Suppose that the processor includes forwarding logic that is able to tell that instruction A is writing to the same register that instruction B is reading from, and that therefore the result written by A can be forwarded directly to the ALU before the write is done. Assume the forwarding logic itself takes no time. Give the revised reservation table. How many cycles are lost to the pipeline bubble?

reached the memory stage before the PC can he updated. The instactions that follow in memory will have been fetched and will be at the decode and esvecute stages already by the time it is determined that those instructions should ot in fact be enacutod Like data harands, there are multiple lechniques fordcaling with coelharands A delayed branch simply documents the fact that the beanch will be taken some number of cyclkes after it is encountered, and leaves it upto the programmer (or compiler) to ensure that the instructions that follow the conditional branch instnaction are either harmless like no-ops) or do useful work that does not depend on whother the beanch is taken. An interlock proviades hardware to itsert pipeline bubles as meodedl jst as with data harards In the most elaborate technique, speculative esecution hwae estimutes whether the branch is likely to be taken, and begins eecuting the instructions it expects to evecute. If its expectation is not met, then it undoes any sade effects (ach as regier writes) that the speculatively executed instructions caused Except for explicit pipelines and delayed beanches, all of these tochniges introdace vari- aility in the timing of execution of an instruction seqaence. Analysis of the timing of segisner bank read egister bank wne cyde Figure 8.4: Reservation table for the pipeline shown in Figure 82 with interlocks assuming that rstrucion B reads a register that is wrtw, by rtutioh A. Lee &Seshia, Intreduction to Embedded Systems 227 8.2. PARALLELISM a program can become extremely difficult when there is a decp pipcline with elaborate forwarding and speculation. Explicit pipelines are selatively common in DSP processors, which are ofiten applied in contests where precise timing is essential Out-of-onder and speculative execution are common in peneral purpose processors whene timing matters quirements of the application and avoid processors whene the requis evel of sming Achieving high performance demands parallelism in the handware Such parallelism can take two broad Sorms, multicoee archiloctures, describad laier in Section 824, ee instruction-level parallelism (ILP, which is the subject of this section. A processoe supporting ILP is able to perform multiple indepemdent aperations in each instruction cyck. We discuss four major forms of ILP: CISC instructions, subwond parallelism, su perscalar, and VI.IW CISC Instructions A processor with complex (and typicalily, rather specialined) insaructioes is calleda CISC machine (complex instruction set computer). The p behind sach processors is distinetly different from that of RISC machines (redaced instruction set cempaters) Patterson and Ditzel, 1980) DSPs are typically CISC machines and inclade intructions specifically supporting FIR filtering (and ofien other algonithms sech as FFTs (fast Fourier transforms) and Viterbi decoding). In fact, so qualify as a DSP a processor must be able to perform FIR filering in one instrection cycle per tap Esample 8.6: The Texas Instuments TM532054s family of DSP processoes s imended to be used in power-constrained embedded applicatices that demand personal diital assistants (PDAs) The inner locp of an FIR computation (3I)is RPT numberofTaps reached the memory stage before the PC can he updated. The instactions that follow in memory will have been fetched and will be at the decode and esvecute stages already by the time it is determined that those instructions should ot in fact be enacutod Like data harands, there are multiple lechniques fordcaling with coelharands A delayed branch simply documents the fact that the beanch will be taken some number of cyclkes after it is encountered, and leaves it upto the programmer (or compiler) to ensure that the instructions that follow the conditional branch instnaction are either harmless like no-ops) or do useful work that does not depend on whother the beanch is taken. An interlock proviades hardware to itsert pipeline bubles as meodedl jst as with data harards In the most elaborate technique, speculative esecution hwae estimutes whether the branch is likely to be taken, and begins eecuting the instructions it expects to evecute. If its expectation is not met, then it undoes any sade effects (ach as regier writes) that the speculatively executed instructions caused Except for explicit pipelines and delayed beanches, all of these tochniges introdace vari- aility in the timing of execution of an instruction seqaence. Analysis of the timing of segisner bank read egister bank wne cyde Figure 8.4: Reservation table for the pipeline shown in Figure 82 with interlocks assuming that rstrucion B reads a register that is wrtw, by rtutioh A. Lee &Seshia, Intreduction to Embedded Systems 227 8.2. PARALLELISM a program can become extremely difficult when there is a decp pipcline with elaborate forwarding and speculation. Explicit pipelines are selatively common in DSP processors, which are ofiten applied in contests where precise timing is essential Out-of-onder and speculative execution are common in peneral purpose processors whene timing matters quirements of the application and avoid processors whene the requis evel of sming Achieving high performance demands parallelism in the handware Such parallelism can take two broad Sorms, multicoee archiloctures, describad laier in Section 824, ee instruction-level parallelism (ILP, which is the subject of this section. A processoe supporting ILP is able to perform multiple indepemdent aperations in each instruction cyck. We discuss four major forms of ILP: CISC instructions, subwond parallelism, su perscalar, and VI.IW CISC Instructions A processor with complex (and typicalily, rather specialined) insaructioes is calleda CISC machine (complex instruction set computer). The p behind sach processors is distinetly different from that of RISC machines (redaced instruction set cempaters) Patterson and Ditzel, 1980) DSPs are typically CISC machines and inclade intructions specifically supporting FIR filtering (and ofien other algonithms sech as FFTs (fast Fourier transforms) and Viterbi decoding). In fact, so qualify as a DSP a processor must be able to perform FIR filering in one instrection cycle per tap Esample 8.6: The Texas Instuments TM532054s family of DSP processoes s imended to be used in power-constrained embedded applicatices that demand personal diital assistants (PDAs) The inner locp of an FIR computation (3I)is RPT numberofTaps