Question

1 Approved Answer

Posted on Sep 25, 2024

3. For this question, check all that apply. Given the circumstances described in question 1 above, which of the following changes by itself would yield

image text in transcribed

3. For this question, check all that apply. Given the circumstances described in question 1 above, which of the following changes by itself would yield at least 2X speedup? Using a bigger battery and fan, the clock can run at 1GHz Adding a cache memory reduces both Load and Store to 8 CPI An improved design reduces the CPI for ALU instructions from 8 to 4 A new compiler reduces the number of ALU instructions from 100 to 30 A multi-cycle ALU makes ALU instructions take 10 CPI, but allows a 3ns clock period 1. Given this processor hardware design, add control states to the following to implement an average instruction (as decoded by the when below), such that avg $rd, $rs, $rt gives rd the average of the other two values. You're probably used to averaging by adding and then dividing by 2, but that actually doesn't work -- because the add can go out of range. Instead, use the algorithm that rd=((rs&rt)+((rs^rt)>>1)), which is a trick from Prof. Dietz's MAGIC algorithms page. when op() op(1) Avg Start: PCout, MARin, MEMread, Yin CONST(4), ALUadd, Zin, UNTILmfc MDRout, I Rin Zout, PCin, JUMPonop HALT /* Should end here on undecoded op */ Avg: SELrs, REGout, Yin SELrt, REGout, ALUand, Zin Zout, MDRin SELTS, REGout, Yin SELrt, REGout, ALUxor, Zin CONST(1), Yin Zout, ALUsrl, Zin MDRout, Yin Zout, ALUadd, Zin Zout, SELrd, REGin, JUMP(Start) 3. For this question, check all that apply. Given the circumstances described in question 1 above, which of the following changes by itself would yield at least 2X speedup? Using a bigger battery and fan, the clock can run at 1GHz Adding a cache memory reduces both Load and Store to 8 CPI An improved design reduces the CPI for ALU instructions from 8 to 4 A new compiler reduces the number of ALU instructions from 100 to 30 A multi-cycle ALU makes ALU instructions take 10 CPI, but allows a 3ns clock period 1. Given this processor hardware design, add control states to the following to implement an average instruction (as decoded by the when below), such that avg $rd, $rs, $rt gives rd the average of the other two values. You're probably used to averaging by adding and then dividing by 2, but that actually doesn't work -- because the add can go out of range. Instead, use the algorithm that rd=((rs&rt)+((rs^rt)>>1)), which is a trick from Prof. Dietz's MAGIC algorithms page. when op() op(1) Avg Start: PCout, MARin, MEMread, Yin CONST(4), ALUadd, Zin, UNTILmfc MDRout, I Rin Zout, PCin, JUMPonop HALT /* Should end here on undecoded op */ Avg: SELrs, REGout, Yin SELrt, REGout, ALUand, Zin Zout, MDRin SELTS, REGout, Yin SELrt, REGout, ALUxor, Zin CONST(1), Yin Zout, ALUsrl, Zin MDRout, Yin Zout, ALUadd, Zin Zout, SELrd, REGin, JUMP(Start)