Consider a case in which each of the eight cores on a Pixel Visual Core device is
Question:
Consider a case in which each of the eight cores on a Pixel Visual Core device is connected through a four-port switch to a 2D SRAM, forming a core+memory unit. The remaining two ports on the switch link these units in a ring, so that each core is able to access any of the eight SRAMs. However, this ring-based network-on-chip topology makes some data access patterns more efficient than others.
a. Suppose that each link in the NOC has the same bandwidth B, and that each link is full-duplex, so it can simultaneously transfer bandwidth B in each direction. Links connect the core to the switch, the switch to SRAM, and pairs of switches in the ring. Assume that each local memory has at least B bandwidth, so it can saturate its link. Consider a memory access pattern where each of the eight PEs access only the closest memory (the one connected via the switch of the core+memory unit). What is the maximum memory bandwidth that the core will be able to achieve?
b. Now consider an off-by-one access pattern, where core i accesses memory i+1, going through three links to reach that memory (core 7 will access memory 0, because of the ring topology). What is the maximum memory bandwidth that the core will be able to achieve in this case? To achieve that bandwidth, do you need to make any assumptions about the capabilities of the 4-port switch?
What if the switch can only move data at rate B?
c. Consider an off-by-two access pattern, where core i access memory i+2. Once again, what is the maximum memory bandwidth that the core will be able to achieve in this case? Where are the bottleneck links in the network-on-chip?
d. Consider a uniform random memory access pattern, where each core uses each of the SRAMs for ⅛ of its memory requests. Assuming this traffic pattern, how much traffic traverses a switch-to-switch link, compared to the amount of traffic between a core and its associated switch or between an SRAM and its associated switch?
e. (advanced) Can you conceive of a case (workload) where this network can deadlock? From the standpoint of software-only solutions, what should the compiler do to avoid such a scenario? If you can make changes to hardware, what changes in routing topology (and routing scheme) would guarantee no deadlocks?
Step by Step Answer:
Computer Architecture A Quantitative Approach
ISBN: 9780128119051
6th Edition
Authors: John L. Hennessy, David A. Patterson