Question:
Datacenter Networking: MapReduce and WSCs are a powerful combination for tackling large-scale data processing; for example, in 2008 Google sorted one petabyte (1 PB) of records in a little more than 6 hours using 4000 servers and 48,000 hard drives.
a. Derive the disk bandwidth from Figure 6.1 and the associated text. How many seconds does it take to read the data into main memory and write the sorted results back? (Back-of-envelope sketches for parts a-d follow this list.)
b. Assuming each server has two 1 Gb/sec Ethernet network interface cards (NICs) and the WSC switch infrastructure is oversubscribed by a factor of 4, how many seconds does it take to shuffle the entire dataset across 4000 servers?
c. Assuming network transfer is the performance bottleneck for the petabyte sort, estimate what oversubscription ratio Google had in its datacenter.
d. Now let's examine the benefits of having 10 Gb/sec Ethernet without oversubscription, for example using a 48-port 10 Gb/sec Ethernet switch (as used by TritonSort, the 2010 Indy sort benchmark winner). How long does it take to shuffle the 1 PB of data?
e. Compare the two approaches:
(1) the massively scale-out approach with a high network oversubscription ratio, and
(2) a relatively small-scale system with a high-bandwidth network.
What are their potential bottlenecks? What are their advantages and disadvantages in terms of scalability and TCO?
f. Sort and many important scientific computing workloads are communication-heavy, while many other workloads are not. List three example workloads that do not benefit from high-speed networking. Which EC2 instance types would you recommend for these two classes of workloads?
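The exact disk bandwidth for part (a) must be derived from Figure 6.1 and its text, which are not reproduced here; the sketch below uses an assumed sequential bandwidth of 200 MB/sec purely as a placeholder, so substitute the value you derive. It divides each disk's share of the 1 PB dataset by that bandwidth, once for the read pass and once for the write pass:

```python
# Part (a): disk I/O time, a minimal sketch.
# DISK_BW is an ASSUMED placeholder; replace it with the bandwidth
# derived from Figure 6.1 and the associated text.

TOTAL_BYTES = 1e15        # 1 PB of records
DISKS = 48_000            # 48,000 hard drives across 4000 servers

DISK_BW = 200e6           # bytes/sec, assumed sequential disk bandwidth

data_per_disk = TOTAL_BYTES / DISKS        # ~20.8 GB per disk
read_time = data_per_disk / DISK_BW        # read input into memory
write_time = data_per_disk / DISK_BW       # write sorted output back
print(f"(a) read + write: {read_time + write_time:.0f} s")
```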
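For the network parts, a reasonable model is that an all-to-all shuffle moves roughly 1/N of the dataset out of (and into) every server, so the per-server NIC bandwidth divided by the oversubscription ratio is the bottleneck. The sketch below applies that model to parts (b)-(d); for (d) it keeps the 4000-server cluster, and whether TritonSort's 48-port configuration instead implies a much smaller node count is left as a variant to explore:

```python
# Parts (b)-(d): shuffle-time model, a sketch under the assumption
# that the shuffle is limited by per-server network bandwidth.

TOTAL_BYTES = 1e15
SERVERS = 4000

def shuffle_time(total_bytes, n_servers, nic_gbps, oversub=1.0):
    """Seconds to shuffle total_bytes across n_servers, each with nic_gbps
    of NIC capacity, through a fabric oversubscribed by factor oversub."""
    per_server_bytes = total_bytes / n_servers
    effective_bw = nic_gbps * 1e9 / 8 / oversub   # bytes/sec per server
    return per_server_bytes / effective_bw

# (b) two 1 Gb/sec NICs per server, fabric oversubscribed 4x
t_b = shuffle_time(TOTAL_BYTES, SERVERS, nic_gbps=2, oversub=4)
print(f"(b) shuffle: {t_b:.0f} s (~{t_b/60:.1f} min)")       # 4000 s

# (c) if the ~6-hour run were entirely network-bound, back out the
# oversubscription ratio that stretches the shuffle to 6 hours
sort_seconds = 6 * 3600
achieved_bw = (TOTAL_BYTES / SERVERS) / sort_seconds          # bytes/sec
oversub_est = (2 * 1e9 / 8) / achieved_bw
print(f"(c) implied oversubscription: ~{oversub_est:.0f}x")   # ~22x

# (d) 10 Gb/sec per server, no oversubscription
t_d = shuffle_time(TOTAL_BYTES, SERVERS, nic_gbps=10, oversub=1)
print(f"(d) shuffle at 10 Gb/sec: {t_d:.0f} s")               # 200 s
```

Treat these as rough first-order estimates: a real sort overlaps disk I/O with the shuffle, and a full bisection-bandwidth analysis would refine the implied oversubscription figure in part (c).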
Computer Architecture: A Quantitative Approach, 5th edition, by John L. Hennessy and David A. Patterson (ISBN 978-8178672663).