Question
(b) Priority inversion can occur when threads of differing priorities synchronize on access to common resources: threads of greater priority may end up waiting for threads of lesser priority, leading to undesirable realtime properties.
(i) Describe how this problem can be resolved for mutexes using priority inheritance. [2 marks]
(ii) Describe how priority inheritance would need to be modified to handle reader-writer locks. [2 marks]
(iii) Priority inversion can also arise between two threads involved in process synchronization, for example when one thread uses a semaphore to signal completion of work. Why might implementing priority inheritance be more difficult with process synchronization than with mutual exclusion? [4 marks]
(iv) What might we do to solve the problem in (b)(iii)? [4 marks]

CST.2013.5.9

8 Concurrent and Distributed Systems
(a) The ACID properties are often used to define transactional semantics.
(i) Define "atomicity" as used in the ACID context. [1 mark]
(ii) Define "durability" as used in the ACID context. [1 mark]
(b) Write-ahead logging is a commonly used scheme for achieving transactional semantics when storing a database on a block storage device, such as a hard disk.
(i) Under what conditions, during write-ahead log recovery, might a transaction in the UNDO list be moved to the REDO list? [2 marks]
(ii) Synchronously flushing commit records to disk is expensive. How might we safely reduce synchronous I/O operations on a high-throughput system without sacrificing ACID properties? [2 marks]
(iii) Describe two performance changes that could arise from using your solution to part (b)(ii). [2 marks]
(c) (i) Transaction records in a write-ahead logging scheme contain five fields: ⟨TransactionID, ObjID, Operation, OldValue, NewValue⟩, but storing the complete old and new values can consume significant amounts of space. One approach that might be used, for reversible operations applied to certain data, such as XOR by a constant, is to store only the constant arguments rather than the full before and after data. What problems could occur as a result of this design choice? [4 marks]
(ii) Write-ahead logging systems must know the true on-disk sector size for the write-ahead log to behave correctly. A wayward disk vendor decides to rebrand its 512-byte sector disks as 2K-sector disks, and changes the value reported back to the database system. How might this affect database integrity? [4 marks]
(iii) Explain how a database vendor who knows about the problem described in part (c)(ii) might mitigate this problem in software, and what limitations there might be to this approach. [4 marks]

CST.2013.5.10

9 Concurrent and Distributed Systems
Sun's Network File System (NFS) is the standard distributed file system used with UNIX, and has gone through a progression of versions (2, 3, 4) that have gradually improved performance and semantics.
(a) Remote procedure call (RPC)
(i) Explain how Sun RPC handles byte order (endianness). [2 marks]
(ii) This approach may result in unnecessary work. State when this occurs and how it could be avoided. [2 marks]
(b) Network File System version 2 (NFSv2) and version 3 (NFSv3)
(i) A key design premise for NFS was that the server be "stateless" with respect to the client. State how this affects distributed file locking in NFSv2 and NFSv3. [2 marks]
(ii) Another key design premise for NFSv2 was the "idempotence" of RPCs; what does this mean? [2 marks]
(iii) One significant improvement in NFSv3 was the addition of the READDIRPLUS RPC. Explain why this helps performance. [4 marks]
(iv) NFSv3 implements what is termed "close-to-open consistency" for file data caching: if client C1 writes to a file, closes the file, and client C2 now opens the file for read, then it must see the results of all writes issued by C1 prior to close. However, if C2 opens the file before C1 has closed it, C2 may see some, all, or none of the writes issued by C1 (and in arbitrary order). Close-to-open consistency is achieved through careful use of synchronous RPC semantics, combined with file timestamp information piggybacked onto server replies on all RPCs that operate on files. Explain how close-to-open consistency allows performance to be improved. [4 marks]
(v) NFSv3 adds a new RPC, ACCESS, allowing the client to delegate access control checks at file open time to the server, rather than performing them on the client. This allows client and server security models to differ
4 Using library functions like htonl and Unix's bcopy or Windows' CopyMemory,
implement a routine that generates the same on-the-wire representation of the
structures given in Exercise 1 as XDR does. If possible, compare the performance
of your "by-hand" encoder/decoder with the corresponding XDR routines.
5 Use XDR and htonl to encode a 1000-element array of integers. Measure and
compare the performance of each. How do these compare to a simple loop that
reads and writes a 1000-element array of integers? Perform the experiment on a
computer for which the native byte order is the same as the network byte order,
as well as on a computer for which the native byte order and the network byte
order are different.
6 Write your own implementation of htonl. Using both your own htonl and (if
little-endian hardware is available) the standard library version, run appropriate
experiments to determine how much longer it takes to byte-swap integers versus
merely copying them.
7 Give the ASN.1 encoding for the following three integers. Note that ASN.1 integers, like those in XDR, are 32 bits in length.
(a) 101
(b) 10,120
(c) 16,909,060
8 Give the ASN.1 encoding for the following three integers. Note that ASN.1 integers, like those in XDR, are 32 bits in length.
(a) 15
(b) 29,496,729
(c) 58,993,458
9 Give the big-endian and little-endian representation for the integers from
Exercise 7.
10 Give the big-endian and little-endian representation for the integers from
Exercise 8.
11 XDR is used to encode/decode the header for the SunRPC protocol illustrated
by Figure 5.20. The XDR version is determined by the RPCVersion field. What
potential difficulty does this present? Would it be possible for a new version of
XDR to switch to little-endian integer format?
12 The presentation formatting process is sometimes regarded as an autonomous
protocol layer, separate from the application. If this is so, why might including
data compression in the presentation layer be a bad idea?
13 Suppose you have a machine with a 36-bit word size. Strings are represented as
five packed 7-bit characters per word. What presentation issues on this machine
have to be addressed for it to exchange integer and string data with the rest of the
world?
14 Using the programming language of your choice that supports user-defined automatic type conversions, define a type netint and supply conversions that enable assignments and equality comparisons between ints and netints. Can a generalization of this approach solve the problem of network argument
marshalling?
15 Different architectures have different conventions on bit order as well as byte
order—whether the least significant bit of a byte, for example, is bit 0 or bit 7.
[Pos81] defines (in its Appendix B) the standard network bit order. Why is bit
order then not relevant to presentation formatting?
16 Let p ≤ 1 be the fraction of machines in a network that are big-endian; the remaining 1 − p fraction are little-endian. Suppose we choose two machines at random
and send an int from one to the other. Give the average number of byte-order conversions needed for both big-endian network byte order and receiver-makes-right,
for p = 0.1, p = 0.5, and p = 0.9. Hint: The probability that both endpoints are
big-endian is p²; the probability that the two endpoints use different byte orders
is 2p(1 − p).
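The counting argument behind Exercise 16 can be sketched in Python. This is a minimal sketch under stated assumptions, not a model answer: it assumes the two endpoints are chosen independently (as the hint does), that a big-endian host does no work to reach network byte order, and that each little-endian endpoint performs exactly one conversion. The function name `expected_conversions` is hypothetical.

```python
def expected_conversions(p):
    """Return the expected byte-order conversions per transferred int.

    The result is a pair (network_order, receiver_makes_right), where p is
    the fraction of big-endian machines.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p  # fraction of little-endian machines

    # Big-endian network byte order: every little-endian endpoint converts once.
    # 0 conversions with probability p^2, 1 with 2pq, 2 with q^2.
    network_order = 1.0 * (2.0 * p * q) + 2.0 * (q * q)

    # Receiver-makes-right: one conversion only when the byte orders differ.
    receiver_makes_right = 2.0 * p * q

    return network_order, receiver_makes_right


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        net, rmr = expected_conversions(p)
        print(f"p={p}: network order {net:.2f}, receiver-makes-right {rmr:.2f}")
```

Under these assumptions the network-order figure simplifies to 2(1 − p), so receiver-makes-right never needs more conversions than a fixed big-endian wire format.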
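For the byte-order exercises above (writing your own htonl, and giving big-endian and little-endian representations), the following Python sketch may help to check hand-worked answers. It is an illustration only: `my_htonl` and `byte_layouts` are hypothetical names, integers are assumed to be unsigned 32-bit values, and the standard library's `socket.htonl` is used purely as a reference for comparison.

```python
import socket
import struct
import sys


def my_htonl(value):
    """Hand-written stand-in for htonl: host byte order to network (big-endian) order."""
    value &= 0xFFFFFFFF  # treat the input as an unsigned 32-bit quantity
    if sys.byteorder == "big":
        return value  # host order already matches network order
    # Little-endian host: reverse the four bytes.
    return (
        ((value & 0x000000FF) << 24)
        | ((value & 0x0000FF00) << 8)
        | ((value & 0x00FF0000) >> 8)
        | ((value & 0xFF000000) >> 24)
    )


def byte_layouts(value):
    """Return the (big-endian, little-endian) 4-byte layouts of a 32-bit integer."""
    return struct.pack(">I", value), struct.pack("<I", value)


if __name__ == "__main__":
    for n in (101, 10120, 16909060):
        big, little = byte_layouts(n)
        agrees = my_htonl(n) == socket.htonl(n)
        print(n, big.hex(), little.hex(), "matches socket.htonl:", agrees)
```

Timing `my_htonl` against a plain copy loop, as Exercise 6 asks, would need a compiled language to give meaningful numbers; the sketch only shows the byte manipulation itself.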
Developing your network is easy because you know more people than you think you know. Consider:
family, friends, roommates, and significant others
iSchool faculty and staff, fellow students, and alumni
past and present co-workers
neighbors
club, organization, and association members
people at the gym, the local coffee house, and neighborhood store
people in your religious community
These people are all part of your current network, professional and personal. Keep an ongoing list of the names and contact information of the people in your network. Ask your contacts to introduce you to their contacts and keep your
(a) Show that for any integer N there must be a string s of length N for which
length(c(s)) ≥ N; that is, no effective compression is done.
(b) Compress some already compressed files (try compressing with the same utility
several times in sequence). What happens to the file size?
(c) Given a compression function c as in (a), give a function c ′ such that for all bit
strings s, length(c′
Each pair of Rockwell surfboards requires 3 labor-hours in the fabrication department and 1.5 labor-hours in finishing. The Limestone model requires 4.5 labor-hours in fabrication and 2 labor-hours in finishing. The company operates 6 days a week. It makes a per-unit profit of $60 on the Rockwell model and $75 on the Limestone model. Approximately 4.1 Rockwell models and 8.4 Limestone models are produced per day.
Networking is not simply an information exchange between you and another person. It involves establishing relationships with people who will often become your friends and community of colleagues as you go through your career. They may be able to help you advance your career in many ways, just as you may be able to help them advance theirs. A networking contact might result in any of the following:
Inside information on what's happening in your field of interest, such as an organization's plan to expand operations or release a new product.
Job search advice specific to your field of interest, like where jobs are typically listed.
Tips on your job hunting tools (i.e. resume and/or portfolio).
Names of people to contact about possible employment or informational interviews.
Follow-up interview and possible job offer.
You may wish to give examples and use diagrams as appropriate. [4 marks]
(b) What is a routing loop? Include a diagram in your answer. [4 marks]
(c) Describe a mechanism that prevents routing loops in Ethernet networks. [4 marks]
(d) (i) Describe and, with the aid of an example, illustrate the IP Time-To-Live (TTL) mechanism for minimizing the impact of routing loops. [2 marks]
(ii) Assuming, in part (d)(i), a perfect implementation, describe a disadvantage of the approach, including the symptoms that might be experienced in a network subject to this disadvantage, and a test that might identify the problem. [2 marks]
(e) Explain the technical and architectural argument behind the decision in IPv6 to retain header TTL but not a header checksum. [2 marks]
(f) Explain why there is ambiguity about handling packets with TTL values of 1 and give a practical solution. [2 marks]

CST.2013.5.8

7 Concurrent and Distributed Systems
(a) Deadlock is a classic problem in concurrent systems.
(i) What are the four necessary conditions for deadlock? [4 marks]
(ii) Deadlock is often explained using the Dining Philosophers Problem. In this pseudo-code, each fork is represented by a lock:
    Lock forks[] = new Lock[5];
    // Code for each philosopher (i)
    while (true) {
        think();
        lock(fork[i]);
        lock(fork[(i + 1) % 5]);
        eat();
        unlock(fork[i]);
        unlock(fork[(i + 1) % 5]);
    }
Partial ordering is a common deadlock prevention scheme. Describe modifications to the above code, changing only array indices, such that philosophers can be fed safely, but also deadlock-free, using a partial order.
Describe an algorithm that draws the quadratic Bézier curve, using straight
lines only, to within a tolerance τ. You may use the algorithm from part (a)
and you may assume that you already have an algorithm for drawing a straight
line. [8 marks]
(c) Consider the control of detail in a curve that is represented by a sequence of
many straight line segments. Describe how Douglas and Peucker's algorithm
can be used to remove superfluous points. You may use the algorithm from
part (a).
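The point-removal idea in part (c) can be sketched in Python. This is a minimal sketch rather than a model answer: part (a) of the question is not shown in this text, so the helper `point_segment_distance` is an assumed stand-in for "the algorithm from part (a)", and both function names are hypothetical. Points are assumed to be (x, y) tuples.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b (assumed part (a) helper)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment: a == b
    # Parameter of the projection of p onto the line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Drop points that lie within `tolerance` of the simplified polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]

    # Find the interior point farthest from the chord first-last.
    worst_index, worst_distance = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > worst_distance:
            worst_index, worst_distance = i, d

    # Every interior point is close enough: the chord alone represents the run.
    if worst_distance <= tolerance:
        return [first, last]

    # Otherwise keep the farthest point and simplify each half recursively.
    left = douglas_peucker(points[: worst_index + 1], tolerance)
    right = douglas_peucker(points[worst_index:], tolerance)
    return left[:-1] + right  # avoid repeating the shared split point


if __name__ == "__main__":
    curve = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
    print(douglas_peucker(curve, 0.1))
```

The recursion keeps the endpoints of every run, so the simplified polyline never deviates from a removed point by more than the tolerance.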
(a) Consider a simple random walk, S_n, defined by S_0 = a and S_n = S_(n−1) + X_n
for n ≥ 1 where the random variables X_i (i = 1, 2, . . .) are independent and
identically distributed with P(X_i = 1) = p and P(X_i = −1) = 1 − p for some
constant p with 0 ≤ p ≤ 1.
(i) Find E(S_n) and Var(S_n) in terms of a, n and p. [4 marks]
(ii) Use the central limit theorem to derive an approximate expression
for P(S_n > k) for large n. You may leave your answer expressed in terms
of the distribution function Φ(x) = P(Z ≤ x) where Z is a standard
Normal random variable with zero mean and unit variance. [6 marks]
(b) Consider the Gambler's ruin problem defined as in part (a) but with the
addition of absorbing barriers at 0 and N where N is some positive integer.
Derive an expression for the probability of ruin (that is, being absorbed at the
zero barrier) when starting at position S_0 = a for each a = 0, 1, . . . , N in the
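For the unrestricted random walk in part (a), each step has mean 2p − 1 and variance 1 − (2p − 1)² = 4p(1 − p), so independence gives E(S_n) = a + n(2p − 1) and Var(S_n) = 4np(1 − p). The Python sketch below checks these closed forms against an exact sum over the binomial distribution of up-steps. It is a sanity check under that derivation, not part of the original question, and the function names are hypothetical.

```python
from math import comb


def walk_moments_formula(a, n, p):
    """Closed-form mean and variance of S_n for the simple random walk."""
    return a + n * (2 * p - 1), 4 * n * p * (1 - p)


def walk_moments_exact(a, n, p):
    """Mean and variance of S_n computed by summing over the number of +1 steps."""
    mean = 0.0
    second_moment = 0.0
    for ups in range(n + 1):
        prob = comb(n, ups) * p**ups * (1 - p) ** (n - ups)
        position = a + ups - (n - ups)  # ups steps of +1, the rest are -1
        mean += prob * position
        second_moment += prob * position**2
    return mean, second_moment - mean**2


if __name__ == "__main__":
    print(walk_moments_formula(3, 10, 0.3))
    print(walk_moments_exact(3, 10, 0.3))
```

The two functions should agree up to floating-point rounding for any valid a, n and p.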
5 Logic and Proof
(a) State (with justification) whether the following formula is satisfiable, valid or
neither. Note that a and b are constants.
[∀x [q(x) → r(x)] ∧ ¬r(a) ∧ ∀x [¬r(x) ∧ ¬q(a) → p(x) ∨ q(x)]]
→ p(b) ∨ r(b)
(b) Attempt to prove the formula [∃x ∀y R(x, y)] → ∃x ∀z R(x, f(z)) by resolution,
with brief explanations of each step, including the conversion to clause form.
[4 marks]
(c) Give a model for the following set of clauses, or prove that none exists.
{¬R(x, y), ¬R(y, x)}
{R(x, f(x))}
{¬R(x, y), ¬R(y, z), R(x, z)}
The Prolog predicate perm(+In,-Out) generates all permutations of the input list
In. A programmer implements perm/2 as follows:
perm([],[]).
perm(L,[H|T]) :- take(L,H,R), perm(R,T).
The predicate take(+L,-E,-R) removes one element (E) from the input list L and
unifies R with the rest of L. Thus, the list R has one element fewer than L.
(a) Consider the perm/2 predicate:
(i) Explain briefly in words the operation of the perm/2 predicate.
(ii) Provide an implementation of the take/3 predicate.
(iii) Give the complete sequence of answers (in the correct order) generated
by perm([1,2,3],A).
(b) A student attempts to invoke the query perm(A,[1,2,3]).
(i) Explain what happens and why. [5 marks]
(ii) Implement a predicate sameLength/2 which is true if the two parameters
are lists of the same length.
(iii) Using sameLength/2, or otherwise, give an implementation of
safePerm/2 which generates permutations regardless of the order in
which the parameters are given: both safePerm(+In,-Out) and
safePerm(-Out,+In) should generate all permutations of In. The order
in which these permutations are generated is not important.
[4 marks] (b)
list growing (don't forget to offer to reciprocate!). Opportunities to network with people arise at any time and any place. Never underestimate an opportunity to make a connection.
I tested two of these prototypes in our digital ASIC/FPGA prototyping lab, with the other two chips tested by collaborators. Together with my collaborators, the results from my silicon prototyping experience were published at top-tier chip and design-automation conferences including Hot Chips, VLSI, IEEE TCAS I and IEEE MICRO. The chips include a mixed-signal test chip and three digital ASIC test chips. Of these chips, I was the project lead for two chips (BRGTC1 in IBM 130 nm [TWS+16] and BRGTC2 in TSMC 28 nm [TJAH+18]) and Cornell University student lead for the DARPA-funded, multi-university project on developing the Celerity SoC in TSMC 16 nm [AAHA+17,DXT+18,RZAH+19]. For the DCS test chip, I helped with full-custom design and also worked on the post-silicon testing process [BTG+17].
The primary contributions of this thesis are:
• A novel approach for fine-grain voltage and frequency scaling for homogeneous systems of little cores at microsecond timescales based on switched-capacitor-based integrated voltage regulators using a novel dynamic capacitance sharing technique.
• A novel approach for fine-grain power control for heterogeneous multicore systems at microsecond timescales specialized for task-based parallel runtimes using a set of three techniques based on balancing marginal utility.
• A novel proposal for ultra-elastic CGRAs which capitalize on new opportunities in elastic CGRAs, enabling support for configurable per-tile fine-grain power control and significantly improved dataflow efficiency.
• A deep design-space exploration of these ideas using a vertically integrated research methodology that in many cases extends from cycle-level modeling down to silicon prototyping.

1.5 Collaboration and Funding
This thesis would not have been possible without support from all of the members of the Batten Research Group and others.
My advisor Christopher Batten was a key source of inspiration and guidance, helping to transform ideas and guide them in interesting directions. The number of times I have had ideas that became many times more interesting through his intervention cannot be overstated.
The work on reconfigurable power distribution networks presented in Chapter 2 was an interdisciplinary project with an architecture half and a circuits half. I was the architecture lead, and the circuits portion was led by Waclaw Godycki and Professor Alyssa Apsel. Waclaw designed the SPICE-level DC-DC converters used in the project and worked with me at the architecture-circuit interface to accurately model voltage transients in my architectural cycle-level models based on gem5. Waclaw also later led the tapeout for the DCS test chip that is briefly introduced in Section 5.1. When the chip came back, Ivan Bukreyev led the post-silicon testing as well as the chip characterization paper [BTG+17]. Finally, I would also like to thank Derek Lockhart and Yunsup Lee who established the initial ASIC CAD toolflow that I later leveraged and adapted to build the 65 nm energy model used in this work.
I led the asymmetry-aware work-stealing runtime work presented in Chapter 3, but Moyang Wang was integral to the success of the project. He designed the work-stealing runtime from scratch (inspired by Intel TBB). We then worked together on instrumenting the runtime to enable the new techniques. Moyang added the exception handler for the work-mugging thread swap to the work-stealing runtime. Moyang also helped port a wide variety of benchmarks to our architecture including PBBS, Cilk, and PARSEC application kernels.
I led the ultra-elastic CGRA work presented in Chapter 4, but a strong team of students worked with me throughout the project.
Peitian Pan and Yanghui Ou led the RTL for the CGRA, helped implement special circuitry for ratiochronous clock-domain crossings, characterized energy for the tiles and CGRAs, and measured the throughput. Cheng Tan implemented the LLVM-based compiler, transformed C benchmarks into DFGs, and mapped them onto the RTL and my analytical model. This project only took shape due to the tremendous contributions from these three. I was the lead for the BRGTC1 project, but the chip would have been impossible without immense support from Moyang Wang, Bharath Sudheendra, Nagaraj Murali, Suren Jayasuriya, Shreesha Srinath, Taylor Pritchard, Robin Ying, Eric Tang, Rohan Agarwal, and Cameron Haire. Moyang was the verification lead for our PyMTL core. Bharath and Nagaraj were integral in bringing up much of the physical backend flow in Synopsys ICC, since we started the project with little real-world physical backend expertise ourselves. On that note, Suren provided invaluable advice as I worked through Calibre DRC and LVS and many other physical design topics. Shreesha wrote the bubble-sorting accelerator attached to our core using commercial high-level synthesis tools. Taylor designed the PyMTL RTL for the host interface surrounding the eight-bit asynchronous channel. Robin Ying led the design of the full-custom LVDS receiver before handing it off to me to integrate with our digital flow. Eric and Rohan designed the breakout board that pulled the pins of our (packaged) die out to headers for preliminary post-silicon validation. Finally, Cameron brought up the post-silicon validation flow at our bench using the logic analyzer, pattern generator, and DC power analyzer. I also thank Ivan Bukreyev, who helped me connect the core power trunks to the IO ring in Cadence Virtuoso just a few hours before tapeout. I would never have been able to do this in time without his willingness to help friends in a moment's notice. 
I was the lead for the BRGTC2 project, but the chip would not have been possible without these six people pitching in over the final month: Shunning Jiang, Khalid Al-Hawaj, Ivan Bukreyev, Berkin Ilbeyi, Tuan Ta, and Lin Cheng. Shunning Jiang was a key contributor to the RTL design, singlehandedly bringing up many components including the L0 buffer, the core, the multiply/divide unit, and a great deal of PyMTL verification infrastructure. Khalid ported the shared cache. Ivan ported the synthesizable PLL from 16 nm to 28 nm (this was initially designed by Julian Puscar and Ian Galton for the Celerity project). Berkin added RTL for a custom Bloom filter accelerator. Tuan led the cycle-level gem5 modeling to inform how the smart-sharing architecture should be assembled. Lin made sure that the work-stealing runtime ran on our chip in RTL simulation. I would also like to thank Shreesha Srinath for his work on smart-sharing architectures in general, and I would like to thank Moyang Wang for designing the work-stealing runtime. Finally, everyone deserves thanks for their creativity in naming our incremental design milestones after Pokémon.
The Celerity project was a big multi-university effort across four universities spread across different geographical locations. I was the Cornell student lead, but in the whole project there were twenty students and faculty involved. From the Michigan team, Tutu Ajayi, Austin Rovinski, and Aporva Amarnath worked together with me on top-level SoC integration. We also worked together to bring up the 16 nm physical backend for the entire SoC (I visited Michigan for two months during the project to help out with this). Ron G. Dreslinski was an inspirational advisor. From the UCSD team, Julian Puscar and Ian Galton designed the synthesizable PLL, and Loai Salem and Patrick P. Mercier designed the digital LDOs powering the chip. Both teams handed off GDS to us along with other files for integration with the digital flow. Rajesh K. Gupta was instrumental in bringing the whole multi-university team together. From the Bespoke Silicon Group (BSG), a number of people developed the RTL for the manycore, the chip interconnection, and the Rocket infrastructure: Scott Davidson, Shaolin Xie, Anuj Rao, Ningxiao Sun, Luis Vega, Bandhav Veluri, Chun Zhao, and Michael B. Taylor. BSG also provided many physical backend scripts from older chips developed by previous generations of students for our reference.
Suppose we have a compression function c, which takes a bit string s to a compressed string c(s).
(a) Show that for any integer N there must be a string s of length N for which
length(c(s)) ≥ N; that is, no effective compression is done.
(b) Compress some already compressed files (try compressing with the same utility
several times in sequence). What happens to the file size?
(c) Given a compression function c as in (a), give a function c ′ such that for all bit
strings s, length(c
[4 marks]
(b) Priority inversion can occur when threads of differing priorities synchronise on access to common resources - threads of greater priority may end up waiting on threads of lesser priority, leading to undesirable realtime properties.
(i) Describe how this problem can be resolved for mutexes using priority inheritance. [2 marks]
[...] experienced by a network subject to this problem, and a test that might identify the issue. [2 marks]
(e) Explain the technical and architectural argument behind the decision in IPv6 to retain the header TTL but not a header checksum. [2 marks]
(f) Explain why there is ambiguity about handling packets with TTL values of 1 and give a practical solution. [2 marks]
CST.2013.5.8
7 Concurrent and Distributed Systems
(a) Deadlock is a classic problem in concurrent systems.
(i) What are the four necessary conditions for deadlock? [4 marks]
(ii) Deadlock is often explained using the Dining Philosophers Problem. In this pseudo-code, each fork is represented by a lock:
Lock forks[] = new Lock[5];
// Code for each philosopher (i)
while (true) {
  think();
  lock(forks[i]);
  lock(forks[(i + 1) % 5]);
  eat();
  unlock(forks[i]);
  unlock(forks[(i + 1) % 5]);
}
Partial ordering is a common deadlock prevention scheme. Describe modifications to the above code, changing only array indices, such that philosophers can be fed not only safely, but also deadlock free, using a partial order.
[...] Describe an algorithm that draws the quadratic Bézier curve, using straight lines only, to within a tolerance τ. You may use the algorithm from part (a) and you may assume that you already have an algorithm for drawing a straight line. [8 marks]
(c) Consider the control of detail in a curve that is represented by a sequence of many straight line segments. Describe how Douglas and P[...]
(ii) Describe how priority inheritance would need to be modified to handle reader-writer locks. [2 marks]
(iii) Priority inversion can also arise between two threads involved in process synchronisation - for example, when one thread uses a semaphore to signal completion of work. Why might implementing priority inheritance be more difficult with process synchronisation than with mutual exclusion? [4 marks]
(iv) What might we do to solve the problem in (b)(iii)? [4 marks]
CST.2013.5.9
8 Concurrent and Distributed Systems
(a) The ACID properties are often used to define transactional semantics.
(i) Define "atomicity" as used in the ACID context. [1 mark]
(ii) Define "durability" as used in the ACID context. [1 mark]
(b) Write-ahead logging is a commonly used scheme to accomplish transactional semantics when storing a database on a block storage device, such as a hard disk.
(i) Under what circumstances, during write-ahead log recovery, might a transaction in the UNDO list be moved to the REDO list? [2 marks]
(ii) Synchronously flushing commit records to disk is expensive. How might we safely reduce synchronous I/O operations on a high-throughput system without sacrificing ACID properties? [2 marks]
(iii) Describe two performance changes that might arise from using your solution to part (b)(ii). [2 marks]
(c) (i) Transaction records in a write-ahead logging scheme contain five fields: ⟨TransactionID, ObjID, Operation, OldValue, NewValue⟩, but storing the complete old and new values can consume significant amounts of space. One approach that might be used, for reversible operations being applied to some data such as XOR by a constant, is to store only the constant arguments, rather than the full before and after data. What problems might occur as a result of this design choice? [4 marks]
(ii) Write-ahead logging systems must know the true on-disk sector size for the write-ahead log to behave correctly. A wayward disk vendor decides to rebrand its 512-byte sector disks as 2K-sector disks, and changes the value reported back to the database system. How might this affect database integrity? [4 marks]
(iii) Explain how a database vendor who is aware of the problem described in part (c)(ii) might mitigate this problem in software, and what limitations there might be to this approach. [4 marks]
CST.2013.5.10
9 Concurrent and Distributed Systems
Sun's Network File System (NFS) is the standard distributed file system used with UNIX, and has gone through a progression of versions (2, 3, 4) that have gradually improved performance and semantics.
(a) Remote procedure call (RPC)
(i) Explain how Sun RPC handles byte order (endianness). [2 marks]
(ii) This approach may result in unnecessary work. State when this occurs and how this might be avoided. [2 marks]
(b) Network File System version 2 (NFSv2) and version 3 (NFSv3)
(i) A key design premise for NFS was that the server be "stateless" with respect to the client. State how this impacts distributed file locking in NFSv2 and NFSv3. [2 marks]
(ii) Another key design premise for NFSv2 was the "idempotence" of RPCs; what does this mean? [2 marks]
(iii) One significant improvement in NFSv3 was the addition of the READDIRPLUS RPC. Explain why this helps performance. [4 marks]
(iv) NFSv3 implements what is termed "close-to-open consistency" for file data caching: if client C1 writes to a file, closes the file, and client C2 now opens the file for read, then it must see the results of all writes issued by C1 prior to close. However, if C2 opens the file before C1 has closed it, C2 may see some, all, or none of the writes issued by C1 (and in arbitrary order). Close-to-open consistency is achieved through careful use of synchronous RPC semantics, combined with file timestamp information piggybacked onto server replies on all RPCs that operate on files. Explain how close-to-open consistency allows performance to be improved. [4 marks]
(v) NFSv3 adds a new RPC, ACCESS, allowing the client to delegate access control checks at file open time to the server, rather than performing them on the client. This allows client and server security models to differ
4 Using library functions like htonl and Unix's bcopy or Windows' CopyMemory,
implement a routine that generates the same on-the-wire representation of the
structures given in Exercise 1 as XDR does. If possible, compare the performance
of your "by-hand" encoder/decoder with the corresponding XDR routines.
5 Use XDR and htonl to encode a 1000-element array of integers. Measure and
compare the performance of each. How do these compare to a simple loop that
reads and writes a 1000-element array of integers? Perform the experiment on a
computer for which the native byte order is the same as the network byte order,
as well as on a computer for which the native byte order and the network byte
order are different.
6 Write your own implementation of htonl. Using both your own htonl and (if
little-endian hardware is available) the standard library version, run appropriate
experiments to determine how much longer it takes to byte-swap integers versus
merely copying them.
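As a starting point for Exercise 6, the byte swap itself can be sketched as follows. This is a minimal illustration in Python rather than C, and the name my_htonl is hypothetical; it assumes a 32-bit unsigned value and uses the standard library's socket.htonl only as a reference to compare against.

```python
import socket
import sys


def my_htonl(x):
    """Convert a 32-bit unsigned integer from host to network (big-endian) order."""
    if not 0 <= x <= 0xFFFFFFFF:
        raise ValueError("x must fit in 32 unsigned bits")
    if sys.byteorder == "big":
        # Host order already equals network order: nothing to do.
        return x
    # Little-endian host: reverse the order of the four bytes.
    return (((x & 0x000000FF) << 24) |
            ((x & 0x0000FF00) << 8) |
            ((x & 0x00FF0000) >> 8) |
            ((x & 0xFF000000) >> 24))


if __name__ == "__main__":
    for value in (0, 1, 0x01020304, 0xFFFFFFFF):
        # The hand-written version should agree with the library on any host.
        print(hex(value), hex(my_htonl(value)), hex(socket.htonl(value)))
```

Timing the swap against a plain copy, as the exercise asks, could then be done by wrapping both in a loop and measuring with the standard timeit module.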
7 Give the ASN.1 encoding for the following three integers. Note that ASN.1 integers, like those in XDR, are 32 bits in length.
(a) 101
(b) 10,120
(c) 16,909,060
8 Give the ASN.1 encoding for the following three integers. Note that ASN.1 integers, like those in XDR, are 32 bits in length.
(a) 15
(b) 29,496,729
(c) 58,993,458
9 Give the big-endian and little-endian representation for the integers from
Exercise 7.
10 Give the big-endian and little-endian representation for the integers from
Exercise 8.
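For Exercises 9 and 10, the two byte layouts of a 32-bit integer can be checked mechanically. The sketch below is one way to do so, assuming 4-byte unsigned integers; the helper name byte_layouts is hypothetical, and only the integers from Exercise 7 are shown.

```python
def byte_layouts(value):
    """Return the 4-byte big-endian and little-endian layouts of value as hex strings."""
    big = value.to_bytes(4, byteorder="big").hex(" ")
    little = value.to_bytes(4, byteorder="little").hex(" ")
    return big, little


if __name__ == "__main__":
    for n in (101, 10120, 16909060):
        big, little = byte_layouts(n)
        # Bytes are listed from lowest memory address to highest.
        print(f"{n:>10}  big-endian: {big}  little-endian: {little}")
```

The same function can be applied to the integers of Exercise 8 to check a hand-worked answer.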
11 XDR is used to encode/decode the header for the SunRPC protocol illustrated
by Figure 5.20. The XDR version is determined by the RPCVersion field. What
potential difficulty does this present? Would it be possible for a new version of
XDR to switch to little-endian integer format?
12 The presentation formatting process is sometimes regarded as an autonomous
protocol layer, separate from the application. If this is so, why might including
data compression in the presentation layer be a bad idea?
13 Suppose you have a machine with a 36-bit word size. Strings are represented as
five packed 7-bit characters per word. What presentation issues on this machine
have to be addressed for it to exchange integer and string data with the rest of the
world?
14 Using the programming language of your choice that supports user-defined automatic type conversions, define a type netint and supply conversions that enable assignments and equality comparisons between ints and netints. Can a generalization of this approach solve the problem of network argument
marshalling?
15 Different architectures have different conventions on bit order as well as byte
order—whether the least significant bit of a byte, for example, is bit 0 or bit 7.
[Pos81] defines (in its Appendix B) the standard network bit order. Why is bit
order then not relevant to presentation formatting?
16 Let p ≤ 1 be the fraction of machines in a network that are big-endian; the remaining 1 − p fraction are little-endian. Suppose we choose two machines at random
and send an int from one to the other. Give the average number of byte-order conversions needed for both big-endian network byte order and receiver-makes-right,
for p = 0.1, p = 0.5, and p = 0.9. Hint: The probability that both endpoints are
big-endian is p²; the probability that the two endpoints use different byte orders
is 2p(1 − p).
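One way to approach Exercise 16 is to weight the number of conversions in each case by the probabilities from the hint. The sketch below rests on two assumptions that are not stated in the exercise: with big-endian network byte order, each little-endian endpoint performs exactly one conversion and big-endian endpoints perform none, and with receiver-makes-right a single conversion happens only when the two endpoints differ. The function name expected_conversions is hypothetical.

```python
def expected_conversions(p):
    """Return (network_byte_order, receiver_makes_right) expected conversion counts.

    p is the fraction of big-endian machines in the network.
    """
    both_big = p * p                 # 0 conversions under network byte order
    differ = 2 * p * (1 - p)         # exactly one endpoint is little-endian
    both_little = (1 - p) ** 2       # sender and receiver each convert once
    network_order = 0 * both_big + 1 * differ + 2 * both_little
    receiver_makes_right = differ    # convert only when byte orders differ
    return network_order, receiver_makes_right


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        nbo, rmr = expected_conversions(p)
        print(f"p = {p}: network byte order {nbo:.2f}, receiver-makes-right {rmr:.2f}")
    # p = 0.1: network byte order 1.80, receiver-makes-right 0.18
    # p = 0.5: network byte order 1.00, receiver-makes-right 0.50
    # p = 0.9: network byte order 0.20, receiver-makes-right 0.18
```

Under these assumptions the network-byte-order cost simplifies to 2(1 − p), so it never drops below the receiver-makes-right cost of 2p(1 − p).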
Developing your network is easy because you know more people than you think you know. Consider:
family, friends, roommates, and significant others
iSchool faculty and staff, fellow students, and alumni
past and present co-workers
neighbors
club, organization, and association members
people at the gym, the local coffee house, and neighborhood store
people in your religious community
These people are all part of your current network, professional and personal. Keep an on-going list of the names and contact information of the people in your network. Ask your contacts to introduce you to their contacts and keep your list growing (don't forget to offer to reciprocate!). Opportunities to network with people arise at any time and any place. Never underestimate an opportunity to make a connection.
I tested two of these prototypes in our digital ASIC/FPGA prototyping lab, with the other two chips tested by collaborators. Together with my collaborators, the results from my silicon prototyping experience were published at top-tier chip and design-automation conferences including Hot Chips, VLSI, IEEE TCAS I and IEEE MICRO. The chips include a mixed-signal test chip and three digital ASIC test chips. Of these chips, I was the project lead for two chips (BRGTC1 in IBM 130 nm [TWS+16] and BRGTC2 in TSMC 28 nm [TJAH+18]) and Cornell University student lead for the DARPA-funded, multi-university project on developing the Celerity SoC in TSMC 16 nm [AAHA+17,DXT+18,RZAH+19]. For the DCS test chip, I helped with full-custom design and also worked on the post-silicon testing process [BTG+17].
The primary contributions of this thesis are:
• A novel approach for fine-grain voltage and frequency scaling for homogeneous systems of little cores at microsecond timescales based on switched-capacitor-based integrated voltage regulators using a novel dynamic capacitance sharing technique.
• A novel approach for fine-grain power control for heterogeneous multicore systems at microsecond timescales specialized for task-based parallel runtimes using a set of three techniques based on balancing marginal utility.
• A novel proposal for ultra-elastic CGRAs which capitalize on new opportunities in elastic CGRAs, enabling support for configurable per-tile fine-grain power control and significantly improved dataflow efficiency.
• A deep design-space exploration of these ideas using a vertically integrated research methodology that in many cases extends from cycle-level modeling down to silicon prototyping.
1.5 Collaboration and Funding
This thesis would not have been possible without support from all of the members of the Batten Research Group and others.
My advisor Christopher Batten was a key source of inspiration and guidance, helping to transform ideas and guide them in interesting directions. The number of times I have had ideas that became many times more interesting through his intervention cannot be overstated.
The work on reconfigurable power distribution networks presented in Chapter 2 was an interdisciplinary project with an architecture half and a circuits half. I was the architecture lead, and the circuits portion was led by Waclaw Godycki and Professor Alyssa Apsel. Waclaw designed the SPICE-level DC-DC converters used in the project and worked with me at the architecture-circuit interface to accurately model voltage transients in my architectural cycle-level models based on gem5. Waclaw also later led the tapeout for the DCS test chip that is briefly introduced in Section 5.1. When the chip came back, Ivan Bukreyev led the post-silicon testing as well as the chip characterization paper [BTG+17]. Finally, I would also like to thank Derek Lockhart and Yunsup Lee who established the initial ASIC CAD toolflow that I later leveraged and adapted to build the 65 nm energy model used in this work.
I led the asymmetry-aware work-stealing runtime work presented in Chapter 3, but Moyang Wang was integral to the success of the project. He designed the work-stealing runtime from scratch (inspired by Intel TBB). We then worked together on instrumenting the runtime to enable the new techniques. Moyang added the exception handler for the work-mugging thread swap to the work-stealing runtime. Moyang also helped port a wide variety of benchmarks to our architecture including PBBS, Cilk, and PARSEC application kernels.
I led the ultra-elastic CGRA work presented in Chapter 4, but a strong team of students worked with me throughout the project.
Peitian Pan and Yanghui Ou led the RTL for the CGRA, helped implement special circuitry for ratiochronous clock-domain crossings, characterized energy for the tiles and CGRAs, and measured the throughput. Cheng Tan implemented the LLVM-based compiler, transformed C benchmarks into DFGs, and mapped them onto the RTL and my analytical model. This project only took shape due to the tremendous contributions from these three.
I was the lead for the BRGTC1 project, but the chip would have been impossible without immense support from Moyang Wang, Bharath Sudheendra, Nagaraj Murali, Suren Jayasuriya, Shreesha Srinath, Taylor Pritchard, Robin Ying, Eric Tang, Rohan Agarwal, and Cameron Haire. Moyang was the verification lead for our PyMTL core. Bharath and Nagaraj were integral in bringing up much of the physical backend flow in Synopsys ICC, since we started the project with little real-world physical backend expertise ourselves. On that note, Suren provided invaluable advice as I worked through Calibre DRC and LVS and many other physical design topics. Shreesha wrote the bubble-sorting accelerator attached to our core using commercial high-level synthesis tools. Taylor designed the PyMTL RTL for the host interface surrounding the eight-bit asynchronous channel. Robin Ying led the design of the full-custom LVDS receiver before handing it off to me to integrate with our digital flow. Eric and Rohan designed the breakout board that pulled the pins of our (packaged) die out to headers for preliminary post-silicon validation. Finally, Cameron brought up the post-silicon validation flow at our bench using the logic analyzer, pattern generator, and DC power analyzer. I also thank Ivan Bukreyev, who helped me connect the core power trunks to the IO ring in Cadence Virtuoso just a few hours before tapeout. I would never have been able to do this in time without his willingness to help friends at a moment's notice.
I was the lead for the BRGTC2 project, but the chip would not have been possible without these six people pitching in over the final month: Shunning Jiang, Khalid Al-Hawaj, Ivan Bukreyev, Berkin Ilbeyi, Tuan Ta, Lin Cheng. Shunning Jiang was a key contributor to the RTL design, singlehandedly bringing up many components including the L0 buffer, the core, the multiply/divide unit, and a great deal of PyMTL verification infrastructure. Khalid ported the shared cache. Ivan ported the synthesizable PLL from 16 nm to 28 nm (this was initially designed by Julian Puscar and Ian Galton for the Celerity project). Berkin added RTL for a custom Bloom filter accelerator. Tuan led the cycle-level gem5 modeling to inform how the smart-sharing architecture should be assembled. Lin made sure that the work-stealing runtime ran on our chip in RTL simulation. I would also like to thank Shreesha Srinath for his work on smart-sharing architectures in general, and I would like to thank Moyang Wang for designing the work-stealing runtime. Finally, everyone deserves thanks for their creativity in naming our incremental design milestones after Pokémon.
The Celerity project was a big multi-university effort across four universities spread across different geographical locations. I was the Cornell student lead, but in the whole project there were twenty students and faculty involved. From the Michigan team, Tutu Ajayi, Austin Rovinski, and Aporva Amarnath worked together with me on top-level SoC integration. We also worked together to bring up the 16 nm physical backend for the entire SoC (I visited Michigan for two months during the project to help out with this). Ron G. Dreslinski was an inspirational advisor. From the UCSD team, Julian Puscar and Ian Galton designed the synthesizable PLL, and Loai Salem and Patrick P. Mercier designed the digital LDOs powering the chip. Both teams handed off GDS to us along with other files for integration with the digital flow. Rajesh K. Gupta was instrumental in bringing the whole multi-university team together. From the Bespoke Silicon Group (BSG), a number of people developed the RTL for the manycore, the chip interconnection, and the Rocket infrastructure: Scott Davidson, Shaolin Xie, Anuj Rao, Ningxiao Sun, Luis Vega, Bandhav Veluri, Chun Zhao, and Michael B. Taylor. BSG also provided many physical backend scripts from older chips developed by previous generations of students for our reference.
Monolithic integration using a standard CMOS process provides a tremendous cost incentive for including more and more functionality on a single die. This system-on-chip (SoC) integration enables both low-power embedded platforms and high-performance processors to include a diverse array of components such as processing engines, accelerators, embedded flash memories, and external peripheral interfaces. Almost every computing system requires closed-loop voltage regulators that, at first glance, seem like another likely target for monolithic integration. These regulators convert the noisy voltage levels available from the system's environment into the multiple fixed or adjustable voltage levels required by the system, and they are usually based on efficient switch-mode circuits. These regulators have traditionally been implemented off-chip for two key reasons: (1) limited availability of high-speed switching with suitable parasitic losses; and (2) limited availability of integrated energy-storage elements with suitable energy densities. The economic pressure towards monolithic integration has simply not outweighed the potential reduction in efficiency. Recent technology trends suggest that we are entering a new era where it is now becoming feasible to reduce system cost by integrating switching regulators on-chip. High-speed switching efficiencies have increased with technology scaling, reducing the need for very high-density inductors and capacitors.