Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

This assignment is to demonstrate your understanding of the parallel algorithms discussed in class. Each of these parallel algorithms consists of a number of computation

This assignment is to demonstrate your understanding of the parallel algorithms discussed in class. Each of these parallel algorithms consists of a number of computation phases. A new phase cannot start until every processing node completes the current phase. We discussed a PowerPoint slide in class how to estimate the computation cost for the Hash Phase of the Grace Algorithm. We clarified that this computation cost for a given phase is determined by the slowest processing node (last to finish its workload), not the sum of the local computation costs of all the processing nodes. The estimation of the other phases of this and other multi-phase algorithms is the same. You need to understand each phase of these algorithms in order to do such estimation. Do the following exercises to demonstrate your knowledge of these techniques.
(30 pts.) We apply the GRACE algorithm to perform R||>|S| on a small sharednothing system with four processing nodes (PNs).R has 4,000 pages and S has 8,000 pages. Each relation is evenly divided among the four PNs. Thus, each PN has 3,000 pages of tuples. The Hashing Phase results in data skew as follows:
50% of the data in the first 8 bucket pairs: R0S0-R7S7
20% of the data in the second 8 bucket pairs: R8S8-R15S15
15% of the data in the third 8 bucket pairs: R16S16-R23S23
15% of the data in the fourth 8 bucket pairs: R24S24-R31S31
For each of the three parallel phases, estimate the read cost, the write cost, and the total computation cost. Show and explain the derivation of your mathematical analysis.
(70 pts.) We apply the ABJ algorithm for the same setting described in Question 1. Estimate the read cost, the write cost, and the total computation cost for each of the four phases of ABJ. Show and explain the derivation of your mathematical analysis.
(0 pts.) If we apply the TIJ algorithm to compute R||>|S| as described in Question 1, Estimate the total computation cost for this strategy.
Answer:
Hash/TI Phase: Since each PN reads 3000 pages, the read cost is 3000 IOs. The Hashing/TI computation does not incur disk access. It evenly sends 3000 pages to each of the PNs, the write cost is 3000 IOs. The total computation cost for this phase is 3000+3000=600010.
Partition Tuning: Each PN has 3000 pages to read and sends the tuples in pages to the target bucket pairs according to the Bin Packing Algorithm (BPA). Thus, the read cost is 3000 IOs. BPA evenly gives the pages to each PN, it writes its share of 3000 pages to its local disks. The write cost, therefore, is 3000 pages. The total computation cost for this phase is 3000+3000=6000 IOs.
Bucket Tuning Phase: For efficient join of each bucket pair, one of the two buckets should fit in the computer memory. This requires scanning the other bucket only once to complete the join operation. For buckets that are significantly smaller than the memory capacity, this phase combines them to create a bigger combined bucket to better fit the memory capacity. This is just a decision making process without really loading these buckets into memory and writing them back to disk as a single bigger combined bucket. This phase, therefore, incurs only negligible computation and no IO.
Join Phase: Each PN has 3000 pages of data to do the local join operations. Each of these local joins requires loading each of its buckets only once. The read time for doing all the local joins at a given PN is the cost of reading all the local bucket pairs, or 3000 pages. The read cost for this phase, therefore, is 3000 IOs. We neglect the write cost for this phase in this analysis since we do not know how the join result is like. When we compare the different join techniques in Questions (1),(2), and (3), as long as we ignore this write cost for all three cases, the comparison is still valid to determine which one of the three is the better option for the join operation.
TOTAL COMPUTATION TIME: Since the computation of one phase cannot start until every PN finishes the last phase, the total computation cost of TIJ is the sum of the computation of each of the four phases: 6000+6000+0+3000=15,000IOs.
image text in transcribed

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Databases Illuminated

Authors: Catherine M Ricardo, Susan D Urban

3rd Edition

1284056945, 9781284056945

More Books

Students also viewed these Databases questions

Question

If du, what is d 2 g / dx 2 ? 8(x)=u + 2

Answered: 1 week ago

Question

Design a cross-cultural preparation program. page 313

Answered: 1 week ago

Question

Evaluate employees readiness for training. page 289

Answered: 1 week ago