Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

This problem explores a variant of the external merge sort algorithm. Below in the subproblems, you will calculate the cost of performing the modified

This problem explores a variant of the external merge sort algorithm. Below in the subproblems, you will ONE BUFFER PAGE RESERVED FOR OUTPUT: Assume we use one page for output in a merge, e.g. a B-way merge would We'll calculate the 10 cost for each merge phase and compute the total cost as the product of the cost of 

This problem explores a variant of the external merge sort algorithm. Below in the subproblems, you will calculate the cost of performing the modified external merge sort. Recall that sequential IO (i.e. involving reading from / writing to consecutive pages) is generally much faster that random access 10 (any reading / writing that is not sequential). Additionally, on newer memory technologies like SSD reading data can be faster than writing data (if you want to read more about SSD access patterns look here). For example, if we read 8 consecutive pages from file A, this should be much faster than reading 2 pages from A, then 4 pages from file B, then 2 pages from A. In this problem, we will begin to model this, by assuming that 8 sequential READS are "free", i.e. the total cost of 8 sequential reads is 1 10. We will also assume that the writes are always twice as expensive as a read. Sequential writes are never free, therefore the cost of N writes is always 2N. Please note the following: NO REPACKING: Consider the external merge sort algorithm using the basic optimizations we present in class, but do not use the repacking optimization covered in class. ONE BUFFER PAGE RESERVED FOR OUTPUT: Assume we use one page for output in a merge, e.g. a B-way merge would require B+1 buffer pages. REMEMBER TO ROUND: Take ceilings (i.e. rounding up to nearest integer values) into account in this problem. Note that we have sometimes omitted these (for simplicity) in lecture. Consider worst case cost: In other words, if 2 reads could happen to be sequential, but in general might not be, consider these random 10. Question 2.1 Consider a modification of the external merge sort algorithm where reads are always read in 8-page chunks (i.e. 8 pages sequentially at a time) so as to take advantage of sequential reads. Calculate the cost of performing the external merge sort for a setup having B+1=40 buffer pages and an unsorted input file with 320 pages. Show the steps of your work and make sure to explain your reasoning if necessary. Question 2.1.1 What is the exact 10 cost of splitting and sorting the files? As is standard we want runs of size B+1. Question 2.1.2 After the file is split and sorted, we can merge n runs into 1 using the merge process. What is the largest n we could have, given reads are always read in 8-page chunks? Note: this is known as the arity of the merge. Question 2.1.3 How many passes of merging are required? Question 2.1.4 What is the total 10 cost of running this external merge sort algorithm? Do not forget to add in the remaining passes (if any) of merging. Question 2.2 Now, we'll generalize the reasoning above by writing formula that compute the approximate cost of performing this version of external merge sort for a setup having B+1 buffer pages, a file with N pages, and where we now read in P-page chunks (replacing our fixed 8 page chunks in the previous section. *Note: our approximation will be a small one- for simplicity, we'll assume that each pass of the merge phase has the same 10 cost. We'll calculate the 10 cost for each merge phase and compute the total cost as the product of the cost of reading in and writing out all the data (which we do each pass), and the number of passes we'll have to do. Even though this is an approximation, make sure to take care of floor / ceiling operations- i.e. rounding down / up to integer values properly! Importantly, to simplify your calculations, you can assume: (B+1) %P == 0 (i.e. the buffer size is divisible by the chunk size) N% (B + 1) == 0 (i.e. the file size is divisible by the buffer size) Question 2.2.1 First, write the formula that computes the exact total 10 cost to create the initial runs in terms of B, N, and P. Question 2.2.2 Next, write the formula that computes the approximate total 10 cost to read in and then write out all the data during one pass of the merge in terms of B, N, and P. Question 2.2.3 Next, write the formula that computes the exact total number of passes we'll need to do in terms of B, N, and P. Question 2.2.4 Finally, write the formula that computes the total cost in terms of B, N, and P.

Step by Step Solution

3.56 Rating (167 Votes )

There are 3 Steps involved in it

Step: 1

21 Here are the steps to calculate the cost of performing the modified external merge sort for a setup having B140 buffer pages and an unsorted input file with 320 pages Step 1 Determine the number of ... blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Building Java Programs A Back To Basics Approach

Authors: Stuart Reges, Marty Stepp

5th Edition

013547194X, 978-0135471944

More Books

Students also viewed these Databases questions

Question

f. How do you apply for the position?

Answered: 1 week ago