Please do not attempt this question if you are going to copy and paste nonsense from the internet. Any solution that does not answer the question WILL be reported!

Database Systems Question: Please answer ALL parts of the question with FULL explanations. Please specify which part you are answering.

Problem 3: Double Buffering with IO

This problem explores an optimization often referred to as double buffering, which we'll use to speed up the external merge sort algorithm.

Recall that sequential IO (i.e. reading from / writing to consecutive pages) is generally much faster than random access IO (any reading / writing that is not sequential). Additionally, on newer storage technologies such as SSDs, reading data can be faster than writing data.

For example, reading 4 consecutive pages from file A should be much faster than reading 1 page from A, then 1 page from file B, then the next page from A.

Assume that 3 out of every 4 sequential READS are "free", i.e. the total cost of 4 sequential reads is 1 IO. We will also assume that a write is always twice as expensive as a read. Sequential writes are never free, therefore the cost of N writes is always 2N.
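The cost model above can be sketched as two small helpers. This is only an illustration: the names `read_cost` and `write_cost` are hypothetical, and the sketch assumes that a partial chunk of fewer than 4 sequential pages still costs 1 IO (rounding up), which the problem does not state explicitly.

```python
import math

READ_CHUNK = 4   # 4 sequential page reads cost 1 IO (from the problem statement)
WRITE_COST = 2   # every page write costs 2 IO (twice a read)


def read_cost(pages: int, sequential: bool = True) -> int:
    """IO cost of reading `pages` pages.

    Sequential reads are charged 1 IO per chunk of up to 4 pages
    (assumption: a partial chunk is rounded up to a full IO).
    Non-sequential reads are charged 1 IO per page.
    """
    if sequential:
        return math.ceil(pages / READ_CHUNK)
    return pages


def write_cost(pages: int) -> int:
    """IO cost of writing `pages` pages; writes are never free."""
    return WRITE_COST * pages
```

Under this sketch, reading 4 consecutive pages from file A is `read_cost(4)`, i.e. 1 IO, whereas the A, B, A access pattern is `read_cost(3, sequential=False)`, i.e. 3 IO.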

NO REPACKING: Consider the external merge sort algorithm using the basic optimizations, but without the repacking optimization.

ONE BUFFER PAGE RESERVED FOR OUTPUT: Assume we use one page for output in a merge, e.g. a B-way merge would require B+1 buffer pages.

REMEMBER TO ROUND: Take ceilings (i.e. rounding up to the nearest integer) into account in this problem for full credit! Note that we have sometimes omitted these (for simplicity) in lecture.

CONSIDER WORST-CASE COST: In other words, if 2 reads could happen to be sequential, but in general might not be, treat them as random IO.

Consider a modification of the external merge sort algorithm where pages are always read in 4-page chunks (i.e. 4 pages sequentially at a time) so as to take advantage of sequential reads. Calculate the cost of performing the external merge sort for a setup having B + 1 = 20 buffer pages and an unsorted input file with 160 pages.
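One way to picture how the 20 buffer pages might be divided during a merge is sketched below. It rests on an interpretation that is not spelled out in the problem: each input run is assumed to need its own 4-page input buffer so that it can always be read a full chunk at a time, with 1 page reserved for output. The constant names are illustrative only.

```python
import math

BUFFER_PAGES = 20   # B + 1, from the problem statement
FILE_PAGES = 160    # size of the unsorted input file
CHUNK = 4           # pages read per sequential chunk
OUTPUT_PAGES = 1    # one buffer page reserved for output

# Initial sorted runs: fill all 20 buffer pages, sort, write out.
initial_runs = math.ceil(FILE_PAGES / BUFFER_PAGES)

# Assumption: every run being merged holds a full 4-page chunk in memory,
# so the number of runs merged at once is limited by the remaining pages.
merge_arity = (BUFFER_PAGES - OUTPUT_PAGES) // CHUNK

# Pages left unused by this layout (19 is not a multiple of 4).
unused_pages = BUFFER_PAGES - OUTPUT_PAGES - merge_arity * CHUNK

print(initial_runs, merge_arity, unused_pages)
```

A different buffer layout (for example, a larger output buffer) would change the arity, so the figures here hold only under the stated assumption.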

Show the steps of your work and explain your reasoning by writing it as Python comments above the final answers.

a) Give the exact IO cost of splitting and sorting the file. As is standard, we want runs of size B + 1.

b) How many passes of merging are required?

c) What is the IO cost of the first pass of merging? Note: the highest arity merge should always be used.

d) What is the total IO cost of running this external merge sort algorithm? Do not forget to add in the remaining passes (if any) of merging.
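Parts a) through d) can be worked through with a short sketch in the requested style. It is one reading of the problem, not a verified solution. It assumes that each merged run needs a 4-page input buffer (so the arity is the number of 4-page buffers that fit beside the output page), that every pass reads each page once in 4-page chunks and writes each page once, and that run lengths are multiples of 4 pages. The function name `external_sort_cost` is hypothetical.

```python
import math


def external_sort_cost(file_pages=160, buffer_pages=20, chunk=4, write_cost=2):
    """Sketch of the IO cost of the chunked external merge sort."""
    # Every pass reads all pages in 4-page chunks (1 IO per chunk)
    # and writes all pages at 2 IO each (writes are never free).
    pass_cost = math.ceil(file_pages / chunk) + write_cost * file_pages

    # a) Split and sort: one full read and one full write of the file,
    #    producing runs of buffer_pages pages each.
    split_sort = pass_cost
    runs = math.ceil(file_pages / buffer_pages)

    # Assumption: 1 output page, and one chunk-sized buffer per input run.
    arity = (buffer_pages - 1) // chunk
    if arity < 2:
        raise ValueError("not enough buffer pages to merge")

    # b) Each merge pass reduces the run count by a factor of `arity`.
    passes = 0
    while runs > 1:
        runs = math.ceil(runs / arity)
        passes += 1

    # c) The first merge pass costs one full read plus one full write.
    first_merge = pass_cost if passes > 0 else 0

    # d) Total: split-and-sort plus every merge pass.
    total = split_sort + passes * pass_cost
    return {"split_sort": split_sort, "passes": passes,
            "first_merge": first_merge, "total": total}


print(external_sort_cost())
```

With the default arguments this sketch gives 360 IO for split and sort, 2 merge passes, 360 IO for the first merge pass, and 1080 IO in total. A different reading of the buffer layout or of the worst-case rule would change these numbers.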
