Question

1 Approved Answer

Posted on Sep 26, 2024

Task 2: Secondary Index & Aggregation Query Processing

Objective: Experimentation with a Secondary Index over a non-ordering, non-key attribute and Aggregate Query Execution Planning.

Assume the relation CITIZEN( , Tax-Code, Salary, Age) storing information about UK citizens' tax codes and annual salaries. There are r = 60,000,000 records. Each attribute has the same size: 128 bytes. The relation is stored in a file sorted by the Salary attribute. The block size is B = 1024 bytes, and any pointer in the system has size 128 bytes. The Salary attribute is assumed to be uniformly distributed across tuples. There are 6,000 distinct salary values and 10,000 different tax-code values. The tax code and age are assumed to be statistically independent of salary.

A data scientist, who has built a Secondary Index over the Tax-Code (a non-ordering, non-key attribute), is interested in the specific Tax Code '1234L'. The data scientist analysed the distribution of the Tax-Code attribute and noted the following:

Let x be the number of citizens with Tax Code '1234L' per data block. That is, given a random block, there are x citizens with Tax Code '1234L'.

$P(x \geq 1) = 0.5$, i.e., the probability that at least one citizen within a block has Tax Code '1234L' is 50%.

Therefore, when we pick a block at random, the probability of finding therein at least one citizen with Tax Code '1234L' is 0.5.

If there are b data blocks in the file and we are asked to retrieve those citizens with Tax Code '1234L', then ideally we expect to access $\frac{b}{2}$ blocks (ideal case). However, in reality, we do not know where these blocks are! If we use the naive solution (scan the whole file) to retrieve all those citizens, then we need to access b blocks.
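The two reference costs above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming fixed-length, unspanned records (a record never straddles two blocks); the attribute count is also an assumption, because the leading attribute of CITIZEN is missing from the text as given. The function name `data_blocks` is purely illustrative.

```python
import math

BLOCK_SIZE = 1024          # bytes per block (B), from the problem statement
ATTR_SIZE = 128            # bytes per attribute, from the problem statement
NUM_RECORDS = 60_000_000   # r, from the problem statement


def data_blocks(num_attrs: int) -> int:
    """Number of data blocks b, assuming unspanned fixed-length records."""
    record_size = num_attrs * ATTR_SIZE
    records_per_block = BLOCK_SIZE // record_size  # blocking factor
    return math.ceil(NUM_RECORDS / records_per_block)


# Assumption: four attributes (the unnamed leading one plus Tax-Code, Salary, Age).
b = data_blocks(num_attrs=4)
ideal_cost = b // 2   # ideal case: only the blocks that hold a match
naive_cost = b        # naive case: scan the whole file

print(f"b = {b:,}  ideal = {ideal_cost:,}  naive = {naive_cost:,}")
```

Under the unspanned assumption the blocking factor is 2 whether the relation has three or four attributes, so the missing attribute does not change b here.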

2.1 The data scientist claims that the expected cost using the Secondary Index should be between $\frac{b}{2}$ and b. What is the expected cost of retrieving the citizens with Tax Code '1234L' using the Secondary Index? How much bigger is this cost compared to the ideal case?
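One possible way to set up the estimate for 2.1 is sketched below. It is not the reference solution, and it rests on several assumptions that the question does not state: the index has one entry per distinct tax code and is binary-searched; each entry points to blocks of record pointers (an extra level of indirection); every record pointer costs one data-block access; and each record matches '1234L' independently with the same probability p, so that P(x >= 1) = 1 - (1 - p)^2 with two records per block. A different index layout or a different distribution for x changes the numbers.

```python
import math

BLOCK_SIZE = 1024           # bytes, from the problem statement
FIELD_SIZE = 128            # bytes per attribute and per pointer, from the problem statement
NUM_RECORDS = 60_000_000    # r, from the problem statement
DISTINCT_TAX_CODES = 10_000
P_AT_LEAST_ONE = 0.5        # P(x >= 1), from the problem statement
RECORDS_PER_BLOCK = 2       # assumption: unspanned records, 1024 // (4 * 128)


def secondary_index_cost() -> dict:
    b = math.ceil(NUM_RECORDS / RECORDS_PER_BLOCK)

    # Level 1 (assumed layout): one (tax code, pointer) entry per distinct value.
    entries_per_block = BLOCK_SIZE // (2 * FIELD_SIZE)
    index_blocks = math.ceil(DISTINCT_TAX_CODES / entries_per_block)
    search_cost = math.ceil(math.log2(index_blocks))  # binary search

    # Assumed model: records match independently with probability p,
    # so P(x >= 1) = 1 - (1 - p) ** RECORDS_PER_BLOCK.
    p = 1 - (1 - P_AT_LEAST_ONE) ** (1 / RECORDS_PER_BLOCK)
    matching_records = round(NUM_RECORDS * p)

    # Level 2 (assumed layout): blocks of record pointers for the searched value.
    pointers_per_block = BLOCK_SIZE // FIELD_SIZE
    pointer_blocks = math.ceil(matching_records / pointers_per_block)

    # Assumed worst case: one data-block access per record pointer.
    total = search_cost + pointer_blocks + matching_records
    return {
        "b": b,
        "ideal": b // 2,
        "index_blocks": index_blocks,
        "search_cost": search_cost,
        "pointer_blocks": pointer_blocks,
        "matching_records": matching_records,
        "total": total,
    }


cost = secondary_index_cost()
print(cost, cost["total"] / cost["ideal"])
```

Under these assumptions the estimate falls between b/2 and b, at roughly 1.3 times the ideal case. If the record pointers were grouped so that each matching block is read only once, the data-access term would drop to b/2.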

2.2 The data scientist is asked for a query processing plan for the aggregation query:

SELECT Salary, AVG(Age)
FROM CITIZEN
WHERE TaxCode = '1234L'
GROUP BY Salary

If the data management system devotes 100,000 blocks of RAM (memory), i.e., approx. 103 MB (each block being 1024 bytes), for executing the query, and 5000 blocks for storing the results of the query, help the scientist by providing a query execution plan. Describe your plan (i.e., steps, methods, ideas) and report the corresponding expected number of block accesses of your proposed solution. How much memory would you need to store the result of the aggregate query?
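As a starting point for 2.2, the sketch below illustrates two ideas rather than a full plan. First, it sizes the result under the assumption that each output row holds two 128-byte fields (Salary and AVG(Age)) and that every distinct salary can appear in the result. Second, because the file is sorted by Salary, it shows that a single pass can filter on the tax code and keep only a running sum and count for the current salary group. The function names and the tuple layout `(tax_code, salary, age)` are illustrative only.

```python
import math
from itertools import groupby
from operator import itemgetter

BLOCK_SIZE = 1024            # bytes, from the problem statement
FIELD_SIZE = 128             # bytes per attribute, from the problem statement
DISTINCT_SALARIES = 6_000    # from the problem statement


def result_blocks(num_groups: int = DISTINCT_SALARIES, fields_per_row: int = 2) -> int:
    """Blocks needed for the result, assuming 128-byte output fields."""
    rows_per_block = BLOCK_SIZE // (fields_per_row * FIELD_SIZE)
    return math.ceil(num_groups / rows_per_block)


def avg_age_by_salary(records, tax_code):
    """One-pass grouped average over records already sorted by salary.

    `records` yields (tax_code, salary, age) tuples in salary order.
    Filtering keeps that order, so groupby sees each salary exactly once.
    """
    matching = (rec for rec in records if rec[0] == tax_code)
    result = []
    for salary, group in groupby(matching, key=itemgetter(1)):
        total = count = 0
        for _, _, age in group:   # running sum and count only
            total += age
            count += 1
        result.append((salary, total / count))
    return result


sample = [("1234L", 100, 30), ("0000X", 100, 50), ("1234L", 100, 40), ("1234L", 200, 20)]
print(result_blocks())
print(avg_age_by_salary(sample, "1234L"))
```

With at most 6,000 groups, the result fits well within the 5000 blocks reserved for it under these assumptions. The same running aggregates could instead be kept in an in-memory hash table keyed by Salary if the matching records arrive in index order rather than salary order.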


