Question
Task 2: Secondary Index & Aggregation Query Processing
Objective: Experimentation with a Secondary Index over a non-ordering, non-key attribute and
Aggregate Query Execution Planning.
Assume the relation CITIZEN(ID, TaxCode, Salary, Age), storing information about UK citizens' tax codes and
annual salaries. There are […] records. Each attribute has the same size: […] bytes. The relation is
stored in a file sorted by the Salary attribute. The block size is B = […] bytes, and any pointer in the system has
size […] bytes. The Salary attribute is assumed to be uniformly distributed across tuples. There are
[…] distinct salary values and […] different tax code values. TaxCode and Age are assumed to be
statistically independent of Salary. A data scientist, who has built a Secondary Index over the non-ordering,
non-key attribute TaxCode, is interested in one specific tax code: […]. The data scientist analysed the
distribution of the TaxCode attribute and noted the following:
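The file layout described above fixes how many data blocks the relation occupies. A minimal sketch of that arithmetic follows, assuming fixed-length, unspanned records (an assumption, since the question does not state the record organisation). The function names and the numbers in the example are hypothetical; they are not the values from the question, which are missing from this copy.

```python
import math


def blocking_factor(block_size: int, record_size: int) -> int:
    """Records per block, assuming fixed-length, unspanned records."""
    return block_size // record_size


def num_blocks(num_records: int, bfr: int) -> int:
    """Data blocks needed to hold num_records at bfr records per block."""
    return math.ceil(num_records / bfr)


# Illustrative values only (NOT taken from the question):
# four attributes of 8 bytes each, 512-byte blocks, 10,000 records.
record_size = 4 * 8
bfr = blocking_factor(512, record_size)
b = num_blocks(10_000, bfr)
print(bfr, b)
```

Substituting the real record count, attribute size, and block size gives the number of data blocks that the later cost estimates depend on.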
Let X be the number of citizens with that tax code per data block. That is, given a random block,
there are X citizens with Tax Code L,
i.e., the probability that at least one citizen within a block has that tax code is […].
Therefore, when we pick up a block at random, the probability of finding therein at least one citizen
with that tax code is […].
If there are […] data blocks in the file and we are asked to retrieve those citizens with Tax Code L, then
ideally we expect to access […] blocks (ideal case). However, in reality, we do not know where these blocks
are! If we use the naïve solution (scan the whole file to retrieve all those citizens), then we need to access
[…] blocks.
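The ideal and naïve costs above can be written as two small functions. This is a sketch under the stated model: each block independently contains at least one matching citizen with some probability, so by linearity of expectation the expected number of "useful" blocks is the block count times that probability. The names `expected_ideal_blocks`, `full_scan_blocks`, and the sample numbers are hypothetical, since the actual values are missing from this copy of the question.

```python
def expected_ideal_blocks(num_blocks: int, p_hit: float) -> float:
    """Expected number of blocks holding at least one matching record.

    p_hit is the probability that a random block contains at least one
    citizen with the tax code of interest.
    """
    return num_blocks * p_hit


def full_scan_blocks(num_blocks: int) -> int:
    """The naive plan reads every data block exactly once."""
    return num_blocks


# Illustrative values only (NOT taken from the question).
b, p = 1_000, 0.25
print(expected_ideal_blocks(b, p), full_scan_blocks(b))
```

Any index-based plan can then be compared against these two reference points: it cannot read fewer data blocks than the ideal case, and it is only worthwhile if it reads fewer than the full scan.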
Q The data scientist claims that the expected cost using the Secondary Index should be between
[…] and […]. What is the expected cost of retrieving the citizens with that tax code using the Secondary Index?
How much bigger is this cost compared to the ideal case?
Q The data scientist is asked for a query processing plan for the aggregation query:
SELECT Salary, AVG(Age)
FROM CITIZEN
WHERE TaxCode = […]
GROUP BY Salary
If the data management system devotes […] blocks of RAM (i.e., approx. […] MB), each one of
[…] bytes, for executing the query, and […] blocks for storing the results of the query, help the scientist by
providing a query execution plan. Describe your plan (e.g., steps, methods, ideas) and report the
corresponding expected number of block accesses of your proposed solution. How much memory would you
need to store the result of the aggregate query?
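One plan worth considering (a sketch, not necessarily the intended answer) exploits the fact that the file is already sorted by Salary: every GROUP BY Salary group is contiguous, so a single sequential pass can filter on the tax code and keep only one running sum and count at a time. The function name `avg_age_by_salary`, the tuple layout, and the sample rows below are hypothetical and only illustrate the idea.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, int, int]  # (tax_code, salary, age), hypothetical layout


def avg_age_by_salary(rows: Iterable[Row], tax_code: str) -> Iterator[Tuple[int, float]]:
    """Stream (salary, average age) pairs for one tax code.

    Assumes rows arrive sorted by salary, as in the file described above,
    so only the current group's running sum and count are held in memory.
    """
    current_salary = None
    total = 0
    count = 0
    for code, salary, age in rows:
        if code != tax_code:
            continue  # WHERE TaxCode = <tax_code>
        if salary != current_salary:
            if count:
                # The previous group is complete: emit its average.
                yield current_salary, total / count
            current_salary, total, count = salary, 0, 0
        total += age
        count += 1
    if count:
        yield current_salary, total / count  # emit the last group


# Illustrative rows only, already sorted by salary.
sample = [("A", 100, 30), ("B", 100, 50), ("A", 100, 40),
          ("A", 200, 20), ("B", 300, 60)]
result = list(avg_age_by_salary(sample, "A"))
print(result)
```

This pass costs one read of every data block, so it should be weighed against an index-driven plan that reads only the blocks containing the tax code but may deliver rows out of salary order, in which case the per-salary sums and counts would have to be kept in memory (for example in a hash table) until the end. In both plans the result holds at most one (Salary, AVG(Age)) pair per distinct salary value, which bounds the memory needed to store it.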