Question

1 Approved Answer

Posted on Jul 29, 2024

The following exercises concern the relational schema R = ABCDEG and instance r of this schema depicted in Table 1 below. ABCDEG a1 b1 c1

The following exercises concern the relational schema R = ABCDEG and instance r of this schema depicted in Table 1 below. ABCDEG a1 b1 c1 d1 e1 g1 a2 b2 c1 d2 e2 g2 a2 b2 c1 d2 e2 g3 a3 b3 c3 d2 e3 g4 Table 1: Instance r of relational schema R = ABCDEG. One will notice that this instance r adheres to the set F of functional dependencies (Slide 7.13 of the slides for chapter 7) depicted in Table 2 below. AB CD BD DE B DEG AB AC DE Table 2: Set F of functional dependencies on schema R = ABCDEG. We have the following four decompositions of schema R, depicted in Table 3 below. BD BE ACDE ACG (d) R = ABCDEG. BD ABC ABE ABG ACDE ABC ABC CDE ABG ACG (a) (b) (c) Table 3: Four decompositions of the schema 1 All four decompositions of Table 3 are in Boyce-Codd Normal Form (BCNF) with respect to F of Table 2, that is, each component (schema) of the decomposition is in BCNF. However, one of these decompositions is lossy, in the case of instance r (Table 1) of schema R, hence it could not have been obtained using the BCNF decomposition algorithm (see Algorithm 1). The other three decompositions can be obtained using the BCNF decomposition algorithm. The goal of is to demonstrate, using SQL, which one of these decompositions of Table 3 is lossy, and then for the remaining decompositions, to show how they were obtained using the BCNF decomposition algorithm (Algorithm 1) see Section 4 (Exercises) below for precise instructions. We now give some background and ideas on how to do these exercises by 1 demonstrating that decomposition (a) of Table 3 is lossless, and how to do this using SQL (showing that it is lossy is then very similar) in Section 1 below; giving some basic background on BCNF, along with the BCNF decomposition Algorithm 1 in Section 2 below; and showing the steps taken by the BCNF decomposition algorithm (Algorithm 1), with respect to F (Table 2) to obtain decomposition (a) of Table 3 in Section 3 below. Demonstrating losslessness and lossiness Recall that a decomposition R1, R2, . . . , Rk of relational schema R is lossless in the case of some instance r of schema R (Slide 7.8) if R1(r) R2(r) Rk(r) = r, and is lossy otherwise. For example, the first decomposition (a) of Table 3 is lossless in the case of the instance r (Table 1) of relational schema R = ABCDEG, since BD(r) ABC(r) ABE(r) ABG(r) = r . (1) Since this can be tedious to verify by hand, we can code instance r of Table 1 in SQL, as depicted in Figure 1 below. create table r ( A char(2), B char(2), C char(2), D char(2), E char(2), G char (2)); insert into r values (a1,b1,c1,d1,e1,g1), (a2,b2,c1,d2,e2,g2), (a2,b2,c1,d2,e2,g3), (a3,b3,c3,d2,e3,g4); Figure 1: Representing instance r of Table 1 in SQL. We can then run the appropriate SQL query to mimic Equation 1 in order to demonstrate that decom- position (a) is lossless. Figure 2 below provides the idea on how to structure this query. 2 with r1 as (select distinct B,D from r), r2 as ... ... select A,B,C,D,E,G from r1 ... ; Figure 2: How to structure an SQL query to demonstrate that decomposition (a) of Table 3 is lossless. By correctly filling in the missing parts (left as an exercise, see Section 4) of the query of Figure 2, we will see that the result is the same as r. Had the decomposition been lossy, then we would see that the result is not the same as r, rather it would have more rows. Note the use of distinct, since we want to mimic precisely the projection () operator in Equation 1, which eliminates duplicate rows. Mimicking the join () operator in Equation 1 is then a matter of choosing the appropriate version of SQL join operator (see Chapter 4) in the from clause above. 2 BCNF and the BCNF decomposition algorithm In addition to being lossless, decomposition (a) of Table 3 can be obtained using the BCNF decomposition algorithm (Algorithm 1). Recall that a relational schema R is in BCNF with respect to a set F of functional dependencies (Slide 7.23), if for all functional dependencies in the closure F+ (Slides 7.387.42) of the form , where , R, either: istrivial,i.e.,;or

isasuperkeyforR,i.e.,R+. If schema R is not in BCNF, then there must be some non-trivial functional dependency in F+, where , R, such that is not a superkey for R. If so, then we can decompose R (Slide 7.25), based on this violating functional dependency , into: R1 = ; and R2 =R(). Should R1 or R2 not be in BCNF, we can repeat the above step on R1 or R2, and then again (recursively) on the components resulting from this step, etc., until all components are in BCNF. This constitutes the following Algorithm 1 for decomposing a relational schema R into BCNF with respect to F. Algorithm 1: BCNF decomposition algorithm: an algorithm for decomposing relational schema R with respect to set F of functional dependencies into BCNF. result := {R}; while there is a schema Ri in result that is not in BCNF do letF+,where,Ri,suchthatandRi +; result:=(resultRi)(Ri ())(); Note that Algorithm 1 is guaranteed to halt, and is guaranteed to produce a BCNF decomposition. However, this algorithm can take many different paths, based on the different choices of violating at each step, producing many different possible BCNF decompositions. Table 3 contains three such decompositions (the remaining one being lossy, hence not obtained by this algorithm). 3 3 Producing BCNF decomposition (a) of Table 3 Let us now take the appropriate steps of this decomposition algorithm to produce decomposition (a) of Table 3 from schema R = ABCDEG according to the set F of functional dependencies of Table 2. Note that Figure 3 provides a nice visual summary of the steps taken, in the form of a tree. Schema R = ABCDEG is not in BCNF, because AB CD is a violating functional dependency: it is non-trivial (CD AB), and AB is not a superkey for R. To see why AB is not a superkey for R, we can compute the closure AB+ of attribute set AB (Slides 7.437.45) by applying Algorithm 2, and see that R = ABCDEG AB+. Algorithm 2: Algorithm to compute +, the closure of under F (Slide 7.43). result := ; while result changes do foreach functional dependency in F do if result then result := result ; By applying Algorithm 2 on AB with respect to F (Table 2), we take the following steps: step result start AB AB CD ABCD AC DE ABCDE to conclude that AB+ = ABCDE R, which is not the full R (G is missing), hence AB is not a superkey for R. We then take a step of the BCNF decomposition algorithm to decompose R (Slide 7.25), based on violating functional dependency AB CD, into: R1 = AB CD = ABCD; and R2 =R(CDAB)=ABCDEGCD=ABEG. Component R1 = ABCD is not in BCNF because of B D, since B,D R1, B D is non-trivial (D B), and B+ = BD, by applying Algorithm 2 as follows: step result start B BD BD Since BD R1 = ABCD, it is a violating functional dependency, and so we can decompose R1, based on B D, into: R11 = B D = BD; and R12 = R1 (D B) = ABCD D = ABC. Component R2 = ABEG is not in BCNF because of AB E. We know that AB E is in F+ because of the closure AB+ = ABCDE from above. Since E ABCDE and AB ABCDE, then AB E (Decomposition rule of Slide 7.41, or Slide 7.45). Since AB,E R2 = ABEG, AB E is non-trivial (E AB), and AB is not a superkey for R2. To see why AB is not a superkey for R2, we can apply Algorithm 2, to get AB+ = ABCDE. We then restrict (project, actually) this AB+ = ABCDE down to just the attributes of R2 to obtain ABE ABEG = R2, which is not the full R2 = ABEG (again, G 4 is missing), hence it is a violating functional dependency. So we can decompose R2, based on AB E, into: R21 = AB E = ABE; and R22 = R2 (E AB) = ABEG E = ABG. We now have R11 = BD R12 = ABC R21 = ABE R22 = ABG which is the decomposition (a) of

Table 3. A nice way to summarize, or keep track of the (recursive) steps being taken during the BCNF decompo- sition is with a tree such as the one depicted in Figure 3. R = ABCDEG AB CD R1 = ABCD B D R2 = ABEG AB E AB CD R (CD AB) BD R11 =BD R1 (DB) R12 =ABC ABE R21 =ABE R2 (EAB) R22 =ABG Figure 3: Tree structure to represent the steps taken by the BCNF decomposition Algorithm 1 to decom- pose R = ABCDEG into decomposition (a) of Table 3, based on the set F of functional dependencies of Table 2. Here, for each node R of the tree, the violating functional dependency appears below it, and then the left branch produces the component (R1) obtained by , while the right branch pro- duces the component (R2) obtained by R ( ). The leaf nodes do not have any violating functional dependencies (each of them is in BCNF). 5 4 Exercises Note that these exercises concern only the three decompositions (b), (c) and (d) of Table 3, since everything for (a) has been shown it serves as an example. 1. Determine which of the three decompositions (bd) of Table 3 is lossy (5 points). 2. Demonstrate that this decomposition is lossy by providing the appropriate SQL query, analogous to that of Figure 2 (10 points). Note that the code of Figure 1 (r ddl.sql) has been provided as an attachment. 3. Provide the table returned by your query of (2.), which demonstrates that this decomposition is lossy (5 points). For the other two remaining decompositions among (bd) of Table 3, which are not lossy, let us call them X and Y. 4. Show the steps taken by the BCNF decomposition Algorithm 1 to obtain decomposition X, i.e., the violating functional dependency at each step, and the intermediate components (e.g., R1, R2) generated as a result. Since this is basically everything depicted in Figure 3, a tree like this is an acceptable solution (20 points). Note that you can draw this tree on paper and scan or take a picture. 5. Justify each of the steps that are taken by the BCNF decomposition algorithm to produce decom- position X (20 points). This involves minding the following details. (i.) If the violating functional dependency is not in F, you need to show that it is in F+, as was done with AB E in Section 3 above. (ii.) You need to show why a given violating functional dependency is violating in any given step. That is, suppose that in a given step, the component (schema) is Ri: then you need to show that , Ri; that is not trivial, and finally; that is not a superkey for Ri. This last one involves computing the closure +, and showing that Ri +, as was done with AB+ and B+ in Section 3, above. Do the same as above for Y, labeling these solutions 6. (20 points), and 7. (20 points). Note, that while you would provide each BCNF decomposition as if it was moving in a forward direction to produce the final output, e.g., X, that it is best to work backwards. Since each step of the BCNF decomposition algorithm produces a pair of components, find the pairs of your decomposition X that fit together, and go from there. This may involve a bit of investigation, trial and error. Use the example of Section 3 as a guide, that is what it is there for. Then, when you think you have the steps, see if everything makes sense in the forward direction, and then provide this as the solution. Ruling out which decomposition of Table 3 is useful in this endeavor (which is why it is done first), since attempting to reverse engineer the lossy decomposition in this way will lead only to dead ends. 6