[Solved] Lesson 12 Quiz (Show/Explain all Work) IS

Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Oct 13, 2024

Lesson 12 Quiz (Show/Explain all Work) IST 230 Relations on Sets, Databases 1. Let A = {0, 1, 2, 3, 4, 5, 6, 7, 8}

Lesson 12 Quiz (Show/Explain all Work) IST 230 Relations on Sets, Databases 1. Let A = {0, 1, 2, 3, 4, 5, 6, 7, 8} and B = {1, 2, 3, 4, 5, 6, 7, 8}. Now let R be a binary relation R from A to B such that for all (x, y) A x B, (x, y) R if and only if y = 2x + 1. Does 1 R 3? Explain 2. In problem 1 above, is (2, 6) R? Explain 3. For the relation R in problem 1 above, write R as a set of ordered pairs: 4. Is the relation R in problem 1 above a function (explain your answer)? 5. Draw the relation from Problem 1 above as an arrow diagram (if you don't want to draw in Word, draw the diagram in PowerPoint and then copy it to Word, or draw it by hand and then scan it into Word): 6. Give the inverse of the relation R = { (x, 1), (y, 1), (x, 3), (y, 3), (z, 4) } 7. As discussed in the Lesson, express the ordered pair (7, 3) in terms of just sets (in other words, show how an ordered pair can be represented using just set notation): 8. Write the database table below as a relation (i.e., as a set of ordered ntuples). HINT: MusicianID is the Primary Key MusicianID BandNumber FirstName ExperienceLevel A 11 Alice 1 C 11 Caitlin 3 B 22 Bill 2 Larry Newcomer WorkfileContents.docx (13 at 8 each) Page 1 of 2 The Online Course Content and textbook are the ONLY sources of help that may be used for this exam Lesson 12 Quiz (Show/Explain all Work) IST 230 Relations on Sets, Databases 9. Write the relation (i.e., set of ordered n-tuples) below as a database table. The first row of your answer should be used for column headings. HINT: EmployeeID (the leftmost, 3-character field) is the Primary Key, followed by FirstName, Dept#, Location, and YearsWorked { (111, \"Alice\Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes Introduction Who, What, Where, When, Why, How Newcomer, Tree-Structured Indexes (beginning on Page 3 of this Document) Epp 10.5 Trees Epp 10.6 Rooted Trees Who, When, and Why? Trees are one of the most important and pervasive data structures in IT. Because of their prevalence \"under the hood\" in database technology, they are arguably one of the most important data structures for IT graduates to understand. What? This STUDY SHEET (see page 3) covers the basic terminology of trees. It includes numerous examples from different fields, and ends with a discussion of binary trees so important in computer science and IT. Where? You can learn about trees by studying the Instructor's Take (starting on page 3 in this document), Epp Sections 10.5 and 10.6, Explore & Have Fun (page 2 in this document), and by finding your own sources of information on B+ Trees How? Develop your skills by studying this document, checking out the web resources in Explore and Have Fun below, and by locating your own sources of information about Trees and their uses in IT. Most of the information you need is available in the Instructor's Take (page 3 below) and the textbook To KNOW List Note that all the information you need for this STUDY SHEET is available in the textbook and Instructor's Take (page 3 below) although you are strongly encouraged to find additional sources on B+ Trees (start with some of the links in Explore and Have Fun below). The main outcomes for this STUDY SHEET include mastery of the following capabilities: IST 230 Define tree Give several examples of trees Define and illustrate terminal and branch vertices Diagram and explain the structure of a B+ Tree Explain and give an example of how B+ Trees can effectively be used to randomly retrieve records by key field (see Instructor's Take on page 3 for examples) Explain and give an example of how B+ Trees can effectively be used to sequentially retrieve records in sorted order by key field (see Instructor's Take on page 3 for examples) Give specific examples of how the two operations above are essential to database processing List some important database systems that employ B+ Tree data structures Page 1 of 19 Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes Explore and Have Fun Trees General Information on Trees Introduction http://en.wikipedia.org/wiki/Tree_(graph_theory) Spanning Trees http://en.wikipedia.org/wiki/Spanning_tree_(mathematics) Diagrams http://mathworld.wolfram.com/Tree.html Trees and Spanning Trees http://www.personal.kent.edu/~rmuhamma/GraphTheory/MyGraphTheory/trees.htm Spanning Trees http://www-math.cudenver.edu/~wcherowi/courses/m4408/gtln16.html Trees http://www.maths.qmw.ac.uk/~sm/DM/dm06chap10.pdf Nice graphics http://oneweb.utc.edu/~Christopher-Mawata/petersen/lesson13.htm B+ Trees Introduction http://en.wikipedia.org/wiki/B%2B_tree With Java applet http://www.seanster.com/BplusTree/BplusTree.html Use for data retrieval http://www.sci.unich.it/~acciaro/bpiutrees.pdf Random/sequential data retrieval and updating http://www.doc.ic.ac.uk/~amiri/advDB/Btrees.pdf Discussion of use for data retrieval with diagrams http://www.soe.ucsc.edu/classes/cmps180/Winter03/Lectures/B+-Trees.pdf Full discussion http://research.cs.queensu.ca/home/dalamb/courses/235/slides/btree.html PowerPoint http://www.cs.sjsu.edu/faculty/lee/cs157/class_presentation_Btree.ppt PPT Tutorial http://www.scribd.com/doc/18211/B-TREE-TUTORIAL-PPT PPT http://www.ccs.neu.edu/home/donghui/teaching/slides/database/Ch10_BTree.ppt VSAM VSAM overview http://www.wisegeek.com/what-is-vsam.htm History and general information http://en.wikipedia.org/wiki/VSAM Definition http://www.webopedia.com/TERM/V/VSAM.html VSAM and B+ Trees http://redbook.cs.berkeley.edu/redbook3/lec4.html VSAM and B+ Trees http://www2.computer.org/portal/web/csdl/doi/10.1109/69.634758 B Tree indexes and VSAM http://www.dbmonster.com/Uwe/Forum.aspx/db-theory/1399/BTree-and-its-concept Very thorough discussion of B+ Tree indexes and VSAM https://eprints.kfupm.edu.sa/71585/1/71585.pdf Summary notes on B Trees and VSAM http://sciris.shu.edu/Resources/Cs/c2113/c2113n12.htm IST 230 Page 2 of 19 IST 230 Lesson 12 STUDY SHEET Tree-Structured Indexes Instructor's Take (Required Reading) B+ Trees and Their Use in Data Update and Retrieval This material is taken from a textbook by L. Newcomer. It discusses B+ Tree based software for data update and retrieval, and illustrates how B+ Tree structures are used in modern Database Management Systems. VSAM is a B+ Tree based Indexed File Organization that has been implemented on a wide variety of hardware platforms from mainframes to PC's. It is of practical importance, and also illustrates how B+ Tree structured indexes can be used \"under the hood\" in modern DBMS software. Since you are reading a single chapter from an entire textbook, you may need to do some research to flesh out the context for what follows. The main purpose of this material is to introduce you to B+ Tree data structures and how they are used to implement indexes which facilitate both random and sequential processing of table rows by primary key in a modern DBMS. Do your best to follow the discussions and make sure you work through all the step-by-step examples so that you develop an understanding of how the data structures and algorithms work when you use tree-structured indexes in a modern database Don't worry about strange vocabulary and acronyms - the main point is to understand the data structures and algorithms First, here's some important vocabulary (this material is simplified for those readers who don't already have a database background): 1 Equivalent Database Terminology by Audience User (Common) Term Programmer Term Database Expert Term Table File Relation Row Record Tuple Column Field Attribute ACCESS METHOD - A software component of an Operating System (OS) that is responsible for organizing data on external storage devices and for moving data between external storage and computer main memory. In a Database environment, the Database Management System (DBMS) software may rely wholly on Operating System Access Methods, may use its own internal Access Methods, or may share the work between its own internal code and the OS Access Methods VSAM - An Access Method originally developed for Mainframes that has also been implemented on PC's and everything in between. Because it uses Tree-Structured Indexes it has been chosen as a real-life example for use in this STUDY SHEET IST 230 Page 3 of 19 Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes PRIMARY KEY (PK) - One or more Columns (Fields) in a Table that are designated by the Database Designer and that uniquely identify each row of a table; each table has exactly one Primary Key (PK) CANDIDATE KEY (CK) -- One or more Columns (Fields) in a Table that are additionally designated by the Database Designer and that uniquely identify each row of a table; a table may have more than one Candidate Key KEY - One or more Columns (Fields) that form either a Primary or a Candidate Key for a table SEQUENTIAL ACCESS - All the Rows (Records) in an entire Table (File) are processed one at a time (one after another) in Key Sequence; Example - creating a list of all employees by Employee ID (where Employee ID is the Primary Key) RANDOM ACCESS - A single Row (Record) in a Table (File) is processed as needed (the processing of individual records will occur in unpredictable [random] order by Key); Example - looking up the home phone number for a specific employee using his/her Employee ID INDEX - A separate file that contains Key values along with storage location information for matching Rows (Records) in a corresponding Table (File) containing user data (this table/file is called a BASE TABLE). The user data in the BASE TABLE we call \"data\"; the matching location information in the INDEX we call \"Metadata\" (literally meaning \"data about data\"). A given Table (File) can have as many associated Indexes as desired o o The more indexes you create the longer it takes to Add, Change, or Delete information in the Base Table (because the relevant Index files must also be updated accordingly) o Obviously, the more Indexes you create the more storage space required for the Indexes and the Base Table Nonetheless, Indexes speed Retrieval of information from their Base Table, and DBA's (Database Administrators) are often willing to trade-off space and slower update speed to get faster retrieval FILE ORGANIZATION - the details of how data is placed on a storage device and the algorithms for retrieving, adding, changing, and deleting data. This Lesson discusses three File Organizations: o o IST 230 SEQUENTIAL - Records (rows) are written \"sequentially\" to the storage device in the order that they are received by the program; there is no metadata, so records can only be retrieved Sequentially in the order that they were initially written HASHED (DIRECT) - A Hash Function of a record's Key is used to calculate the address of the storage location where the record is to be written; the record can then be randomly retrieved by using the same Hash Function of the Key. This supports rapid random processing of single records, but when the entire table/file is processed the records/rows come off in random order with unused \"dummy\" (empty) placeholder records intermixed Page 4 of 19 Lesson 12 STUDY SHEET o IST 230 Tree-Structured Indexes INDEXED - As you will see in the following discussion, Indexed Files support both rapid random access to individual records as well as the ability to sequentially access the entire file one record at a time in Key Sequence. The basis of this magic is of course creating and maintaining Indexes (metadata) along with the actual user data With that out of the way, we're ready to dig into the material on Indexed Files. Don't be put off by the vocabulary and vendor details - you're being asked to study this to help you better understand the Data Structures and Algorithms that underlie Tree-Structured Indexes in a DBMS - so concentrate on understanding the diagrams and examples which illustrate those Data Structures and Algorithms IST 230 Page 5 of 19 Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes Sequential Files and Hash Files each offer their own unique advantages and disadvantages. Sequential Files are typically kept sorted on a key field, allowing records to be efficiently processed sequentially in key sequence. Unfortunately, Sequential Files cannot be processed randomly. Hash (Direct) Files offer the capability of fast random processing by key, but are inefficient to process sequentially (which usually requires filtering \"dummy\" records and sorting the rest into Key Sequence). Indexed File Organization is an ANSI Standard file organization designed to offer the best of both worlds: 1) easy sequential access to records in key sequence, plus (2) efficient random processing of records by key. Indexed files are usually implemented with B+ Trees and this data structure is heavily used by modern database software. 7.1 General Characteristics of Indexed File Organization (ANSI Standard) Indexed Files are chosen for most business applications because they offer ease of both sequential and random processing. Sequential Processing occurs when all or most of the records (or rows) in a file (or database table) are processed in primary key sequence. Random Processing occurs when a single record (or row) is processed, and the pattern of the selections is random. For example, Sequential Processing would be used on the Employee file or table to print a phone directory listing all employees, while Random Processing would be used to update the telephone number of one specific employee. The potential for both sequential and random processing of logical records from the same database table or file is made possible by storing secondary information (in the form of a metadata index) on the disk. Although the index takes extra space beyond that required for the data records themselves, this is usually not considered a major liability (recall that hashing (or Direct Organization) also "wastes" disk space by using only an 80% - 85% packing factor (thereby \"wasting\" 15% - 20% of the allocated space), and that Sequential Files, which do not "waste" any space, do not permit random processing). The index of an Indexed File contains information which allows the operating system access method to relate a logical record's key field value to the physical disk address (CCHHR) where the record is located. Vendors differ in the details of how the index is actually implemented, and some vendors may offer more than one implementation of Indexed Files. Regardless of the details of the file structure, the principles of Indexed File operation are universal: (1) when Indexed Files are processed sequentially, the index is used to allow logical records to be retrieved in key sequence (the index must be used for sequential retrieval since the logical records are typically only in partial key sequence on the disk), and (2) when Indexed Files are processed randomly, the index is searched to quickly locate the physical disk address of the desired logical record. Since the index is accessed during sequential file processing, Indexed Files may take slightly longer to process sequentially than standard Sequential Files. Similarly, since the index must be searched for the key (and corresponding location) of each logical record to be retrieved randomly, Indexed Files may provide slower response than hashed (Direct Organization) Files (although with modern disk hardware and modern index implementations, this difference may be negligible or nonexistent. An exact comparison depends on the size of the file, the particular Index implementation and hardware, the packing factor, the hash function chosen, etc.). Since both the space requirements and access times of Indexed Files are usually satisfactory for almost all modern business applications, the flexibility of combining key sequence sequential processing with random processing makes Indexed Files the clear choice. IST 230 Page 6 of 19 Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes 7.2 A Common Indexed File Implementation VSAM represents IBM's implementation of ANSI Standard Indexed File Organization. VSAM stores the data records themselves (called the Data Component) and the index records which constitute the index (called the Index Component) as separate files on the same or different disks. When viewed in combination, the Data Component and the Index Component together are called a cluster. VSAM organizes all records (including both the Index and Data Components) in terms of Control Intervals and Control Areas. A Control Interval (CI) is the physical unit of transfer between the disk and main memory. VSAM always inputs and outputs complete Control Intervals. A Control Area (CA) is a group of related Control Intervals (see Fig.7-3 and the discussion below). The size of a Control Interval and the number of Control Intervals in a Control Area are usually automatically determined by VSAM based on the type of disk being used and the size of the logical records in the Data Component. It is usually best not to override the Control Interval and Control Area sizes picked by VSAM. Control Intervals may actually consist of one or more physical records (blocks) on a disk. Again, the block size may be determined automatically by VSAM. VSAM always inputs and outputs complete Control Intervals (whether or not this involves input/output of one or more physical blocks). Organization of the Data Component The Data Component (like all VSAM entities) is organized into Control Intervals and Control Areas. A Data Component CI normally holds several logical records. The logical records within each Control Interval are always kept sorted by key field. VSAM treats all logical records as if they were variable length (even if they are in fact fixed length). VSAM keeps track of the length of logical records in a CI by using special control information fields placed at the end of each CI. Between the logical records themselves (at the beginning of the CI) and the control information (at the end of the CI), there is free space within the CI where new logical records can be added. IST 230 Page 7 of 19 Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes EXAMPLE 7.1 Figure 7-1 shows the layout of a typical Control Interval, including variable length logical records, control information, and free space. New logical records may be added to the CI by using the free space area. EXAMPLE 7.2 Figure 7-2 shows the addition of a logical record to the CI in Fig.7-1. Note that the record is inserted into the CI in key sequence (thus reorganizing the CI). When a record is added to a CI, the free space within the CI is correspondingly reduced, and the control information updated accordingly. The Control Intervals in the Data Component are further grouped into Control Areas. Control Areas play an important role in both sequential and random processing of logical records, as described below. Index Component Organization -- the Sequence Set The Index Component is itself organized into two parts: the Index Set and the Sequence Set. The Sequence Set consists of the lowest level index entries, and is closely related to the concept of Control Area. There is one Sequence Set record for each Control Area in the Data Component. The Sequence Set record for a Control Area contains an entry for each Control Interval in that Control Area. The entry for a Control Interval consists of: (1) the highest key of the logical records stored within that CI, and (2) the physical disk address of the Control Interval. The CI entries in a given Sequence Set record are kept sorted in ascending sequence by key. This not only facilitates searching during random processing, but it allows the Control Intervals within a CA to be retrieved in key sequence during sequential processing -- regardless of whether the CI's are physically in key sequence within the CA. Since the records within each Control Interval are in key sequence, if the Control Intervals within a Control Area are processed in the order dictated by the Sequence Set record for that CA, the logical records within the Control Area will be retrieved in key sequence. Recall that there is a Sequence Set record for each Control Area in the Data Component, and that each Sequence Set record contains an entry for each Control Interval within its given Control Area. In order to facilitate random processing, each Sequence Set entry consists of the highest key in its Data Component Control Interval, together with a vertical pointer to the Control Interval itself (i.e., to the disk location of the CI). The vertical pointer can be followed to retrieve any or all logical records within the CI. In addition to the vertical pointers (to each CI within its associated CA), each Sequence Set record also contains a horizontal pointer to the next Sequence Set record in key sequence. The horizontal pointers are followed during sequential processing: after all logical records in a Control Area have been retrieved; the horizontal pointer is used to locate the Sequence Set record for the next Control Area to be processed. IST 230 Page 8 of 19 Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes EXAMPLE 7.3 Figure 7-3 shows the relationship between sequence set records and Data Component Control Intervals and Control Areas. The vertical pointers are used to locate CI's during both sequential and random processing. The horizontal pointers are used during sequential processing only. Note that CI's are not necessarily arranged in key sequence within their CA. This is why sequential processing must be done by following the Sequence Set entries (which are maintained in key sequence). Index Component Organization -- the Index Set The second part of the Index Component is the Index Set. The Index Set consists of all higher level index entries and is used only during random processing. The purpose of the Index Set is to allow rapid location of the Data Component Control Interval which holds the desired logical record. The VSAM Index Set is organized as a tree or hierarchical structure. There is one and only one Index Set record at the top of the tree (i.e., at the root). Index searching during random processing always begins at this index record. The root and all other Index Set records consist of several entries. Each entry consists of the highest key represented in the next lower Index Set record, together with a pointer giving the disk location of said index record. The individual entries within an Index Set record are kept sorted in ascending sequence by key. During random processing, each logical record to be accessed must first be looked up in the index. This process proceeds as follows: (1) the root index record is input and the first entry with a key greater than or equal to the key of the desired record is located. Associated with this key value will be a downward pointer to a next lower level index record; (2) the next lower level index record pointed to in step (1) is input and the first entry with a key greater than or equal to the key of the desired record is located. Associated with this key value will be a downward pointer to a next lower level index record; (3) this IST 230 Page 9 of 19 Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes process is repeated until the index record retrieved is a Sequence Set record. At this point, the first Sequence Set entry with a key greater than or equal to the key of the desired record is located. Associated with it will be a pointer to a Data Component Control Interval. Since each entry contains the highest key in its associated CI, if the desired record is in the file at all, it must be in the indicated Control Interval; (4) the indicated Data Component CI is input and searched for the desired logical record. If the record is not in this CI, it is not in the file. The number of entries (and corresponding pointers to lower level records) in an index record is known as its fan- out. It is clear that the greater the number of entries in an index record (i.e., the higher the fan-out), the fewer the number of index records which must be examined during an index search. VSAM automatically increases the number of index entries per index record by compressing the key values in the index entries. Key compression is a feature provided to limit the size of the index and thereby speed index searching. It is totally handled by VSAM and is completely transparent to the programmer. The other factor that governs the speed of the index search is the number of index records that can be held (and therefore searched) in main memory, as opposed to being input from disk. If the file is small enough, the entire Index Component may fit in main memory, resulting in optimal random processing speed. For a large file, only a relatively small portion of the index may fit in main memory, resulting in many disk accesses during index searching and a corresponding slow-down in random processing. EXAMPLE 7.4 Figure 7-4 shows a VSAM Indexed Cluster, including the Index Component (made up of the Index Set and Sequence Set) and the Data Component. The cluster is depicted as it would be immediately following creation. Observe that half of each CI in the Data Component has been deliberately left empty, and that one CI within each CA has also been left empty. This distributed free space is available to accommodate later additions to the file. IST 230 Page 10 of 19 Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes Initially all CI's within each CA are in key sequence, as are the CA's themselves. This may change as records are added to the file later on, but it will not disrupt file processing because the Sequence Set entries are kept in key sequence regardless of what happens with respect to the CI's themselves. Of course the logical records within each CI are always maintained in key sequence. 7.4 Random Retrieval of VSAM Indexed Files When VSAM Indexed files are accessed randomly, each READ statement results in a full search of the Index Component for the cluster. The key value for the logical record to be retrieved is first placed in the record key field associated with the file. A READ statement is then executed, causing the VSAM access method to begin its index search at the highest level (i.e., root) record in the Index Set. VSAM first compares the record key value specified by the program to the entries in the top index record. The first entry in the root record greater than or equal to the record key is used to point to a next lower level Index Set record, which is now input (if it is not already in memory). The entries in this index record are then compared to the record key value. Again, the first entry greater than or equal to the record key points to a next lower Index Set record which is now input (if it is not already in memory). This search process continues downward through the index levels until eventually an Index Set record entry points to a Sequence Set record. The first Sequence Set record entry which is greater than or equal to the record key value points to a Data Component Control Interval which must hold the record (if it is in the file). This CI is now retrieved and searched for the desired logical record. If the record is not found, the program is notified of the record not found condition. EXAMPLE 7.6 VSAM carries out the following steps to randomly locate the logical record with key 90 in the file of Fig. 7-4: (1) The program places the key value (90) in the record key area for the file and then executes a READ statement. (2) Execution of the READ statement causes VSAM to pick up the record key value from the program. (3) VSAM now retrieves the top-level record from the Index Set (no disk access is needed if it is already in memory; when the file was opened, VSAM automatically input as much of the Index as possible into memory to minimize time required for disk access during Index searching). (4) The entries in the top level Index record are compared to the record key value. Since 120 is the first entry greater than or equal to 90, the pointer (i.e., disk address) associated with key value 120 is selected as the address of the next Index record to be retrieved. (5) If it is not already in memory, the Index record pointed to in step (4) is input from disk. (6) The Index record from step (5) is searched for the first entry greater than or equal to 90: 60 is skipped since it is less than 90, and the search again selects the entry associated with key 120. The pointer for this entry points to a Sequence Set record. (7) The Sequence Set record pointed to by step (6) is input (if it is not already in memory). IST 230 Page 11 of 19 Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes (8) The Sequence Set record is searched for the first key greater than or equal to 90, which in this case is the entry corresponding to key 100. This entry contains the address of a Data Component Control Interval. (9) The Data Component CI pointed to in step (8) is now input from disk. (10) The CI input in step (9) is searched for the logical record with key 90, and this record is placed in the record area for the file (where it can be processed by the program). If no record with key 90 exists within this CI, the program is notified of the record not found condition. 7.5 Random Maintenance of Indexed Files -- Adding Records A program can add logical records to VSAM files by: (1) moving the key of the record to be added to the record key area, (2) filling in the rest of the fields in the record, and then (3) using WRITE to add the record to the file. In response to the WRITE statement, VSAM goes through a full Index search to locate the Data Component CI in which the new record should be placed. This search is exactly the same as that used to randomly retrieve a record. Once the correct CI is located, the record is added utilizing the free space within the CI. EXAMPLE 7.7 VSAM carries out the following steps to add a logical record with key 95 to the file in Fig. 7-4: (1) - (9) are the same as in Example 7.6. (10) After the index search locates the Data Component CI to which the new record should be added, that CI is input into memory. VSAM then searches through the logical records in the CI to determine where the new record should go (recall that VSAM always keeps logical records within a CI in key sequence). In this case, the new record should go between records 90 and 100. (11) The new record is then inserted in the CI in key sequence, with other records being rearranged as necessary. (12) The Control Information within the CI is updated to reflect the facts that: (a) the available space is reduced by the length of the record just added, and (b) there is a new logical record of a given length within the CI. (13) The updated CI is rewritten back to its original location on the disk. Note that until this step is completed, the file itself has not been updated. Figure 7-5 shows the Data Component CI in question after the addition of a logical record with key 95. Note that the free space left in the CI has been reduced by the length of the new record. IST 230 Page 12 of 19 Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes 90 95 100 Free Figure 7-5 EXAMPLE 7.8 The random addition of a logical record with key 85 to the file in Example 7.7 would proceed as follows: (1) The root index record is searched and it is found that the entry for 120 is the first entry not less than 85. (2) The pointer for this entry is followed and the next lower index record is retrieved; the entry for 120 is again the first entry not less than 85. (3) The pointer for this entry is followed and the Sequence Set record pointed to is retrieved; a search of the Sequence Set entries discovers that entry 100 is the first entry not less than 85. (4) The pointer for this entry is used to input the Data Component CI to which record 85 should be added. The CI is rearranged as record 85 is inserted in key sequence. The Control Information at the end of the CI is updated to reflect the changes within the CI (note that there is no free space remaining). (5) The modified CI is rewritten back to its original location on disk. Figure 7-6 shows the contents of the affected CI after the addition of logical record 85. 85 90 95 100 Figure 7-6 Control Interval Splits How does VSAM handle additions to a CI when there is not enough free space within the CI to accommodate the new record? In this case VSAM automatically attempts to perform a Control Interval split. Splitting a Control Interval requires a totally empty CI within the corresponding Control Area. If such an empty CI exists, about half of the logical records from the full CI are moved to the empty CI. This CI split leaves the original CI only about half full, so that after the split there is plenty of room in which to add the new record. Note that the CI which was originally empty is now also only about half full, thus having space for the addition of more records. IST 230 Page 13 of 19 IST 230 Lesson 12 STUDY SHEET Tree-Structured Indexes During a CI split, VSAM must also update the Sequence Set record for the Control Area involved. In particular, the entries for the CI which was originally full and the CI which was originally empty must be changed to reflect the new status of these CI's after the split. EXAMPLE 7.9 The addition of a record with key 87 to the file in Example 7.8 causes a Control Interval split as follows: (1) The full index search reveals that the new record should be added to the CI containing records 85, 90, 95, and 100. As Fig. 7-6 shows, this CI is full and cannot accommodate the new record. (2) VSAM automatically looks for an empty CI within the same CA. The Sequence Set record for the CA readily indicates the presence of an empty CI. (3) About half the logical records are copied from the full CI to the empty CI, as follows: Full 85 90 95 100 Originally Empty free free free free After CI Split 1/2 Full 1/2 Full 85 95 90 100 free free free free (4) The new record with key 87 can now be inserted into the proper CI utilizing the free space created by the CI split. Note that the CI split has improved the situation within the file for adding new records: where formerly there was a full CI to which no new records could be added, now all CI's have at least some free space for adding new records. A CI split reduces the probability of another CI split. This is an advantage of the VSAM design. (5) The Sequence Set entries for this CA must be updated to reflect the new status of the CI's. Note that the CI's within the CA are no longer in key sequence with respect to one another after a CI split has occurred. If records are to be retrieved sequentially in key sequence, this must be done by following the Sequence Set entries (which are kept in key sequence), rather than physically following the Data Component CI's (which are no longer in key sequence). Figure 7-7 shows the file in Figs. 7-4 through 7-6 after the CI split described above. The third Control Area in Fig. 7-7 has also been modified by the addition of records with keys 132, 134, 136, and 138. IST 230 Page 14 of 19 Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes Control Area Splits If VSAM is attempting to perform a CI split and cannot locate an empty CI within the affected CA, then a Control Area split may occur. When VSAM cannot find an empty CI with which to carry out a CI split, VSAM will automatically create a new Control Area for the Data Component (plus its associated index records in the Index Component). This new Control Area is initially completely empty. Following the new allocation, about half the Control Intervals from the CA which has no empty CI are copied to the new CA just created. This leaves each CA with about half its Control Intervals empty. Both CA's can now support any Control Interval splits which are required (since they both have plenty of empty CI's). Again, the VSAM design is such that performing a Control Area split actually reduces the probability of another CA split. Recall that it is the need to perform a CI split (which cannot be carried out without an empty CI) which causes VSAM to do a CA split. After the CA split is completed, the original CI split can be carried out successfully. IST 230 Page 15 of 19 Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes EXAMPLE 7.10 Figure 7-8 shows the file in Fig. 7-7 after the addition of a logical record with key 135. The events involved in adding this record to the file are detailed below. This Example also clarifies the relationship between VSAM (which is part of the Data Management portion of the Operating System), and the program. The functions carried out by VSAM are transparent to the program: Functions Carried Out by Program: (1) The program moves the key of the record to be added (135) to the record key area. (2) The program then builds the rest of the record in the record area. (3) Finally, the program executes a WRITE statement. Functions Automatically Carried Out by VSAM in Response to WRITE: (4) VSAM picks up the key of the record to be added from the record key area. (5) Using the record key value, VSAM conducts a full index search (as in Example 7.6) to determine in which Data Component Control Interval the new record should be placed. (6) Having obtained the CI in which record 135 should be placed, VSAM examines the Control Information Field at the end of the CI to determine whether there is space to add the record. In this case, the CI is already full and no space is available (see Fig. 7-7). (7) Since sufficient space to add the record is not available, VSAM attempts to perform a Control Interval split. This involves examining the Sequence Set record for the CI (already in memory because of step (5)) to locate an empty CI within the same Control Area. (8) Since there are no empty CI's within the same Control Area, VSAM automatically attempts a Control Area split. This involves obtaining additional file space and creating a brand new IST 230 Page 16 of 19 Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes Control Area together with its associated Index records. Note that creating a new CA not only affects the Sequence Set (which must have a new record describing the new CA), but also the entries in the Index Set. Conceivably, the new CA might generate modifications to Index records all the way back to the top level Index Set record (root). VSAM handles all necessary Index maintenance automatically. If space to create a new CA is not available, VSAM will report to the program that the record cannot be added. (9) Assuming that a new CA is successfully created, VSAM now copies about half the Control Intervals from the CA identified in step (7) to the new CA created in step (8), and the Index Component is updated accordingly. (10) This leaves the original CA with about half its CI's empty, so a CI split can now be performed. About half the logical records are copied from the full CI to an empty CI and the Sequence Set entries updated accordingly. Observe that whereas a CA split involves updating both the Sequence Set and the Index Set, a CI split involves changes to the affected Sequence Set record only. (11) After the CI split has been carried out, there is free space available within the original CI to add the record with key 135. Record 135 is inserted in key sequence, the Control Information Field is updated, and the CI is rewritten back to disk. This leaves the file as shown in Fig. 7-8. Note that performing a Control Area split (and to a lesser extent a Control Interval split) entails a great deal of disk access (many Index Component and Data Component physical records are input, modified, and rewritten), and thus is a time consuming operation. However, every time a CA or CI split is carried out, the probability of having to perform another CA or CI split goes down. Thus CA (and CI) splits are performed relatively infrequently as records are added to the file. CI splits will of course be necessary more often than CA splits. CA splits also add Index records to the Index Component, thus resulting in a slightly bigger Index. However, it is important to recognize that the original structure of both the Data Component and the Index Component is maintained by both Control Area and Control Interval splits. VSAM random file processing is only slightly degraded even if many records are added to the file. 7.8 Sequential Retrieval of Indexed Files When VSAM Indexed files are processed sequentially, VSAM automatically retrieves logical records in key sequence. In order to accomplish this, VSAM must process Data Component CI's in the order indicated by the Sequence Set entries. During sequential processing, VSAM does the following: (1) Inputs the first Sequence Set record from the Index Component. (2) Inputs the Data Component CI whose address is given in the first Sequence Set entry of the current Sequence Set record. (3) For each READ statement executed by the program, places the next logical record from the CI into the record area for the file. (4) When all logical records in the CI have been given to the program, inputs the Data Component CI pointed to by the next Sequence Set entry in the current Sequence Set record. IST 230 Page 17 of 19 Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes (5) Gives the logical records within this CI to the program one at a time (as requested by the execution of READ statements). (6) Repeats steps (4) and (5) until all CI's associated with the current Sequence Set record have been processed. At this time, VSAM follows the horizontal pointer in the current Sequence Set record to locate the next Sequence Set record. (7) Repeats steps (2) through (6) until all Sequence Set records (and thus all Data Component Control Areas and the Control Intervals within them) have been processed. EXAMPLE 7.12 The file in Fig. 7-7 is processed sequentially as follows: (1) When the file is opened, VSAM inputs the first Sequence Set record (with entries 20, 40, 60), then the Data Component CI pointed to by the first entry (20). (2) In response to each READ statement, the program is given the next logical record in the current CI. After two READs, records 10 and 20 will have been processed. (3) Having processed the first CI, VSAM now proceeds to the second entry in the current Sequence Set record (i.e., 40), and retrieves the corresponding CI. Records are now given to the program from this new CI (thus the third and fourth READs retrieve records 30 and 40). (4) Having processed the second CI, VSAM now proceeds to process the third CI for this Sequence Set record, resulting in the retrieval of records 50 and 60. (5) Since the fourth entry in the current Sequence Set record represents an empty CI, all CI's for this Sequence Set record have been processed. VSAM now follows the horizontal pointer to input the next Sequence Set record (with entries 80, 90, 100, 120). (6) The first entry in the new Sequence Set record is used to locate the next Data Component CI, and records 70 and 80 are retrieved. (7) Switching to the next entry in the current Sequence Set record, logical records 85, 87 and 90 are retrieved. (8) Switching to the next entry in the current Sequence Set record, logical records 95 and 100 are retrieved. Note that VSAM is not processing Data Component CI's in their physical order. The physically next CI contains logical records 110 and 120 which are not in key sequence. It is only by following the Sequence Set, which is maintained in key sequence, that VSAM can retrieve records in key sequence. (9) Switching to the last entry in the current Sequence Set record, VSAM will now (correctly) retrieve records 110 and 120. Since VSAM must work through the Sequence Set (in effect processing the CI's within each CA in a physically random sequence), VSAM sequential processing will be slightly slower than normal sequential processing which is done according to strict physical sequence. This is a small price to pay for the benefits provided by Indexed Organization. IST 230 Page 18 of 19 Lesson 12 STUDY SHEET IST 230 Tree-Structured Indexes (10) The horizontal pointer to the next Sequence Set record is now followed, and record retrieval continues as described above until the end of the file is encountered. The program is notified of the end of file condition. IST 230 Page 19 of 19 1R3 That is x 1 and y 3 is belong to relation because for x 1, y 2 1 1 3 (8) SELECT pn.name, count c.id from test.bsg _ people pl, test.bsg _ cert _ people pe, test.bsg _ cert c, test.bsg _ planets pn where pl.id pe.pid and c.id pe.cid and pn.id pl.homeworld group by pn.name having count c.id 0 (10) File organization A file organization is an arrangement of the files that contains stored records information in a well-structured in computer system the organization of files are usually refers to the logical arrangemnt There are two components in arranging the files to be looked into Internal file structure External file structure Internal file structure is considered to be a high level design it is to specify the computer software program to arrange the files in an order depending on the purpose So the designing of the file system is depending on the system enviroment The most important consideration can be Rapid access to a records or a number of records which are related to each other the adding modification or deletion of records efficiency of storage retrieval of records Redundancy being the method of ensuring data integrity Types of files organization File organization is also depending on the type of file The file of files can be text files binary files executable files images video files there are techniques of the files organization heap file organization Sorted file organization Hashed file organization Sequential The name itself specifies the type of organization that is sequentially in this file organization the order of files is arranged in the way they had entered Once they are sorreted with the file block physically in contiguous block of sequence with in the block of records reading and writting of the records to the files are done sequentially whenever a new records is created they are placed at the bottom of the file line sequential This is a single level indexing structure and one of the simplest forms for organizing this contains a pair of records contains pointers The pointers are used to the positions of the file with the specified key this uues a key search by using the index keys A linear search is used in index key searching indexing can be extended to a hierarching level of arrangement This arrangement decreases the time in searching for a file This is basically uses a sequential file organization when certain updates or new recordes are created basically indexed based organization is used for hardware like diska Indexed sequential In the hash base file organization the records are places on the disk in a random form each records is accessed by the address specified in the disk location it uses hash algorithm to calculated the address of the record this uses an additional space to store the files and uses a portion a disk space to access if any indexing is done improperly the left over space is sent into bit bucket Irrespective to this problem hash file orginization performance is excellent search retrieval 11 In programming languages, such as lisp python and linda, and others, a tuple is an ordered set of values. The separator for each value is often a comma Common uses for the tuple as a data type are 1 for passing a string of parameters from one program to another, and 2 representing a set of value attributes in a relational database. In some languages, tuples can be nested within other tuples within parentheses or brackets or other delimiters Tuples can contain a mixture of other data types. user term for a toyple is row in a table programmer ter for tuple is record, some programmers call it as tuple itself Here 's an example of a tuple that emphasizes the different data types that may exist within a tuple data type : 18,*, 2.49, eight The above example is sometimes referred to as a 4 tuple, since it contains four values. An n tuple would be one with an indeterminate or unspecified number of values. Syntactically, a tuple is a comma separated list of values : t 'a ', ' b ', 'c ', 'd ', 'e ' Although it is not necessary, it is common to enclose tuples in parentheses to help us quickly identify tuples when we look at Python code : t 'a ', ' b ', 'c ', 'd ', 'e ' To create a tuple with a single element, you have to include the final comma : 'a ', type t1 t1 type ' tuple ' Without the comma Python treats 'a ' as an expression with a string in parentheses that evaluates to a string : 'a ' type t2 t2 type 'str ' Another way to construct a tuple is the built in function tuple. With no argument, it creates an empty tuple : t tuple() print t () If the argument is a sequence string, list or tuple , the result of the call to tuple is a tuple with the elements of the sequence : t tuple 'lupins ' print t 'l ', ' u ', ' p ', 'i ', ' n ', 's ' Because tuple is the name of a constructor, you should avoid using it as a variable name. Most list operators also work on tuples. The bracket operator indexes an element : t 'a ', ' b ', 'c ', 'd ', 'e ' print t 0 'a ' And the slice operator selects a range of elements. print t 1: 3 ' b ', 'c ' But if you try to modify one of the elements of the tuple, you get an error : t 0 ' A ' TypeError : object doesn ' t support item assignment You can't modify the elements of a tuple, but you can replace one tuple with another : t ' A ', t 1: print t ' A ', ' b ', 'c ', 'd ', 'e ' Comparing tuples The comparison operators work with tuples and other sequences; Python starts by comparing the first element from each sequence. If they are equal, it goes on to the next element, and so on, until it finds elements that differ. Subsequent elements are not considered even if they are really big . 0, 1, 2 0, 3, 4 True True 0, 1, 2000000 0, 3, 4 12 Comparing random versus sequential operations is one way of assessing application efficiency in terms of disk use. Accessing data sequentially is much faster than accessing it randomly because of the way in which the disk hardware works. The seek operation which occurs when the disk head positions itself at the right disk cylinder to access data I requested, takes more time than any other part of the process. Because reading O randomly involves a higher number of seek operations than does sequential reading random reads deliver a lower rate of throughput. The same is true for random writing You might find it useful to examine your workload to determine whether it accesses data randomly or sequentially. If you find disk access is predominantly random, you might want to pay particular attention to the activities being done and monitor for the emergence of a bottleneck. Sequential access linear order Random access nonspecific order sequential access file means accessed in sequence i.e first the first element than second than third as in linked list while random access is accessing any element without accessing others as in array In Java, through Object Serialization, an Object can be transformed into a sequence of bytes and in the reverse process, Deserialization, Original Object is reproduced public void Add To Div (stringdiv) {/ / Code to read the html document and look for the div / / name specified as the parameter of this method . } try {/ / Method logic to parse document for the div } catch(ArgumentExceptionex) { / / (I wouldn ' t supress this catch block in production code, / / just omitting body details for simplicity. } Difference between binary and text file in Java Programming Language : a We directly open the text file and see the contents of its but we can ' t do with the binary file, if we do than it will shows different combination 0 's and 1'. b It in machine readable format, only machine can read it. c It is in encrypted form, so the it more secure than a text file. Comparing random versus sequential operations is one way of assessing application efficiency in terms of disk use 13 Each database has one or more administrators who are responsible for maintaining all aspects of the security policy : the security administrators. If the database system is small, then the database administrator may have the responsibilities of the security administrator. However, if the database system is large, then a special person or group of people may have responsibilities limited to those of a security administrator After deciding who will manage the security of the system, a security policy must be developed for every database. A database security policy should include several sub policies, as explained in the following sections. Database User Management Database users are the access paths to the information in a database. Therefore, tight security should be maintained for the management of database users. Depending on the size of a database system and the amount of work required to manage database users, the security administrator may be the only user with the privileges required to create, alter, or drop database users. On the other hand, there may be a number of administrators with privileges to manage database users. In any case, only trusted individuals should have the powerful privileges to administer database users. This is the administrative activity. It is responsible for managing the data resources. It is including database planning, dealing with the technical issues such as security enforcement, performance and backup and recovery associated with managing a data base. This is a technical level function responsible for physical database design, security enforcement, and database performance. It is including maintaining data dictionary, monitoring performance, and imposing organizational standards and security The following security issues must also be considered for the operating system environment executing any database applications : Database administrators must have the operating system privileges to create and delete files. Typical database users should not have the operating system privileges to create or delete files related to the database If the operating system identifies database roles for users, then the security administrators must have the operating system privileges to modify the security domain of operating system accounts Major Drawback of having one admistrative is that In the absense of the admistrative, the data required cant be obtained for the business Admistrative can misuse the data for his own greed one admistrative mean higher risk of the data base safety . and drawbacks for having several administrators Disputes arrises. Technically speaking.. unnecessary wastage of man power and cost which should be paid in terms of salary Difficulty in making final decisions as all won ' t be agreeing on one idea