Question: Index a database of student records using B-trees. There will be two index files as the database will be indexed both by student ID and

Index a database of student records using B-trees. There will be two index files as the database will be indexed both by student ID and student last name. A separate database file will contain the actual student records. The program will take two input files: a data file and a command file. The data file is a text file containing student records from which the database and its indexes (binary files) will be initially built. The command file is a list of insert and search operations to be carried out after the files have been built. The results of these operations will be sent to an output file. A name will be provided (through the command line) that is used to obtain the names of the database and index files. For example, if the name provided is student, the database will be named student.dat while the two index files will be named student.ix1 (for the ID index) and student.ix2 (for the name index). The order (m) of the B-trees will also be specified (the same for both indexes) in the command line.

Program Invocation:

The program is invoked as follows: btree If the command line contains an insufficient number of arguments or if any of the specified input files do not exist, the program should print an appropriate error message and exit.

Input and Output Formats:

The data file will consist of some number of text lines, one student record per line. Each line will have 6 space-separated fields: ID (9 digits), last name (at most 15 letters), first name (at most 15 letters), year (1 digit), major (at most 4 letters), and email address (at most 20 characters). This means a record will require at most 64 bytes if stored as text fields. The command file will contain lines specifying one of five kinds of commands: find ID find name add dump by ID dump by name

The first two commands will output the record(s) found, one line per record. Note that the first command returns at most one record (output NOT FOUND if no such record exists), while the second command may return more than one record. (You may assume that IDs are unique and that last names may duplicate). For the third command, the program should output STUDENT ADDED. (; ; ), where is the ID associated with the record added, is the byte position in the database file where the record was added, and and are counts of splits performed during the insert operation. The count for the ID index is , while the count for the name index is. The last two commands list ALL records in their indicated order.

For all the five commands, echo the command line itself before you echo the output results described above.

For all input files, the fields are separated by one or more spaces and there may be leading or trailing spaces in a line. You may assume, of course, that the fields themselves may not contain spaces (e.g., is one word with no spaces in between). When printing out a record, print the byte position of that record in the database file, followed by a colon and 69 (ASCII) characters describing the record: each field is printed with its maximum number of characters padded with trailing spaces if necessary. Add a single space between each field64 characters for the fields plus 5 separator spaces makes 69 characters total. Finally, output the number of records in the database file and the number of B-tree nodes in the index files before and after processing the command file, using the following format: NUMBER OF RECORDS: , ID INDEX: nodes, NAME INDEX: nodes. In effect, the first line of the output file will contain the status of the files after the data file has been read, while the last line of the output file will contain the status of the files after the command file has been processed.

Data Structures: This assignment is an exercise in B-trees, indexing, and file I/O. Store fixed-length (64-byte) records for the database (store fields using ASCII characters and then pad each field with spaces up to its maximum length) and append these records to the database file in the order that the add commands appear. For the indexes, store the keys in ASCII (9 bytes for ID, 15 bytes for last name) and the file positions as C++ long values. The index file will consist of fixedsized tree nodes that contain up to m ?1 keys, m ?1 database record positions (pointers to the database file), and m index file record positions (pointers to children). You will also need to store the actual number of children for the node (some number less ? m stored as a C++ int). This means the size of an index file should be a multiple of (m ? 1)K + (2m ? 1)L + I, where K is the size of a key, L is the size of a C++ long value, and I is the size of a C++ int value. The first (m ? 1)K + (2m ? 1)L + I bytes of the index file will be the node that corresponds to the root of the Btree. When new nodes need to be added because of splits, (m ? 1)K + (2m ? 1)L + I bytes will be appended to the index file for each new node. (Note that you have a decision to make when you perform a split. For example, suppose you want to split node A to result in nodes B and C. You can either have B occupy the node previously occupied by A, and C occupies the newly appended node, or, C takes over A, and B occupies the new node. This detail is left up to you.) The order in which records are added to the database is important, so make sure that when the initial database is set up and the data file is read, it is equivalent to carrying out an add operation for each line in the data file in the indicated sequence. Since duplicate last names may exist, a rule is needed in case the name already exists in the index. Comply with the following rule: If a record is inserted into the database such that one or more record(s) with the same name already exists in the index, the new record should occur after the last record (according to the index) with that name. When splitting a node, there will be m ? 1 keys in a node and an incoming key, for a total of m keys. The split should cause the ?m2 ?th key to be promoted (assuming the first key is the 0th key and the last key is the (m ? 1)th key). There is no limit for the value of m. This means the buffers (the variables you use to store the nodes) for the files must be dynamically allocated. Use of the C++ Standard Template Library (STL) is limited; in general, you should avoid using such classes for this project, but ask your instructor during class, in case you find it reasonable to use a particular STL class, while staying within the objectives of this assignment. The point of this project is to implement the B-Tree data structure from scratch.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!

Hash Table Indexing Learning Objectives Implement a data structure to meet given specifications Design, implement, and use a closed hash table data structure Use a hash table as an index into a...

In this program you are to implement a database of student records, using the binary search tree abstract data type, implemented as a linked list. This must be written in C, not C++. The student...

In C++ language please Implementing Indexing Using B-Trees An index is a file that facilitates insert, delete, and search operations on a database. The efficiency of these operations depends on the...

This is all one assignment COM 301 Lab 2: Implementing Indexing Using B-Trees An index is a file that facilitates insert, delete, and search operations on a database. The efficiency of these...

Implement a software program in C++ t stores and searches the Student records using double-hashing algorithm. Double hashing uses the idea of applying a second hash function to key when a collision...

using C++ Implement a software program in either C++ stores and searches the Student records using double-hashing algorithm. Double hashing uses the idea of applying a second hash function to key...

The picture is just a hint, the code is at the bottom. Starter Code: : """ import random class Course: ''' >>> c1 = Course('CMPSC132', 'Programming in Python II', 3) >>> c2 = Course('CMPSC360',...

This is in C. For this assignment we will write a simple database server. We will be creating a simple database of student records, so lets describe these first. The format of a student record is as...

For the incorrect sample the Peak area for glucose is 254237.0 and for fructose is 258153.0 Quantify both your analytes in the "incorrect" sample using both calibration methods and calculate the %...

The line L is a tangent to the circle x 2 + y 2 = 26 at the point (5, 1). Line L crosses the x-axis at point P. Work out the coordinates of P.

The Basel Accords are international regulatory frameworks designed to: Question 7 options: Standardize consumer banking products across countries Ensure that cryptocurrency transactions are secure...

1 . 9 A furniture zompany manufactures high quality dining room furniture that is sold to major retail stores. Extracts from the budget for last year are given below: \ table [ [ ,...

=+you think is being taxed when more money is printed? Why?

=+16. Imagine that you are a policymaker trying to decide whether to reduce the rate of inflation.

=+15. During the Revolutionary War, the American colonies could not raise enough tax revenue to