Question
Your assignment is to write a program that implements B+ tree indexing. It will read a text file containing data and build the index, treating the first n columns as the key, where n will be specified by the user. Writing this program will require knowledge of random-access I/O in the language of your choice. Allowable language is Java.
The structure of the index is as follows:
First 256 bytes: Name of the text file you have indexed. This must be blank-filled on the right. You may need other metadata in this first block, so I suggest you allocate 1K so you can read it in as a block. (The size of the key is one such piece of metadata; see 1 below.)
The rest of the file is 1K blocks of index data, according to the way a B+ tree is structured. This implies that your program should read in a block of data, manipulate it, and possibly (if you are inserting key/pointer pairs) write it back out as a block.
Use a long (8-byte) record address for your pointers. These "pointers" are the byte offset in the text file of the data record. Thus the first record is at offset 0, the second is at offset 0 plus the length of the first record, and so on.
Note that the structure mixes text data (the key) and binary data (the pointer). Converting the pointer to text digits is not allowed. Reading everything into memory, doing the manipulation, and writing it out is also not allowed because this program could potentially handle millions of records. You should buffer at most 3 disk blocks.
This program should run entirely from the command line. Each of the functions and their command-line parameters are given below, assuming the name of your program is INDEX:
1. Create an index. This takes four parameters:
-create
The name of a text file to be indexed. Use the file I have provided.
The name of an output index that is created. Use the extension .indx for this.
How many bytes of the record, starting from the first position, are considered to be the key. Note: DO NOT hard-code this number in your program. The code must use whatever number is provided on the command line as long as it is between 1 and 40, inclusive.
If the output file exists, overwrite it. Do not ask. Build the index by starting with a blank structure and inserting new keys. There may be duplicate keys in the input file. Your program should list them and the line number of the input file on which they occur. Duplicates should not be inserted. The command line is: INDEX -create
For example: INDEX -create CS6360Asg5Data.txt CS6360Asg5.indx 15
--Creates an index with a key length of 15 bytes.
INDEX -create CS6360Asg5Data.txt CS6360Asg5.indx 21
--Creates an index with a key length of 21 bytes.
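The header block and the key/pointer layout described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the required design: the class name IndexHeader, the method names, and the decision to store the key length as a 4-byte integer at byte 256 are all hypothetical. The assignment only says that the key size is metadata kept somewhere in the first 1K block.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class IndexHeader {
    static final int BLOCK_SIZE = 1024;     // 1K block, as suggested in the assignment
    static final int NAME_FIELD = 256;      // file name field, blank-filled on the right
    static final int KEY_LEN_OFFSET = 256;  // assumption: key length stored right after the name

    /** Parses the key-length argument and rejects values outside 1..40. */
    static int parseKeyLength(String arg) {
        int n = Integer.parseInt(arg);
        if (n < 1 || n > 40) {
            throw new IllegalArgumentException("key length must be between 1 and 40: " + n);
        }
        return n;
    }

    /** Builds the 1K header block in memory. */
    static byte[] buildHeader(String textFileName, int keyLength) {
        byte[] name = textFileName.getBytes(StandardCharsets.US_ASCII);
        if (name.length > NAME_FIELD) {
            throw new IllegalArgumentException("file name longer than 256 bytes");
        }
        byte[] block = new byte[BLOCK_SIZE];
        Arrays.fill(block, 0, NAME_FIELD, (byte) ' ');             // blank-fill the name field
        System.arraycopy(name, 0, block, 0, name.length);
        ByteBuffer.wrap(block).putInt(KEY_LEN_OFFSET, keyLength);  // binary, big-endian
        return block;
    }

    /** Packs one key/pointer pair: keyLength text bytes, then an 8-byte binary offset. */
    static byte[] encodeEntry(String record, int keyLength, long recordOffset) {
        byte[] key = new byte[keyLength];
        Arrays.fill(key, (byte) ' ');                              // pad short records with blanks
        byte[] src = record.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(src, 0, key, 0, Math.min(src.length, keyLength));
        return ByteBuffer.allocate(keyLength + Long.BYTES).put(key).putLong(recordOffset).array();
    }

    /** Overwrites any existing index file and writes the header as block 0. */
    static void createIndexFile(String indexFileName, String textFileName, int keyLength)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(indexFileName, "rw")) {
            raf.setLength(0);                                      // discard old contents without asking
            raf.write(buildHeader(textFileName, keyLength));
        }
    }

    public static void main(String[] args) throws IOException {
        // Expected arguments: -create, text file, index file, key length
        if (args.length == 4 && args[0].equals("-create")) {
            createIndexFile(args[2], args[1], parseKeyLength(args[3]));
        }
    }
}
```

The sketch stops at writing the header; building the B+ tree blocks themselves, and reporting duplicate keys with their line numbers, is left out.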
2. Find a record by key. This displays the entire record, including the key, and gives its position, in bytes, within the file. If the key is not in your index, the program must give a message to that effect. As you can guess, the reason for having the name of the text file in your B+ tree index is so you can find it for this and the subsequent command. The command line is:
INDEX -find
If the key you supply is longer than the key length specified, your program should truncate it. If it is shorter, pad it on the right with blanks.
3. Insert a new text record. As with creating a new index, the first
INDEX -list
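For the -find command, the truncate-or-pad rule for a supplied key might look like the sketch below. The names KeyUtil and normalizeKey are hypothetical, and the code assumes keys are single-byte (ASCII) text, so that one character corresponds to one byte of the key field.

```java
class KeyUtil {
    /** Truncates or right-pads a search key to exactly keyLength characters. */
    static String normalizeKey(String key, int keyLength) {
        if (key.length() >= keyLength) {
            return key.substring(0, keyLength);   // too long: truncate
        }
        StringBuilder sb = new StringBuilder(key);
        while (sb.length() < keyLength) {
            sb.append(' ');                       // too short: pad with blanks on the right
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Brackets make the trailing blanks visible.
        System.out.println("[" + normalizeKey("abc", 5) + "]");
        System.out.println("[" + normalizeKey("abcdefgh", 3) + "]");
    }
}
```

Normalizing the key once, before the tree search starts, means every comparison against stored keys works on strings of the same length.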
Hint: It really helps to have a hex file dump utility when you're working on something like this, since the file you create is not all text.
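If no hex dump utility is at hand, a small one can be written in Java. The following is a minimal sketch, not a full tool: the class name HexDump and its output layout (offset, 16 hex bytes, printable text) are illustrative choices, and it reads the file in 1K blocks only to mirror the index layout.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class HexDump {
    /** Formats the first length bytes of data, 16 per row, labelled from baseOffset. */
    static String dump(byte[] data, int length, long baseOffset) {
        StringBuilder out = new StringBuilder();
        for (int row = 0; row < length; row += 16) {
            out.append(String.format("%08x ", baseOffset + row));
            StringBuilder text = new StringBuilder();
            for (int i = row; i < row + 16; i++) {
                if (i < length) {
                    int b = data[i] & 0xff;                           // treat the byte as unsigned
                    out.append(String.format(" %02x", b));
                    text.append(b >= 32 && b < 127 ? (char) b : '.'); // non-printable shown as '.'
                } else {
                    out.append("   ");                                // keep the text column aligned
                }
            }
            out.append("  ").append(text).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HexDump <file>
        if (args.length != 1) {
            System.err.println("usage: java HexDump <file>");
            return;
        }
        byte[] block = new byte[1024];
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r")) {
            long offset = 0;
            int n;
            while ((n = raf.read(block)) > 0) {
                System.out.print(dump(block, n, offset));
                offset += n;
            }
        }
    }
}
```

In such a dump the blank-filled name shows up as runs of 20, and each 8-byte pointer appears as raw binary rather than digits.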