Question

1 Approved Answer

Posted on Sep 07, 2024

This program will have you implement a class called Lexicon, which is similar to the Bag data structure introduced Monday of Week 3 but it

This program will have you implement a class called Lexicon, which is similar to the Bag data structure introduced Monday of Week 3 but it functions as a set of distinct words rather than a list of possibly repeated integers. The complete lexicon interface is given in lexicon.h; your implementation goes in lexicon.cpp. Part of your task is to modify the construction and mutation functionality of Bag to reflect the set-like nature of Lexicon, and part of your task is to overload several additional operators corresponding to set operations. This handout overviews the interface and details how to implement each of the required functions. You will only upload your implementation in lexicon.cpp , but you should use the provided skeletal main.cpp driver program to do your own unit testing as you implement each Lexicon function.

The implementation. Your main task is to implement all the functions declared in the interface. You can tackle these tasks in the following stages: 1. Note that a 0-argument constructor creating an empty set is already implemented for you inline. Implement a 1-argument constructor to open a text file and populate the data array with up to CAPACITY distinct words in the order that they appear in the file, keeping track of the number of distinct words in size . Because the file may have duplicate words, it is possible for it to have more tokens than the capacity of a lexicon but for it to not fill a lexicon. For example, if you have a file hello.txt with CAPACITY instances of "hello" followed by one instance of "world", then calling Lexicon lex("hello.txt") from main should result in an object lex where lex.data is an array whose first two entries are "hello" and "world" and lex.size is 2. However, if there are more than CAPACITY distinct values in your file, your constructor should fill the data array with the first CAPACITY distinct values and ignore the rest of the file. This program does not require you to do any string manipulation. Use == to test string inequality, which treats tokens "Hello", "hello" and "hello!" as three distinct words.

2. Implement three boolean member functions contains , insert , and remove consistent with the pre- and post-conditions stipulated in the header file.

3. Overload three bitwise operators as member functions so that lex1 | lex2 corresponds to the union of the two lexicons (i.e., any string in either or both of the underlying lexicons), lex1 & lex2 corresponds to the intersection of the two lexicons (i.e., any string in both of the underlying lexicons), and lex1 lex2 corresponds to the symmetric difference of the two lexicons (i.e., any string that appears in exactly one not both of the two underlying lexicons). Note that because each of these operators is implemented as a member function, the function is written with lex1 as the reference object and lex2 as a single parameter, and the function constructs, populates, and returns a third lexicon object.

4. Overload six comparison operators as nonmember functions with the following functionality: - lex1 == lex2 is true if and only if every string in one of the lexicons is also in the other (though not necessarily in the same position), - lex1 != lex2 is true if and only if there is some string in one lexicon and not the other, - lex1 = lex2 is true if and only if the second lexicon is a subset of the first, and - lex1 > lex2 is true if and only if the second lexicon is a strict subset of the first. Note that because each of these operators is implemented as a nonmember function, the contents of the parameters can only be accessed through public functions such as [], which is implemented for you, and contains, which you will have implemented in Step 2.

Use the main.cpp file to test your implementation as you go. Writing unit tests to test each function one at a time as you develop rather than running all tests at the end. This will make it much easier to identify the source of both syntactical and logical bugs, saving you lots of time and headache! Notice that

image text in transcribed

lexicon.h: #ifndef LEXICON_H #define LEXICON_H #include #include #include using std::string; using std::ostream; using std::ifstream; class Lexicon { public: // STATIC CONSTANT static const size_t CAPACITY = 2000; // CONSTRUCTORS // Pre: N/A // Post: An empty (size 0) lexicon is created Lexicon() : size_(0) {} // Pre: A text file called filename is in the current directory // Post: A lexicon w/ the file's first CAPACITY distinct strings is created Lexicon(string filename); // CONSTANT ACCESSOR MEMBER FUNCTIONS // Pre: N/A // Post: The number of elements stored is returned size_t size() const { return size_; } // Pre: N/A // Post: The truth value of whether the word is in the lexicon is returned bool contains(string word) const; // Pre: This object has at least pos+1 elements // Post: The element at index pos is returned string operator [](size_t pos) const { return data_[pos]; } // MUTATOR MEMBER FUNCTIONS // Pre: N/A // Post: The word is added to the end of the lexicon and true is returned // if word was distinct and there was space; false is returned otherwise bool insert(string word); // Pre: N/A // Post: If the word was present, it is removed and true is returned; // false is returned otherwise bool remove(string word);

// SET OPERATOR OVERLOADS // Pre: N/A // Post: A union lexicon is constructed, populated with the first CAPACITY // distinct elements of this object and rhs, and returned Lexicon operator |(const Lexicon& rhs) const; // Pre: N/A // Post: An intersection lexicon is constructed, populated with the elements // common to this object and rhs, and returned Lexicon operator &(const Lexicon& rhs) const; // Pre: N/A // Post: A symmetric difference lexicon is constructed, populated with // the first CAPACITY elements of this object XOR rhs, and returned Lexicon operator ^(const Lexicon& rhs) const; private: // INVARIANTS: data_[0],...,data_[size_-1] always contain the elements string data_[CAPACITY]; size_t size_; }; // INSERTION OPERATOR OVERLOAD // Pre: N/A // Post: Space-separated in-order contents of lex are inserted in out ostream& operator =(const Lexicon& lhs, const Lexicon& rhs); // Pre: N/A // Post: True is returned iff every rhs element is in lhs which is not identical bool operator >(const Lexicon& lhs, const Lexicon& rhs);

#endif

The US Constitution contains 1680 distinct words. The Declaration of Independence contains 632 distinct words. The two documents have 202 words in common. There are 1908 words in their symmetric difference. Does it make sense that there are 2000 words in the combined lexicon? 12.0238\% of Constitution words are in the Declaration of Independence. "Order" is one that is not. 31.962\% of Declaration of Independence words are in the Constitution. "Honor." is one that is not. The Constitution's preamble contains 2.32143% of the distinct words in the Constitution and no others. The strict subset relationship remains true if we flip the operator and args even though it's a different function call. Scrambling the words of the preamble gives an equivalent lexicon, which is equivalent to the union of the two. Anna Karenina is 350188 words (with repetition). The number of distinct words reaches our class's capacity