Problems Goal Find the shortest common ancestor of a digraph in WordNet, a semantic lexicon for...
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
Problems Goal Find the shortest common ancestor of a digraph in WordNet, a semantic lexicon for the English language that compu- tational linguists and cognitive scientists use extensively. For example, WordNet was a key component in IBM's Jeopardy- playing Watson computer system. WordNet groups words into sets of synonyms called synsets. For example, {AND circuit, AND gate} is a synset that represents a logical gate that fires only when all of its inputs fire. WordNet also describes semantic relationships between synsets. One such relationship is the is-a relationship, which connects a hyponym (more specific synset) to a hypernym (more general synset). For example, the synset {gate, logic gate} is a hypernym of {AND circuit, AND gate} because an AND gate is a kind of logic gate. The WordNet Digraph Your first task is to build the WordNet digraph: each vertex v is an integer that represents a synset, and each directed edge v→ w denotes that w is a hypernym of v. The WordNet digraph is a rooted DAG: it is acyclic and has one vertex - the root that is an ancestor of every other vertex. However, it is not necessarily a tree because a synset can have more than one hypernym. A small subgraph of the WordNet digraph is shown below. The WordNet Input File Formats We now describe the two data files that you will use to create the WordNet digraph. The files are in comma-separated values (CSV) format: each line contains a sequence of fields, separated by commas. List of synsets. The file synsets.txt contains all noun synsets in WordNet, one per line. Line i of the file (counting from 0) contains the information for synset i. The first field is the synset id, which is always the integer i; the second field is the synonym set (or synset); and the third field is its dictionary definition (or gloss), which is not relevant to this assignment. id % more synsets.txt ⠀ synset 34 (AIDS acquired_immune_deficiency_syndrome, a serious (often fatal) disease of the immune system 35, ALGOL, a programming language used to express computer programs as algorithms 36 AND_circuit AND_gate, a circuit in a computer that fires only when all of its inputs fire 37, APC, a drug combination found in some over-the-counter headache remedies 38, ASCII character, any member of the standard code for representing characters by binary numbers 39, ASCII character_set, (computer science) 128 characters that make up the ASCII coding scheme 40, ASCII_text_file, a text file that contains only ASCII characters without special formatting 41, ASL American_sign_language, the sign language used in the United States 42, AWOL (one who is away or absent without leave ⠀ gloss For example, line 36 implies that the synset AND_circuit AND_gate has an id number of 36 and it's gloss is "a circuit in a computer that fires only when all of its inputs fire". The individual nouns that constitute a synset are separated by spaces. If a noun contains more than one word, the words are connected by the underscore character. List of hypernyms. The file hypernyms.txt contains the hypernym relationships. Line i of the file contains the hypernyms of synset i. The first field is the synset id, which is always the integer i; subsequent fields are the id numbers of the synset's hypernyms. id % more hypernyms.txt ⠀ 34,47569,48084 35, 19983 36 42338 37,53717 38,28591 39,28597 40,76057 41,70206 42,18793 ⠀ hypernyms For example, line 36 implies that synset 36 (AND_circuit AND_Gate) has 42338 (gate logic_gate) as it only hypernym. Line 34 implies that synset 34 (AIDS acquired_immune_deficiency_syndrome) has two hypernyms: 47569 (immunodeficiency) and 48084 (infectious disease). damage harm impairment change alteration modification transition leap hump saltation happening occurrence occurrent natural event miracle increase jump leap event miracle forfeit forfeiture sacrifice resistance opposition demotion act human_action human activity action change motion movement move locomotion travel ↑ run running dash sprint transgression variation group_action descent jump parachuting Problem 1. (WordNet Data Type) Implement an immutable data type called WordNet with the following API: WordNet WordNet (String synsets, String hypernyms) Iterable<String> nouns () boolean is Noun (String word) String sca(String noun1, String noun2) int distance (String noun1, String noun2) constructs a WordNet object given the names of the input (synset and hypernym) files returns all WordNet nouns returns true if the given word is a WordNet noun, and false otherwise returns a synset that is a shortest common ancestor of nount and noun2 returns the length of the shortest ancestral path between nount and noun2 Corner Cases • The constructor should throw a NullPointerException() with the message "synsets is null" if synsets is null and the message "hypernyms is null" if hypernyms is null. • The isNoun () method should throw a NullPointerException ("word is null") if word is null. • The sca() and distance () methods should throw a NullPointerException() with the message "nounl is null" or "noun2 is null" if noun or noun2 is null. The methods should throw an Illegal ArgumentException () with the message "nounl is not a noun" or "noun2 is not a noun" if noun1 or noun2 is not a noun. Performance Requirements • The constructor and the nouns () method should run in time T(n) ~n, where n is the size of the WordNet lexicon. • The isNoun () method should run in time T(n) ~ 1. • The sca() and distance () methods should make exactly one call to the ancestor () and length() methods in Shortest Common Ancestor, respectively. >_"/workspace/project6 $ java WordNet data/synsets.txt data/hypernyms.txt worm bird # of nouns 119188 isNoun (worm)? true isNoun (bird)? true isNoun (worm bird)? false. sca (worm, bird) = animal animate_being beast brute creature fauna distance (worm, bird) = 5 Directions: Instance variables: A symbol table that maps a synset noun to a set of synset IDs (a synset noun can belong to multiple synsets), RedBlackBST<String, SET<Integer >> st. - A symbol table that maps a synset ID to the corresponding synset string, RedBlackBST<Integer, String> rst. - For shortest common ancestor computations, ShortestCommon Ancestor sca. WordNet (String synsets, String hypernyms) Initialize instance variables st and rst appropriately using the synset file. - Construct a DiGraph object & (representing a rooted DAG) with V vertices (equal to the number of entries in the synset file), and add edges to it, read in from the hypernyms file. - Initialize sca using G. • Iterable<String> nouns () - Return all WordNet nouns. boolean isNoun (String word) - Return true if the given word is a synset noun, and false otherwise. ●String sca(String noun1, String noun2) - Use sca to compute and return a synset that is a shortest common ancestor of the given nouns. • int distance (String noun1, String noun2) – Use sca to compute and return the length of the shortest ancestral path between the given nouns. Shortest Common Ancestor An ancestral path between two vertices v and w in a rooted DAG is a directed path from v to a common ancestor x, together with a directed path from w to the same ancestora. A shortest ancestral path is an ancestral path of minimum total length. We refer to the common ancestor in a shortest ancestral path as a shortest common ancestor. Note that a shortest common ancestor always exists because the root is an ancestor of every vertex. Note also that an ancestral path is a path, but not a directed path. v = 3, w = 10 shortest ancestral path: 3-1-5-9-10 associated length: 4 shortest common ancestor: 1 13 8 (14) 10 8 We generalize the notion of shortest common ancestor to subsets of vertices. A shortest ancestral path of two subsets of vertices A and B is a shortest ancestral path over all pairs of vertices v and w, with v in A and w in B. 16 22 A = { 13, 23, 24), B = {6, 16, 17 } ancestral path: 13-7-3-1-0-2-6 ancestral path: 23-20-12-5-10-17 ancestral path: 23-20-12-5-2-6 (17 18 v = 1, w = 5 ancestral path: 1-2-3-4-5 shortest ancestral path: 1-0-5 associated length: 2 shortest common ancestor: 0 19 0 20 (23) (24 shortest ancestral path: 13-7-3-9-16 associated length: 4 shortest common ancestor: 3 Problems Goal Find the shortest common ancestor of a digraph in WordNet, a semantic lexicon for the English language that compu- tational linguists and cognitive scientists use extensively. For example, WordNet was a key component in IBM's Jeopardy- playing Watson computer system. WordNet groups words into sets of synonyms called synsets. For example, {AND circuit, AND gate} is a synset that represents a logical gate that fires only when all of its inputs fire. WordNet also describes semantic relationships between synsets. One such relationship is the is-a relationship, which connects a hyponym (more specific synset) to a hypernym (more general synset). For example, the synset {gate, logic gate} is a hypernym of {AND circuit, AND gate} because an AND gate is a kind of logic gate. The WordNet Digraph Your first task is to build the WordNet digraph: each vertex v is an integer that represents a synset, and each directed edge v→ w denotes that w is a hypernym of v. The WordNet digraph is a rooted DAG: it is acyclic and has one vertex - the root that is an ancestor of every other vertex. However, it is not necessarily a tree because a synset can have more than one hypernym. A small subgraph of the WordNet digraph is shown below. The WordNet Input File Formats We now describe the two data files that you will use to create the WordNet digraph. The files are in comma-separated values (CSV) format: each line contains a sequence of fields, separated by commas. List of synsets. The file synsets.txt contains all noun synsets in WordNet, one per line. Line i of the file (counting from 0) contains the information for synset i. The first field is the synset id, which is always the integer i; the second field is the synonym set (or synset); and the third field is its dictionary definition (or gloss), which is not relevant to this assignment. id % more synsets.txt ⠀ synset 34 (AIDS acquired_immune_deficiency_syndrome, a serious (often fatal) disease of the immune system 35, ALGOL, a programming language used to express computer programs as algorithms 36 AND_circuit AND_gate, a circuit in a computer that fires only when all of its inputs fire 37, APC, a drug combination found in some over-the-counter headache remedies 38, ASCII character, any member of the standard code for representing characters by binary numbers 39, ASCII character_set, (computer science) 128 characters that make up the ASCII coding scheme 40, ASCII_text_file, a text file that contains only ASCII characters without special formatting 41, ASL American_sign_language, the sign language used in the United States 42, AWOL (one who is away or absent without leave ⠀ gloss For example, line 36 implies that the synset AND_circuit AND_gate has an id number of 36 and it's gloss is "a circuit in a computer that fires only when all of its inputs fire". The individual nouns that constitute a synset are separated by spaces. If a noun contains more than one word, the words are connected by the underscore character. List of hypernyms. The file hypernyms.txt contains the hypernym relationships. Line i of the file contains the hypernyms of synset i. The first field is the synset id, which is always the integer i; subsequent fields are the id numbers of the synset's hypernyms. id % more hypernyms.txt ⠀ 34,47569,48084 35, 19983 36 42338 37,53717 38,28591 39,28597 40,76057 41,70206 42,18793 ⠀ hypernyms For example, line 36 implies that synset 36 (AND_circuit AND_Gate) has 42338 (gate logic_gate) as it only hypernym. Line 34 implies that synset 34 (AIDS acquired_immune_deficiency_syndrome) has two hypernyms: 47569 (immunodeficiency) and 48084 (infectious disease). damage harm impairment change alteration modification transition leap hump saltation happening occurrence occurrent natural event miracle increase jump leap event miracle forfeit forfeiture sacrifice resistance opposition demotion act human_action human activity action change motion movement move locomotion travel ↑ run running dash sprint transgression variation group_action descent jump parachuting Problem 1. (WordNet Data Type) Implement an immutable data type called WordNet with the following API: WordNet WordNet (String synsets, String hypernyms) Iterable<String> nouns () boolean is Noun (String word) String sca(String noun1, String noun2) int distance (String noun1, String noun2) constructs a WordNet object given the names of the input (synset and hypernym) files returns all WordNet nouns returns true if the given word is a WordNet noun, and false otherwise returns a synset that is a shortest common ancestor of nount and noun2 returns the length of the shortest ancestral path between nount and noun2 Corner Cases • The constructor should throw a NullPointerException() with the message "synsets is null" if synsets is null and the message "hypernyms is null" if hypernyms is null. • The isNoun () method should throw a NullPointerException ("word is null") if word is null. • The sca() and distance () methods should throw a NullPointerException() with the message "nounl is null" or "noun2 is null" if noun or noun2 is null. The methods should throw an Illegal ArgumentException () with the message "nounl is not a noun" or "noun2 is not a noun" if noun1 or noun2 is not a noun. Performance Requirements • The constructor and the nouns () method should run in time T(n) ~n, where n is the size of the WordNet lexicon. • The isNoun () method should run in time T(n) ~ 1. • The sca() and distance () methods should make exactly one call to the ancestor () and length() methods in Shortest Common Ancestor, respectively. >_"/workspace/project6 $ java WordNet data/synsets.txt data/hypernyms.txt worm bird # of nouns 119188 isNoun (worm)? true isNoun (bird)? true isNoun (worm bird)? false. sca (worm, bird) = animal animate_being beast brute creature fauna distance (worm, bird) = 5 Directions: Instance variables: A symbol table that maps a synset noun to a set of synset IDs (a synset noun can belong to multiple synsets), RedBlackBST<String, SET<Integer >> st. - A symbol table that maps a synset ID to the corresponding synset string, RedBlackBST<Integer, String> rst. - For shortest common ancestor computations, ShortestCommon Ancestor sca. WordNet (String synsets, String hypernyms) Initialize instance variables st and rst appropriately using the synset file. - Construct a DiGraph object & (representing a rooted DAG) with V vertices (equal to the number of entries in the synset file), and add edges to it, read in from the hypernyms file. - Initialize sca using G. • Iterable<String> nouns () - Return all WordNet nouns. boolean isNoun (String word) - Return true if the given word is a synset noun, and false otherwise. ●String sca(String noun1, String noun2) - Use sca to compute and return a synset that is a shortest common ancestor of the given nouns. • int distance (String noun1, String noun2) – Use sca to compute and return the length of the shortest ancestral path between the given nouns. Shortest Common Ancestor An ancestral path between two vertices v and w in a rooted DAG is a directed path from v to a common ancestor x, together with a directed path from w to the same ancestora. A shortest ancestral path is an ancestral path of minimum total length. We refer to the common ancestor in a shortest ancestral path as a shortest common ancestor. Note that a shortest common ancestor always exists because the root is an ancestor of every vertex. Note also that an ancestral path is a path, but not a directed path. v = 3, w = 10 shortest ancestral path: 3-1-5-9-10 associated length: 4 shortest common ancestor: 1 13 8 (14) 10 8 We generalize the notion of shortest common ancestor to subsets of vertices. A shortest ancestral path of two subsets of vertices A and B is a shortest ancestral path over all pairs of vertices v and w, with v in A and w in B. 16 22 A = { 13, 23, 24), B = {6, 16, 17 } ancestral path: 13-7-3-1-0-2-6 ancestral path: 23-20-12-5-10-17 ancestral path: 23-20-12-5-2-6 (17 18 v = 1, w = 5 ancestral path: 1-2-3-4-5 shortest ancestral path: 1-0-5 associated length: 2 shortest common ancestor: 0 19 0 20 (23) (24 shortest ancestral path: 13-7-3-9-16 associated length: 4 shortest common ancestor: 3
Expert Answer:
Answer rating: 100% (QA)
WordNetjava import javaioFile import javautilArrayList import javautilHashSet import javautilList import javautilSet public class WordNet File synsetsFile File hypernymsFile Digraph hypernyms List Syn... View the full answer
Related Book For
Principles of Information Systems
ISBN: 978-0324665284
9th edition
Authors: Ralph M. Stair, George W. Reynolds
Posted Date:
Students also viewed these programming questions
-
References Mailings Review View Help Tell me what you want to do 1 A7 All Markup Previous Show Markup Next Translate Language New Delete Previous Next Show Comment Ink Comments Comment Pen Eraser...
-
3. A simply supported beam of length 400 mm is shown in the figure below. The beam has a circular cross section with a diameter of 20 mm. An oscillating bending couple moment is applied at the center...
-
answer the question clearly You are building a flight-control system for which a convincing safety case must be made. Would you assign the tasks of safety requirements engineering, test case...
-
Monroe Inc. is an all-equity firm with 500,000 shares outstanding. It has $2,000,000 of EBIT, and EBIT is expected to remain constant in the future. The company pays out all of its earnings, so...
-
How does gender affect the type of advertising that proves to be most effective? An article in the Journal of Advertising Research (May/June 1990) makes reference to numerous studies that conclude...
-
What are the magnitude and direction of the net gravitational force on the 20.0 kg mass in FIGURE P13.36? y 10.0 kg 10.0 kg 5.0 cm 5.0 cm 20.0 cm 20.0 kg FIGURE P13.36
-
A student found that the sum of the degrees of the vertices in a graph was 13 . Why is that impossible?
-
Abbott and Abbott has a noncontributory, defined benefit pension plan. At December 31, 2011, Abbott and Abbott received the following information: The expected long-term rate of return on plan assets...
-
1.Today,SandraSaverdeposited$2,750inanaccountpaying9.0%APR,compoundedeverythreemonths.Ifshemakesnofurtherdepositsorwithdrawalsforthenext13years,howmuchinterestwillsheearninyeartwo?...
-
Why is competition in internet streaming services heating up? Who is jumping into the fray, and why? How do these companies differ? What do you expect the result of this intensifying competition will...
-
Why is this process significant when choosing a sampling method?
-
Kraft Heinz was compelled to restate nearly three years of financial reporting following an examination. As a financial analyst, do you believe that restatements of this nature should raise warning...
-
Defend and Critique Economic Convergence Economic Convergence is the pattern that low-income countries can grow faster than high-income countries, allowing them to catch up to their higher-income...
-
Write a memo with references toGAG Ltd: With clear numerical illustrations discussing the importance of control accounts. Describe methods used in dealing with incomplete records and their...
-
a ) Sutera Business Sdn Bhd has recorded sales of 2 5 0 , 0 0 0 units at RM 1 2 in the financial year 2 0 2 2 . The operating costs of Sutera Business Sdn Bhd are as follows: Operating Variable cost...
-
What is the marketability premium? Why should an issuing firm consider paying this premium?
-
15 The following information applies to the questions displayed below] Tear total cash dividends Year 2 total cash dividends Year 3 total cash dividends Tear 4 total cash dividends $12,100 20,500...
-
Is it a breach of fiduciary duty for a director of a real estate investment trust (REIT) negotiating a joint venture on behalf of the REIT with another director for the development of a portfolio of...
-
You have been hired to perform systems investigation for a French restaurant owner in a large metropolitan area. She is thinking of opening a new restaurant with a state-of-theart computer system...
-
What is the hierarchy of data in a database?
-
Assume that you want to start a new video rental business for students at your college or university. Go through logical design for a new information system to help you keep track of the videos in...
-
What natural instincts cause some managers to resist detailed comparisons with competitors?
-
How do growth economies differ from scale economies or learning curve effects?
-
What is the significance of strong growth economies among the Fortune 500?
Study smarter with the SolutionInn App