Question
Bigram-based Checker and Generator (JAVA)
A bigram is a pair of adjacent words in a sequence. Bigrams overlap, so in the sequence "a b. c d" the bigrams are ("a", "b."), ("b.", "c"), ("c", "d"). You will write a simple parser which builds a bigram model based on input text and allows checking and generating sentences. To do so, you should take advantage of Java's collection classes, including Maps.
Create a class called Bigram. The class will have a constructor which takes a String. Use a Scanner with its default tokenization on the String. As long as hasNext() returns true, each call to next() will retrieve the next word. Note that some words will be capitalized differently or contain punctuation. Treat each of those differently (for example, "Dogs", "dogs", and "dogs." are all different strings).
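To illustrate the tokenization described above, here is a minimal sketch that reads words with a Scanner and lists the overlapping pairs. The class name BigramTokens and the method pairs are hypothetical helpers for illustration, not part of the assignment skeleton.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, not part of the assignment skeleton.
class BigramTokens {
    // Splits text on whitespace (Scanner's default delimiter) and returns
    // each overlapping pair of adjacent tokens joined by a single space.
    static List<String> pairs(String text) {
        List<String> result = new ArrayList<>();
        Scanner sc = new Scanner(text);
        String prev = null; // no previous word before the first token
        while (sc.hasNext()) {
            String word = sc.next(); // case and punctuation are kept as-is
            if (prev != null) {
                result.add(prev + " " + word);
            }
            prev = word;
        }
        sc.close();
        return result;
    }
}
```

For example, BigramTokens.pairs("a b. c d") yields the three pairs "a b.", "b. c" and "c d". Remembering only the previous word avoids storing the whole token list, which may matter for inputs approaching a million words.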
Checking a phrase consists of looking at each pair of adjacent words. If every adjacent pair was seen in your input text, your code returns true; otherwise it returns false.
Example:
Bigram x = new Bigram("Bob likes dogs. Bill likes cats. Jane hates dogs.");
x.check("Bob likes cats.") returns true: "Bob likes" and "likes cats." both appear in the input text.
x.check("Jane likes cats.") returns false: "Jane likes" does not appear in the input text.
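One possible way to implement this check is to record, for each word, the set of words seen immediately after it. The sketch below assumes that approach; the class name BigramChecker and the field name followers are hypothetical, and treating a phrase with fewer than two words as valid is an assumption the assignment does not spell out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;

// Hypothetical class for illustration; the assignment's class is named Bigram.
class BigramChecker {
    // For each word, the set of words seen immediately after it in the text.
    private final Map<String, Set<String>> followers = new HashMap<>();

    BigramChecker(String text) {
        Scanner sc = new Scanner(text);
        String prev = null;
        while (sc.hasNext()) {
            String word = sc.next();
            if (prev != null) {
                followers.computeIfAbsent(prev, k -> new HashSet<>()).add(word);
            }
            prev = word;
        }
    }

    // Returns true only if every adjacent pair in the phrase was seen.
    // Assumption: a phrase with fewer than two words has no pairs to fail.
    boolean check(String phrase) {
        Scanner sc = new Scanner(phrase);
        String prev = null;
        while (sc.hasNext()) {
            String word = sc.next();
            if (prev != null) {
                Set<String> next = followers.get(prev);
                if (next == null || !next.contains(word)) {
                    return false; // this transition never occurred
                }
            }
            prev = word;
        }
        return true;
    }
}
```

With the example text above, check("Bob likes cats.") returns true and check("Jane likes cats.") returns false. Each lookup is a hash probe, so the check runs in time proportional to the length of the phrase.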
Your phrase generation method will be given a start word and a count indicating the total number of words to generate (including the start word). It will generate the most likely (most common) phrase based on bigram counts and return an array of Strings with the words generated in order. It always starts by generating the start word. As you generate each word, the next word generated should be the one that appears most often in the input (constructor) text after the previous word generated. If you reach a dead end (either the previous word was never seen or no word was ever seen after it), end generation early and return a shorter array. If there is more than one most common choice in the input text, pick the smallest word according to the String.compareTo method. (NOTE: sorted sets and sorted maps such as TreeSet and TreeMap order their elements, or their keys, according to compareTo.)
Example:
Bigram y = new Bigram("The balloon was red. The balloon got bigger and bigger. The balloon popped.");
y.generate("The", 3) returns the String array ["The", "balloon", "got"]
y.generate("popped.", 2) returns ["popped."]
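The generation rule can be sketched with a nested map of counts, where the inner map is a TreeMap so that ties fall to the smallest word in compareTo order. This is one possible approach under stated assumptions, not the reference solution; the class name BigramGenerator and the field name counts are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

// Hypothetical class for illustration; the assignment's class is named Bigram.
class BigramGenerator {
    // word -> (following word -> how often it followed).
    // The inner TreeMap iterates its keys in String.compareTo order.
    private final Map<String, TreeMap<String, Integer>> counts = new HashMap<>();

    BigramGenerator(String text) {
        Scanner sc = new Scanner(text);
        String prev = null;
        while (sc.hasNext()) {
            String word = sc.next();
            if (prev != null) {
                counts.computeIfAbsent(prev, k -> new TreeMap<>())
                      .merge(word, 1, Integer::sum);
            }
            prev = word;
        }
    }

    String[] generate(String start, int count) {
        List<String> out = new ArrayList<>();
        out.add(start); // the start word is always generated
        String current = start;
        while (out.size() < count) {
            TreeMap<String, Integer> next = counts.get(current);
            if (next == null) {
                break; // dead end: nothing ever followed this word
            }
            String best = null;
            int bestCount = 0;
            // Keys arrive in ascending order, so a strict '>' keeps the
            // smallest word among those tied for the highest count.
            for (Map.Entry<String, Integer> e : next.entrySet()) {
                if (e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            out.add(best);
            current = best;
        }
        return out.toArray(new String[0]);
    }
}
```

With the balloon text above, generate("The", 3) returns ["The", "balloon", "got"]: "was", "got" and "popped." each follow "balloon" once, and "got" is the smallest of the three. If generate is called many times on a large model, the best follower of each word could be computed once and cached instead of rescanned on every step.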
A tester program will be released which will test multiple larger examples. Your code should be able to work with input text containing up to a million words.
This is the skeleton of the Java class:
public class Bigram {
    // TODO: add member fields! You may have more than one.
    // You will probably want to use some kind of Map!

    /**
     * Create a new bigram model based on the text given as a String argument.
     * See the assignment for more details (and also check out the Wikipedia
     * article on bigrams).
     *
     * @param s
     *            text
     */
    public Bigram(String s) {
        // TODO: implement me!
    }

    /**
     * Check to see whether the sentence is possible according to the bigram
     * model. A sentence is possible if each bigram in the sentence was seen in
     * the text that was passed to the constructor.
     *
     * @param s
     *            Sentence
     * @return true if possible, false if not possible (some transition does not
     *         exist in the model as constructed)
     */
    public boolean check(String s) {
        // TODO: implement me!
        return false; // Fix this!
    }

    /**
     * Generate an array of strings based on the model, start word, and count.
     * You are given the start word to begin with. Each successive word should
     * be generated as the most likely or common word after the preceding word
     * according to the bigram model derived from the text passed to the
     * constructor. If more than one word is most likely, pick the smallest one
     * according to the natural String comparison order (compareTo order). Fewer
     * than count words may be generated if a dead end is reached with no
     * possibilities. If the start word never appears in the input text, only
     * that word will be generated.
     *
     * @param start
     *            Start word
     * @param count
     *            Number of words to generate (you may assume it's at least 1)
     * @return Array of generated words which begins with the start word and
     *         will usually have the length of the count argument (less if there
     *         is a dead end)
     */
    public String[] generate(String start, int count) {
        // TODO: implement me!
        return null; // Fix this! Your method should never return null!
    }
}
// Here is the code I have so far. The generate method is not complete yet and needs to be finished. I would much appreciate it if you could finish it.
package bigram;

import java.util.ArrayList;
import java.util.Scanner;
import java.util.TreeMap;

public class Bigram {
    // TODO: add member fields! You may have more than one.
    // You will probably want to use some kind of Map!
    TreeMap<String, Integer> treeMap = new TreeMap<>();

    /**
     * Create a new bigram model based on the text given as a String argument.
     * See the assignment for more details (and also check out the Wikipedia
     * article on bigrams).
     *
     * @param s text
     */
    public Bigram(String s) {
        // TODO: implement me!
        Scanner sc = new Scanner(s);
        int count = 0;
        String pare = "";
        ArrayList<String> words = new ArrayList<>();
        while (sc.hasNext()) {
            words.add(sc.next());
        }
        // System.out.println(words);
        for (int i = 0; i < words.size(); i++) {
            if (count == 2) {
                String key = pare.trim();
                if (treeMap.containsKey(key)) {
                    treeMap.put(key, treeMap.get(key) + 1);
                } else {
                    treeMap.put(pare.trim(), 1);
                }
                count = 0;
                pare = "";
                i--;
            }
            pare += words.get(i) + " ";
            count++;
        }
        treeMap.put(pare.trim(), 1);
        // System.out.println(treeMap);
    }

    public static void main(String[] args) {
        Bigram x = new Bigram("Bob likes dogs. Bill likes cats. Jane hates.");
        System.out.println(x.check("Bob likes cats."));
        System.out.println(x.check("Jane likes cats."));
        // Bigram y = new Bigram("The balloon was red. The balloon got bigger and bigger. The balloon popped.");
        //
        // y.generate("The", 3); //returns the String array ["The", "balloon", "got"]
        //
        // y.generate("popped.", 2); //returns ["popped."]
    }

    /**
     * Check to see whether the sentence is possible according to the bigram
     * model. A sentence is possible if each bigram in the sentence was seen in
     * the text that was passed to the constructor.
     *
     * @param s Sentence
     * @return true if possible, false if not possible (some transition does not
     *         exist in the model as constructed)
     */
    public boolean check(String s) {
        // TODO: implement me!
        String[] words = s.split(" ");
        ArrayList<String> list = new ArrayList<>();
        int count = 0;
        String pare = "";
        for (int i = 0; i < words.length; i++) {
            if (count == 2) {
                list.add(pare.trim());
                count = 0;
                pare = "";
                i--;
            }
            pare += words[i] + " ";
            count++;
        }
        for (String word : list) {
            if (!treeMap.containsKey(word)) {
                return false;
            }
        }
        return true; // Fix this!
    }

    /**
     * Generate an array of strings based on the model, start word, and count.
     * You are given the start word to begin with. Each successive word should
     * be generated as the most likely or common word after the preceding word
     * according to the bigram model derived from the text passed to the
     * constructor. If more than one word is most likely, pick the smallest one
     * according to the natural String comparison order (compareTo order). Fewer
     * than count words may be generated if a dead end is reached with no
     * possibilities. If the start word never appears in the input text, only
     * that word will be generated.
     *
     * @param start Start word
     * @param count Number of words to generate (you may assume it's at least 1)
     * @return Array of generated words which begins with the start word and
     *         will usually have the length of the count argument (less if there
     *         is a dead end)
     */
    public String[] generate(String start, int count) {
        // this method needs to be finished
        // TODO: implement me!
        ArrayList<String> s = new ArrayList<>();
        boolean flag = false;
        for (String key : treeMap.keySet()) {
            if (key.startsWith(start)) {
                flag = true;
                s.add(key);
                start = key.split(" ")[1];
                for (String tmpKey : treeMap.keySet())
                    if (tmpKey.startsWith(start)) {
                        s.add(tmpKey);
                    }
            }
        }
        if (!flag) {
            return new String[]{start};
        }
        System.out.println(s);
        return (String[]) s.toArray();
    }
}