
Question


Here's the assignment:

A bigram is a pair of adjacent words in a sequence. Bigrams overlap, so in the sequence "a b. c d" the bigrams are ("a", "b."), ("b.", "c"), ("c", "d"). You will write a simple parser which builds a bigram model based on input text and allows checking and generating sentences. To do so, you should take advantage of Java's collection classes, including Maps.

Create a class called Bigram. The class will have a constructor which takes a String. Use a Scanner with its default tokenization on the String. As long as hasNext() returns true, each call to next() will retrieve the next word. Note that some words will be capitalized differently or contain punctuation. Treat each of those differently (for example, "Dogs", "dogs", and "dogs." are all different strings).

Checking a phrase consists of looking at each pair of adjacent words. If all adjacent pairs were seen in your input text, your code will return true, otherwise false.

Example:
    Bigram x = new Bigram("Bob likes dogs. Bill likes cats. Jane hates dogs.");
    x.check("Bob likes cats.") returns true: "Bob likes" and "likes cats." both appear in the input text.
    x.check("Jane likes cats.") returns false: "Jane likes" does not appear in the input text.

Your phrase generation method will be given a start word and a count indicating the total number of words to generate (including the start word). It will generate the most likely or most common phrase based on bigram counts. It will return an array of Strings with the words generated in order. It always starts by generating the start word. As you generate each word, the next word generated should be the one that appears most often in the input (constructor) text after the previous word generated. If you reach a dead end (either the previous word was never seen or there are no words ever seen after that word), end generation early and return a shorter array.

If there is more than one most common choice seen in the input text, pick the smallest word according to the String.compareTo method. (NOTE: ordered sets and ordered maps such as TreeSets and TreeMaps order their elements, or their keys, according to compareTo.)

Example:
    Bigram y = new Bigram("The balloon was red. The balloon got bigger and bigger. The balloon popped.");
    y.generate("The", 3) returns the String array ["The", "balloon", "got"]
    y.generate("popped.", 2) returns ["popped."]
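To make the overlap concrete, here is a minimal sketch of how a Scanner with default tokenization turns a string into bigrams. The class name BigramPairs and the method pairs are hypothetical names used only for this illustration; they are not part of the assignment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper, for illustration only (not required by the assignment).
class BigramPairs {
    // Returns every overlapping pair of adjacent tokens, formatted as "first second".
    static List<String> pairs(String text) {
        List<String> result = new ArrayList<>();
        Scanner scanner = new Scanner(text); // default delimiter: whitespace
        String previous = null;
        while (scanner.hasNext()) {
            String current = scanner.next();
            if (previous != null) {
                // Each token (except the first) closes one bigram and opens the next.
                result.add(previous + " " + current);
            }
            previous = current;
        }
        scanner.close();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("a b. c d")); // prints [a b., b. c, c d]
    }
}
```

A text with n words yields n - 1 bigrams, and punctuation stays attached to its word, which is why "dogs" and "dogs." count as different strings.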

Here's my program:

import java.util.ArrayList;
import java.util.Scanner;
import java.util.*;
import java.util.regex.Matcher;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Set;
import java.util.HashSet;
import java.util.Stack;
import java.util.TreeMap;

public class Bigram {
    TreeMap treeMap = new TreeMap<>();
    public Map<String, TreeMap<String, Integer>> Map;

    ArrayList ListWordsinHW;
    int size;
    private Scanner scanner;

    public Bigram(String s) {
        Map = new HashMap<String, TreeMap<String, Integer>>();
        //first word in biagram
        //second word
        //integer is frequency

        scanner = new Scanner(s);
        String end, before = null;

        size = 0;
        ListWordsinHW = new ArrayList<>();
        Scanner splitter = new Scanner(s);
        while(scanner.hasNext()){
            end = scanner.next();
            if(before != null ){
                if(Map.get(before) == null) {
                    Map.put(before, new TreeMap<String, Integer>());
                    size = ListWordsinHW.size();
                    splitter.close();
                }
                if(Map.get(before).get(end) == null) {
                    Map.get(before).put(end,1);
                } else {
                    int count = Map.get(before).get(end);
                    Map.get(before).put(end,count+1);
                }
            }
            before = end;
        }
    }

    /**
     * to see whether the sentence is Maybe according to the bigram
     * model. A sentence is Maybe if each bigram in the sentence was seen in
     * the text that was passed to the constructor.
     *
     * @param s
     *            Sentence
     * @return true if Maybe, false if not Maybe (some transition does not
     *         exist in the model as constructed)
     */
    public boolean check(String s) {
        String Check[] = s.split(" ");
        Scanner scanner = new Scanner(s);
        boolean Maybe = true;
        System.out.println("True");
        String end, before = null;

        while(scanner.hasNext()){
            end = scanner.next();
            if(before != null) {
                if(Map.get(before) == null) {
                    Maybe = false;
                    System.out.println("False");
                }
                //'before next' does not exist
                if(Map.get(before).get(end) == null) {
                    Maybe = false;
                }
            }
            before = end;
        }
        return Maybe;
    }

    /**
     * Generate an array of strings based on the model, start word, and count.
     * You are given the start word to begin with. Each successive word should
     * be generated as the most likely or common word after the preceding word
     * according to the bigram model derived from the text passed to the
     * constructor. If more than one word is most likely, pick the smallest one
     * according to the natural String comparison order (compareTo order). Fewer
     * than count words may be generated if a dead end is reached with no
     * possibilities. If the start word never appears in the input text, only
     * that word will be generated.
     *
     * @param start
     *            Start word
     * @param count
     *            Number of words to generate (you may assume it's at least 1)
     * @return Array of generated words which begins with the start word and
     *         will usually have the length of the count argument (less if there
     *         is a dead end)
     */
    public String[] generate(String start, int count) {
        ArrayList list = new ArrayList<>();
        String[] Generated = new String[count];

        Generated[0] = start;
        // the first generated string start
        int i = 1;
        while(i < count) {
            if(Map.get(Generated[i-1]) == null) {
                // no words follow
                System.out.println("No words followed previous word");
            }
            Set<String> Generating = Map.get(Generated[i-1]).keySet();

            List<String> ListInOrder = new ArrayList<String>(Generating);

            //return frequencies in ascending order
            Collections.reverse(ListInOrder);
            //descending order of frequency

            int Max = 0;
            String LeastWords = null;
            for (String Start : ListInOrder) {

                if(Max == 0) {
                    LeastWords = Start;
                    Max = Map.get(Generated[i-1]).get(Start);
                } else if (Map.get(Generated[i-1]).get(Start)==Max) {
                    // same frequency
                    if(Start.compareTo(LeastWords) < 0) {
                        // compare also smaller
                        LeastWords = Start;
                    }

                }
            }
            Generated[i++] = LeastWords;
        }
        return Generated;
    }

    public static void main (String []args){
        Bigram Tester = new Bigram("Bob likes cats. Bob likes dogs. Bob hates Jane. Jane hates cats. Jane likes dogs. Bill hates both. Jack likes cats.");
        // return true
        System.out.println(Tester.check("Bob likes dogs."));
        // return true
        System.out.println(Tester.check("Jane likes cats."));
        //return false
        System.out.println(Tester.check("Bill likes cats."));

        for(String Start: Tester.generate("Bob", 3)) {
            System.out.print(Start + " ");

            //Bigram y = new Bigram("The balloon was red. The balloon got bigger and bigger. The balloon popped.");
            //
            //y.generate("The", 3); //returns the String array ["The", "balloon", "got"]
            //
            //y.generate("popped.", 2); //returns ["popped."]

        }
    }
}

Here's the tester:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class BigramTest {

    public static int test(String file, byte[] xmd5, String[] gen, String[] desired, String[] check, boolean[] truth)
            throws NoSuchAlgorithmException {
        System.out.println("Loading " + file + "...");
        String text;
        try {
            text = new String(Files.readAllBytes(Paths.get(file)));
        } catch (Exception e) {
            System.out.println("Couldn't find '" + file + "'. Please place this file in the root directory of this project (next to JRE System Library, not indented).");
            return 0;
        }
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(text.replaceAll("\\s+", " ").getBytes());
        //System.out.println(Arrays.toString(digest));
        if (!Arrays.equals(digest, xmd5)) {
            System.out.println("Your copy of " + file + " appears to contain errors! Please download it again.");
            return 0;
        }
        System.out.println("Loaded " + file + ". Initializing Bigram object...");
        long start = System.currentTimeMillis();
        Bigram u = new Bigram(text);
        System.out.println("Generating.");
        int genScore = 0;
        for (int i = 0; i < gen.length; i++) {
            String[] foo = u.generate(gen[i], 10);
            if (foo == null) {
                System.out.println("For start word " + gen[i] + " with 10 words, you returned a null array!");
                continue;
            }
            String gened = "";
            for (int j = 0; j < foo.length; j++) {
                gened = gened + foo[j] + (j < foo.length - 1 ? " " : "");
            }
            if (gened.equals(desired[i])) {
                genScore += 10;
            } else {
                System.out.println("For start word " + gen[i] + " with 10 words, expected '" + desired[i] + "' got '" + gened + "'.");
            }
        }
        System.out.println("Checking.");
        int checkScore = 0;
        for (int i = 0; i < check.length; i++) {
            boolean ck = u.check(check[i]);
            if (ck == truth[i]) {
                checkScore += 10;
            } else {
                System.out.println("For phrase '" + check[i] + "' expected return value " + truth[i] + " but got " + ck);
            }
        }
        long end = System.currentTimeMillis();
        // Attempt at a benchmark...
        Arrays.sort(text.toLowerCase().toUpperCase().split("\\s"));
        Arrays.sort(text.toUpperCase().toLowerCase().toCharArray());
        Arrays.sort(text.split("\\s"));
        Arrays.sort(text.getBytes());
        long sortime = System.currentTimeMillis();
        //System.out.println((double)(end-start-5)/(sortime - end));
        if ((double)(end - start - 5)/(sortime - end) > 8) {
            System.out.println("Your program is taking a while! Try speeding it up for extra credit.");
        } else if ((double)(end - start - 5)/(sortime - end) > 2) {
            System.out.println("Fast, but could be faster! Takes "+(end-start)+" ms, try to get it below ~"+(2*(sortime - end)+5));
            genScore += 1;
        } else {
            System.out.println("Super fast! Took "+(end - start)+" ms");
            genScore += 1;
            checkScore += 1;
        }
        return genScore * 100 + checkScore;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        final byte[] dmd5 = { -61, 106, 118, -21, 62, -73, 33, 75, 68, -48, 38, 39, 108, 27, 95, -44 };
        final byte[] gmd5 = { -59, 120, 53, -92, 81, 59, -34, 72, 56, 2, 112, -125, 127, 50, -42, 55 };
        int checkScore = 0, genScore = 0;
        try {
            System.out.println("Trying 'Bob' example from homework.");
            Bigram x = new Bigram("Bob likes dogs. Bill likes cats. Jane hates dogs.");
            if (x.check("Bob likes cats.")) {
                checkScore += 10;
            } else {
                System.out.println("First check failed.");
            }
            if (!x.check("Jane likes cats.")) {
                checkScore += 10;
            } else {
                System.out.println("Second check failed.");
            }
            System.out.println("Trying 'Balloon' example from homework.");
            Bigram y = new Bigram("The balloon was red. The balloon got bigger and bigger. The balloon popped.");
            String[] g1 = y.generate("The", 3);
            if (Arrays.equals(g1, new String[] { "The", "balloon", "got" })) {
                genScore += 10;
            } else {
                System.out.println("First generate failed. Got " + Arrays.toString(g1));
            }
            String[] g2 = y.generate("popped.", 2);
            if (Arrays.equals(g2, new String[] { "popped." })) {
                genScore += 10;
            } else {
                System.out.println("Second generate failed. Got " + Arrays.toString(g2));
            }

            System.out.println("Testing with the Declaration of Independence...");
            int dscores = test("decl.txt", dmd5,
                    new String[] { "When" },
                    new String[] { "When in the most barbarous ages, and to the most" },
                    new String[] { "We have Petitioned for the rectitude of this Declaration,", "instrument for pretended offences For abolishing" },
                    new boolean[] { true, true });
            genScore += dscores / 100;
            checkScore += dscores % 100;

            System.out.println("Testing with Great Expectations...");
            int gscores = test("gexp.txt", gmd5,
                    new String[] { "Pip", "dozen" },
                    new String[] { "Pip and I had been a little while, and I", "dozen yards of the same time to be a little" },
                    new String[] { "low leaden hue" },
                    new boolean[] { false });
            genScore += gscores / 100;
            checkScore += gscores % 100;

        } finally {
            System.out.println("Check: " + checkScore + " / 50");
            System.out.println("Generate: " + genScore + " / 50");
            System.out.println("Tentative total: " + (checkScore + genScore + " / 100"));
            System.out.println("Violations of the academic honesty policy may affect this score.");
        }
    }

}

Can anyone fix my program so the tester will give 100 points? I'm stuck at 40 points and I don't know what's wrong with my program. Thank you.
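One possible fix, offered as a sketch rather than a verified 100-point solution. Reading the posted program, three things look likely to cost points: generate() keeps going after a dead end (Map.get(...) is null, so the following keySet() call throws, and the result array is never shortened); the selection loop only compares words whose count equals the first word's count, so a later word with a higher count is never chosen; and check() still dereferences Map.get(before) after finding it null. The version below addresses those three points. The field name followers and the use of computeIfAbsent and merge are my own choices, not something the assignment prescribes, and the class is left package-private here only so the snippet can be pasted anywhere.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

// Sketch of a corrected Bigram class; names other than Bigram, check, and generate are assumptions.
class Bigram {
    // first word -> (second word -> count). The inner TreeMap keeps second words in compareTo order.
    private final Map<String, TreeMap<String, Integer>> followers = new HashMap<>();

    public Bigram(String text) {
        Scanner scanner = new Scanner(text);
        String previous = null;
        while (scanner.hasNext()) {
            String current = scanner.next();
            if (previous != null) {
                followers.computeIfAbsent(previous, k -> new TreeMap<>())
                         .merge(current, 1, Integer::sum);
            }
            previous = current;
        }
        scanner.close();
    }

    public boolean check(String phrase) {
        Scanner scanner = new Scanner(phrase);
        String previous = null;
        while (scanner.hasNext()) {
            String current = scanner.next();
            if (previous != null) {
                Map<String, Integer> next = followers.get(previous);
                // Return as soon as a pair is missing; never dereference a null map.
                if (next == null || !next.containsKey(current)) {
                    return false;
                }
            }
            previous = current;
        }
        return true;
    }

    public String[] generate(String start, int count) {
        List<String> words = new ArrayList<>();
        words.add(start);
        String previous = start;
        while (words.size() < count) {
            TreeMap<String, Integer> next = followers.get(previous);
            if (next == null) {
                break; // dead end: nothing was ever seen after this word
            }
            String best = null;
            int bestCount = 0;
            // Keys are visited in ascending compareTo order, so using a strict ">"
            // keeps the smallest word whenever several words share the top count.
            for (Map.Entry<String, Integer> entry : next.entrySet()) {
                if (entry.getValue() > bestCount) {
                    best = entry.getKey();
                    bestCount = entry.getValue();
                }
            }
            words.add(best);
            previous = best;
        }
        // The list holds only the words actually generated, so the array may be shorter than count.
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Bigram y = new Bigram("The balloon was red. The balloon got bigger and bigger. The balloon popped.");
        System.out.println(Arrays.toString(y.generate("The", 3))); // prints [The, balloon, got]
    }
}
```

This reproduces the two worked examples from the assignment text. I have not run it against decl.txt or gexp.txt, so the score on those parts and on the timing bonus is untested. The debugging prints in check() and generate() are also dropped, since the tester's output is easier to read without them.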
