Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 21, 2024

implement a class called MarkovModel.java that can build a k -th order Markov model from text stored in a given file. (Eventually, you will build

implement a class called MarkovModel.java that can build a k-th order Markov model from text stored in a given file. (Eventually, you will build two objects using this class, one for Bush and one for Kerry.) Constructing a k-th order Markov model from text sequences is mostly a matter of counting. For each sequence p of length k (let us call it the context), we need to estimate the probability of p being followed by each letter c in our alphabet. Given a text sample, this probability can be estimated by the number of times that p is followed by c, divided by the number of times that p appears as a context at all. That is, we can estimate the probability of c following p by N(pc) / N(p), where N(pc) denotes the number of times we observe the concatenated sequence pc, and N(p) denotes the number of times that we observe p.

Unfortunately, this estimate will be problematic if some of the counts are zero, as is certain to happen on real data. Therefore, instead, we will use a different "smoothed" estimate, namely (N(pc) + 1) / (N(p) + S), where S is the size of our alphabet. This form of smoothing, called Laplace smoothing, has theoretical justification coming from probability theory, and ensures that zero counts will not be a problem.

For example, if the input text is "aabcabaacaac", and we are using a second-order Markov model (k = 2) and the three-letter alphabet is {a, b, c}, then we can estimate the probability that "aa" is followed by 'c'; by (2 + 1) / (3 + 3) = 1/2 since N(aa) = 3, N(aac) = 2 and S = 3. Similarly, the probabilities that "aa" is followed by 'a' and 'b' is 1/6 and 1/3, respectively. Note that the probabilities sum to 1.

One thing to notice here: To handle the beginning and end of the string, we treat the string as circular. Thus, N(caa) = 2 instead of 1.

So, in constructing a Markov model, your first job is to write code that will compute the appropriate counts N() by scanning the text file and counting how many times all sequences of the required lengths appear in the file. These counts should be stored appropriately for later use. You should also compute and record the alphabet size S. Organize your program by creating a data type MarkovModel so that the first line below computes the number of occurrences of each substring from s of size k and k+1; the second line prints it to standard output; and remaining lines compute and print Laplace smoothed probability estimates.

MarkovModel model = new MarkovModel(k, s); System.out.println(model); System.out.printf("%.4f ", model.laplace("aac")); System.out.printf("%.4f ", model.laplace("aaa")); System.out.printf("%.4f ", model.laplace("aab"));

For example, when k = 2 and s = "aabcabaacaac", this code fragment should print the following.

S = 3 "aa" 3 "ab" 2 "ac" 2 "ba" 1 "bc" 1 "ca" 3 "aab" 1 "aac" 2 "aba" 1 "abc" 1 "aca" 2 "baa" 1 "bca" 1 "caa" 2 "cab" 1 0.5000 0.1667 0.3333

Step by Step Solution

There are 3 Steps involved in it

Step: 1

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

Step: 3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Objects And Databases International Symposium Sophia Antipolis France June 13 2000 Revised Papers Lncs 1944

Authors: Klaus R. Dittrich ,Giovanna Guerrini ,Isabella Merlo ,Marta Oliva ,M. Elena Rodriguez

2001st Edition

3540416641, 978-3540416647

More Books

Students also viewed these Databases questions

Question

★★★★★

Could Leeza avoid the obligation to generate and sell new documents at the price set by Cain on the basis that she had previously told Cain not to take orders for documents not in inventory?

Answered: 1 week ago

Question

★★★★★

Selecting a specific severity level ( 1 - 7 ) causes the Cisco Firewall to send the messages for that level, as well as all the levels above it , to the output location. True False

Answered: 1 week ago

Question

★★★★★

Should a transaction be recorded first in a journal or the ledger? Why?

Answered: 1 week ago

Question

★★★★★

Selected account balances for the year ended December 31 are provided below for Superior Company: Selling and administrative salaries?????????? $110,000 Insurance, factory???????????????..??? ? ?...

Answered: 1 week ago

Question

★★★★★

implement a class called MarkovModel.java that can build a k -th order Markov model from text stored in a given file. (Eventually, you will build two objects using this class, one for Bush and one...

Answered: 1 week ago

Question

★★★★★

How do I get accounts receivables and sales based on this information?. The Polaris Company uses a job-onder costing system. The following transactions occurred in October. a Raw materials purchased...

Answered: 1 week ago

Question

★★★★★

A manufacturing company has recently outsourced its entire operations processes. Moreover, the CEO has decided to start an e-commerce business model to find a new market to target consumers with...

Answered: 1 week ago

Question

★★★★★

Primary source analysis. There are three newspaper excerpts included in the Election of 1800 section of the lesson. For each source, identify if it was a Republican newspaper (in favor of the...

Answered: 1 week ago

Question

★★★★★

Discuss the ECCO Case. In your discussion, please make sure to address the following questions: What sets ECCO's supply chain apart from their competitors? How do you think that they can improve it...

Answered: 1 week ago

Question

★★★★★

Best Vision is revamping its assembly lines to improve efficiency. As shown below, there are 10 steps to assembling a television set. Best Vision needs to produce 125 televisions in a 40-hour work...

Answered: 1 week ago

Question

★★★★★

Colby Company reports net income of $50,000. For the year, depreciation expense is $15,000 and the company reports a gain of $5,000 from sale of machinery. It also had a $3,000 loss from retirement...

Answered: 1 week ago

Question

★★★★★

4-12 In MyMISLab, you will find a Collaboration and Teamwork Project dealing with the concepts in this chapter. You will be able to use Google Drive, Google Docs, Google Sites, Google+, or other open...

Answered: 1 week ago

Question

★★★★★

4-10 In this project, you will learn how to build a simple blog. Using the blog creation software at Blogger.com, follow the instructions to create a blog. Give your blog a suitable name; choose a...

Answered: 1 week ago

Question

★★★★★

4-13 Perform an ethical analysis of Facebook. What is the ethical dilemma presented by this case? Over the course of less than a decade, Facebook has morphed from a small, niche networking site for...

Answered: 1 week ago

Previous Question Next Question