Question

Problem 1. (Markov Model Data Type) Define a data type called MarkovModel in markov_model.py to represent a Markov model of order k from a given text string. The data type must support the following API:
MarkovModel
MarkovModel(text, k)    constructs a Markov model m of order k from text
m.order()               returns the order of m
m.kgram_freq(kgram)     returns the number of occurrences of kgram in m
m.char_freq(kgram, c)   returns the number of times character c follows kgram in m
m.rand(kgram)           using m, finds and returns a random character following kgram
m.gen(kgram, n)         using m, builds and returns a string of length n whose first k characters are kgram
Project 6 (Markov Model)
Constructor. To implement the data type, define two instance variables: an integer _k that stores the order of the Markov model, and a symbol table _st whose keys are all the k-grams from the given text. The value corresponding to each key (say kgram) in _st is a symbol table whose keys are the characters that follow kgram in the text and whose values are their frequencies. You may assume that the input text is a sequence of characters over the ASCII alphabet, so that all character codes are between 0 and 127. The frequencies should be tallied as if the text were circular (i.e., as if it repeated the first k characters at the end). For example, if the text is gagggagaggcgagaaa and k = 2, then the symbol table _st should store the following information:
{
aa: {a: 1,g: 1},
ag: {a: 3,g: 2},
cg: {a: 1},
ga: {a: 1,g: 4},
gc: {g: 1},
gg: {a: 1,c: 1,g: 1}
}
If you are careful enough, the entire symbol table can be built in just one pass through the circular text. Note that there is no reason to save the original text or the circular text as an attribute of the data type. That would be a grossly inefficient waste of space. Your MarkovModel object does not need either of these strings after the symbol table is built.
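A minimal sketch of the one-pass constructor, assuming built-in Python dicts stand in for the symbol table type (the assignment's own symbol table class is not shown here) and that len(text) >= k. It also includes the trivial order() accessor. Appending the first k characters turns the text into its circular form, so the k-grams that wrap around the end also get a successor.

```python
class MarkovModel:
    """Partial sketch: constructor and order() only.

    Assumption: built-in dicts play the role of the symbol table; the
    assignment's own symbol table type may differ.
    """

    def __init__(self, text, k):
        self._k = k
        self._st = {}  # kgram -> {next character -> frequency}
        # Treat the text as circular by repeating its first k characters.
        # Assumes len(text) >= k.
        circ = text + text[:k]
        # One pass: each position i contributes one (kgram, next char) pair.
        for i in range(len(text)):
            kgram = circ[i:i + k]
            c = circ[i + k]
            followers = self._st.setdefault(kgram, {})
            followers[c] = followers.get(c, 0) + 1
        # Neither text nor circ is stored; only _k and _st remain.

    def order(self):
        """Return the order k of the model."""
        return self._k


m = MarkovModel("gagggagaggcgagaaa", 2)
print(m.order())
print(m._st)
```

For the example text, m._st holds the same nested counts as the table listed above, and nothing but _k and _st is kept on the object.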
Order. Return the order k of the Markov model.
Frequency. There are two frequency methods.
kgram_freq(kgram) returns the number of times kgram occurs in the text, counted circularly as described above. Returns 0 when kgram is not found. Raises an error if kgram is not of length k.
char_freq(kgram, c) returns the number of times kgram is followed by the character c in the (circular) text. Returns 0 when kgram or c is not found. Raises an error if kgram is not of length k.
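Both frequency methods reduce to lookups in the nested table. The sketch below writes them as standalone functions that take the table st and the order k explicitly (hypothetical helpers, so the snippet runs on its own). Raising ValueError for a wrong-length k-gram is an assumption, since the spec only says "an error".

```python
# Follower counts for the example text with k = 2 (copied from the table above).
ST = {
    "aa": {"a": 1, "g": 1},
    "ag": {"a": 3, "g": 2},
    "cg": {"a": 1},
    "ga": {"a": 1, "g": 4},
    "gc": {"g": 1},
    "gg": {"a": 1, "c": 1, "g": 1},
}


def _check_length(kgram, k):
    # Both frequency methods reject k-grams of the wrong length.
    if len(kgram) != k:
        raise ValueError(f"kgram must have length {k}, got {len(kgram)}")


def kgram_freq(st, k, kgram):
    """Occurrences of kgram: the sum of its follower counts (0 if unseen)."""
    _check_length(kgram, k)
    return sum(st.get(kgram, {}).values())


def char_freq(st, k, kgram, c):
    """Times c followed kgram (0 if kgram or c was never seen)."""
    _check_length(kgram, k)
    return st.get(kgram, {}).get(c, 0)


print(kgram_freq(ST, 2, "ag"), char_freq(ST, 2, "ga", "g"))
```

Here the k-gram frequency is derived by summing follower counts rather than stored separately; keeping a second table of k-gram counts would make it a single lookup at the cost of extra space.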
Randomly generate a character. Return a character that followed kgram in the original text, chosen at random so that the results of calling rand(kgram) many times mirror the frequencies of the characters that followed kgram. Raise an error if kgram is not of length k or if kgram is unknown.
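One way to make repeated calls mirror the observed frequencies is a weighted draw with Python's random.choices, using the follower counts as weights. This is a sketch written as a standalone function over a prebuilt table; the signature, the optional rng parameter, and the choice of ValueError and KeyError are assumptions.

```python
import random
from collections import Counter

# Follower counts for the example text with k = 2.
ST = {
    "aa": {"a": 1, "g": 1},
    "ag": {"a": 3, "g": 2},
    "cg": {"a": 1},
    "ga": {"a": 1, "g": 4},
    "gc": {"g": 1},
    "gg": {"a": 1, "c": 1, "g": 1},
}


def rand(st, k, kgram, rng=random):
    """Return a random successor of kgram, weighted by observed frequency."""
    if len(kgram) != k:
        raise ValueError(f"kgram must have length {k}")
    if kgram not in st:
        raise KeyError(f"unknown kgram {kgram!r}")
    followers = st[kgram]
    # choices() picks proportionally to the weights; [0] unwraps the 1-item list.
    return rng.choices(list(followers), weights=list(followers.values()))[0]


rng = random.Random(42)  # seeded so runs are repeatable
draws = Counter(rand(ST, 2, "ga", rng) for _ in range(10_000))
print(draws)
```

With 10,000 draws for ga, roughly one in five results should be a, matching its 1/5 probability; passing a seeded random.Random makes runs repeatable.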
Generate pseudo-random text. Return a string of length n that is a randomly generated stream of characters whose first k characters are the argument kgram. Starting with the argument kgram, repeatedly call rand() to generate the next character. Successive k-grams should be formed by using the most recent k characters in the newly generated text.
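A sketch of text generation under the same assumptions (a standalone function, Python dicts for the table, and n >= k). Each step takes the most recent k characters as the current state and draws the next character with the frequency-weighted choice that rand(kgram) is described as making; the draw is inlined so the snippet runs on its own.

```python
import random

# Follower counts for the example text with k = 2.
ST = {
    "aa": {"a": 1, "g": 1},
    "ag": {"a": 3, "g": 2},
    "cg": {"a": 1},
    "ga": {"a": 1, "g": 4},
    "gc": {"g": 1},
    "gg": {"a": 1, "c": 1, "g": 1},
}


def gen(st, k, kgram, n, rng=random):
    """Return a length-n string that starts with kgram (assumes n >= k)."""
    if len(kgram) != k:
        raise ValueError(f"kgram must have length {k}")
    chars = list(kgram)
    while len(chars) < n:
        # The current state is the most recent k generated characters.
        state = "".join(chars[len(chars) - k:])
        followers = st[state]
        # Frequency-weighted choice of the next character.
        chars.append(rng.choices(list(followers), weights=list(followers.values()))[0])
    return "".join(chars)


print(gen(ST, 2, "ga", 40, random.Random(7)))
```

Slicing with chars[len(chars) - k:] rather than chars[-k:] keeps the state correct even for k = 0, where [-0:] would return the whole list.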
To avoid dead ends, treat the input text as a circular string: the last character is considered to precede the first character. For example, if k = 2 and the text is the 17-character string gagggagaggcgagaaa, then the salient features of the Markov model are captured in the table below:
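             frequency of        probability that
             next char           next char is
kgram  freq  a     c     g       a      c      g
--------------------------------------------------
aa     2     1     0     1       1/2    0      1/2
ag     5     3     0     2       3/5    0      2/5
cg     1     1     0     0       1      0      0
ga     5     1     0     4       1/5    0      4/5
gc     1     0     0     1       0      0      1
gg     3     1     1     1       1/3    1/3    1/3
--------------------------------------------------
total  17    7     1     9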
Note that the frequency of ag is 5 (and not 4) because we are treating the string as circular.
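The table can be reproduced mechanically from the circular text. This sketch tallies follower counts with collections.Counter and expresses each probability as an exact fractions.Fraction; the function names follower_counts and table_rows are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def follower_counts(text, k):
    """kgram -> Counter of next characters, counted over the circular text."""
    circ = text + text[:k]
    counts = {}
    for i in range(len(text)):
        counts.setdefault(circ[i:i + k], Counter())[circ[i + k]] += 1
    return counts


def table_rows(text, k, alphabet):
    """Yield (kgram, freq, counts per char, probabilities per char), sorted by kgram."""
    counts = follower_counts(text, k)
    for kgram in sorted(counts):
        freq = sum(counts[kgram].values())
        # A Counter returns 0 for characters that never followed kgram.
        row_counts = [counts[kgram][c] for c in alphabet]
        row_probs = [Fraction(n, freq) for n in row_counts]
        yield kgram, freq, row_counts, row_probs


rows = list(table_rows("gagggagaggcgagaaa", 2, "acg"))
for kgram, freq, row_counts, row_probs in rows:
    print(kgram, freq, *row_counts, *(str(p) for p in row_probs))
```

Summing the freq column gives 17, the length of the text, because circular counting produces exactly one k-gram per character position.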
A Markov chain is a stochastic process in which the state change depends only on the current state. For text generation, the current state is a k-gram. The next character is selected at random, using the probabilities from the Markov model. For example, if the current state is ga in the Markov model of order 2 discussed above, then the next character is a with probability 1/5 and g with probability 4/5. The next state in the Markov chain is obtained by appending the new character to the end of the k-gram and discarding the first character. A trajectory through the Markov chain is a sequence of such states. Shown below is a possible trajectory consisting of 9 transitions.
trajectory:         ga --> ag --> gg --> gc --> cg --> ga --> ag --> ga --> aa --> ag
probability for a:  1/5    3/5    1/3    0      1      1/5    3/5    1/5    1/2
probability for c:  0      0      1/3    0      0      0      0      0      0
probability for g:  4/5    2/5    1/3    1      0      4/5    2/5    4/5    1/2
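The trajectory rows can be checked the same way. The sketch below verifies that each state is the previous one with its first character dropped and the new character appended, and recomputes the per-step probabilities from the k = 2 counts. As an extra illustration not required by the assignment, it multiplies the probabilities of the characters actually chosen, which gives the probability of this path given that it starts at ga. The names ST, next_char_probs, and trajectory are hypothetical, and math.prod needs Python 3.8 or later.

```python
from fractions import Fraction
from math import prod

# Follower counts for the example text with k = 2.
ST = {
    "aa": {"a": 1, "g": 1},
    "ag": {"a": 3, "g": 2},
    "cg": {"a": 1},
    "ga": {"a": 1, "g": 4},
    "gc": {"g": 1},
    "gg": {"a": 1, "c": 1, "g": 1},
}
ALPHABET = "acg"
trajectory = ["ga", "ag", "gg", "gc", "cg", "ga", "ag", "ga", "aa", "ag"]


def next_char_probs(st, state):
    """Exact probability of each alphabet character following state."""
    total = sum(st[state].values())
    return {c: Fraction(st[state].get(c, 0), total) for c in ALPHABET}


step_probs = []
for cur, nxt in zip(trajectory, trajectory[1:]):
    c = nxt[-1]
    # Next state: drop the first character, append the new one.
    assert nxt == cur[1:] + c, f"{cur} -> {nxt} is not a valid transition"
    probs = next_char_probs(ST, cur)
    print(cur, "->", nxt, {ch: str(p) for ch, p in probs.items()})
    step_probs.append(probs[c])

# Markov property: the path probability is the product of the step probabilities.
path_prob = prod(step_probs)
print(path_prob)
```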
Treating the input text as a circular string ensures that the Markov chain never gets stuck in a state with no next characters.
To generate random text from a Markov model of order k, set the initial state to k characters from the input text. Then, sim
