
Question

Problem 1. (Markov Model Data Type) Create a data type MarkovModel in markov_model.py to represent a Markov model of order k from a given text string. The data type must implement the following API:

[Image: API table for MarkovModel; a text transcription appears after the code below]

Constructor. To implement the data type, define two attributes: an integer _k that stores the order of the Markov model, and a dictionary _st whose keys are all the k-grams from the given text. The value corresponding to each key (say kgram) in _st is a dictionary whose keys are the characters that follow kgram in the text, and the corresponding values are their frequencies. You may assume that the input text is a sequence of characters over the ASCII alphabet so that all values are between 0 and 127. The frequencies should be tallied as if the text were circular (i.e., as if it repeated the first k characters at the end). For example, if the text is gagggagaggcgagaaa and k = 2, then the dictionary _st should look like the following:

{'aa': {'a': 1, 'g': 1},
 'ag': {'a': 3, 'g': 2},
 'cg': {'a': 1},
 'ga': {'a': 1, 'g': 4},
 'gc': {'g': 1},
 'gg': {'a': 1, 'c': 1, 'g': 1}}

If you are careful enough, the entire dictionary can be built in just one pass through the circular text. Note that there is no reason to save the original text or the circular text as an attribute of the data type. That would be a grossly inefficient waste of space. Your MarkovModel object does not need either of these strings after the dictionary is built.
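The one-pass construction described above can be sketched roughly as follows. This is only an illustration, not the assignment's reference solution: build_table is a hypothetical standalone helper (the assignment wants this logic inside __init__, storing the result in _st), and it assumes the text has length at least k.

```python
def build_table(text, k):
    """Tally, for every k-gram of the circular text, how often each character follows it.

    build_table is a hypothetical helper name used only for this sketch.
    """
    table = {}
    # Appending the first k characters exposes the k-grams that wrap around the end.
    circ_text = text + text[:k]
    for i in range(len(text)):
        kgram = circ_text[i:i + k]
        nxt = circ_text[i + k]
        # Fetch the inner dictionary for this k-gram, creating it on first sight.
        followers = table.setdefault(kgram, {})
        # The count lives in the inner dictionary, so that is where we look it up.
        followers[nxt] = followers.get(nxt, 0) + 1
    return table


table = build_table('gagggagaggcgagaaa', 2)
print(table['ag'])
print(table['gg'])
```

Only the local variable circ_text holds the circular text, so nothing beyond the dictionary survives once the function returns.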

Order. Return the order k of the Markov Model.

Frequency. There are two frequency methods. kgram_freq(kgram) returns the number of times kgram was found in the original text. Returns 0 when kgram is not found. Raises an error if kgram is not of length k. char_freq(kgram, c) returns the number of times kgram was followed by the character c in the original text. Returns 0 when kgram or c is not found. Raises an error if kgram is not of length k.
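A minimal sketch of the two frequency lookups, assuming the nested dictionary from the example above. The function names mirror the API, but they are written here as free functions taking the table and k explicitly so the block runs on its own; in the real class they would be methods reading self._st and self._k.

```python
# Table from the example above (text 'gagggagaggcgagaaa', k = 2).
TABLE = {
    'aa': {'a': 1, 'g': 1},
    'ag': {'a': 3, 'g': 2},
    'cg': {'a': 1},
    'ga': {'a': 1, 'g': 4},
    'gc': {'g': 1},
    'gg': {'a': 1, 'c': 1, 'g': 1},
}


def kgram_freq(table, k, kgram):
    """Number of occurrences of kgram; 0 if it never occurs."""
    if len(kgram) != k:
        raise ValueError('kgram ' + kgram + ' not of length ' + str(k))
    # dict.get only reads, so an unknown kgram is not added to the table.
    return sum(table.get(kgram, {}).values())


def char_freq(table, k, kgram, c):
    """Number of times c follows kgram; 0 if kgram or c is unknown."""
    if len(kgram) != k:
        raise ValueError('kgram ' + kgram + ' not of length ' + str(k))
    return table.get(kgram, {}).get(c, 0)


print(kgram_freq(TABLE, 2, 'ag'))      # 5
print(char_freq(TABLE, 2, 'ga', 'g'))  # 4
print(char_freq(TABLE, 2, 'ga', 'c'))  # 0
print(kgram_freq(TABLE, 2, 'zz'))      # 0
```

Using get with a default covers both "kgram not found" and "c not found" without a separate branch, and it leaves the dictionary unchanged.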

Randomly generate a character. Return a character. It must be a character that followed the kgram in the original text. The character should be chosen randomly, but the results of calling rand(kgram) several times should mirror the frequencies of characters that followed the kgram in the original text. Raise an error if kgram is not of length k or if kgram is unknown.

Generate pseudo-random text. Return a string of length T that is a randomly generated stream of characters whose first k characters are the argument kgram. Starting with the argument kgram, repeatedly call rand() to generate the next character. Successive k-grams should be formed by using the most recent k characters in the newly generated text.

To avoid dead ends, treat the input text as a circular string: the last character is considered to precede the first character. For example, if k = 2 and the text is the 17-character string gagggagaggcgagaaa, then the salient features of the Markov model are captured in the table below:

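              frequency of       probability that
              next char          next char is
kgram  freq   a    c    g        a     c     g
aa     2      1    0    1        1/2   0     1/2
ag     5      3    0    2        3/5   0     2/5
cg     1      1    0    0        1     0     0
ga     5      1    0    4        1/5   0     4/5
gc     1      0    0    1        0     0     1
gg     3      1    1    1        1/3   1/3   1/3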

Note that the frequency of ag is 5 (and not 4) because we are treating the string as circular.

A Markov chain is a stochastic process where the state change depends on only the current state. For text generation, the current state is a k-gram. The next character is selected at random, using the probabilities from the Markov model. For example, if the current state is ga in the Markov model of order 2 discussed above, then the next character is a with probability 1/5 and g with probability 4/5. The next state in the Markov chain is obtained by appending the new character to the end of the k-gram and discarding the first character. A trajectory through the Markov chain is a sequence of such states. Shown below is a possible trajectory consisting of 9 transitions.

[Image: a possible trajectory of 9 transitions through the Markov chain]
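One way to sketch a single transition of the chain is shown below. It uses random.choices from the Python standard library as a stand-in for the stdrandom module that the starter code imports, which is an assumption of this sketch rather than what the assignment prescribes. rand_char and next_state are hypothetical names.

```python
import random

# Table from the example above (text 'gagggagaggcgagaaa', k = 2).
TABLE = {
    'aa': {'a': 1, 'g': 1},
    'ag': {'a': 3, 'g': 2},
    'cg': {'a': 1},
    'ga': {'a': 1, 'g': 4},
    'gc': {'g': 1},
    'gg': {'a': 1, 'c': 1, 'g': 1},
}


def rand_char(table, k, kgram):
    """Pick a follower of kgram with probability proportional to its frequency."""
    if len(kgram) != k:
        raise ValueError('kgram ' + kgram + ' not of length ' + str(k))
    if kgram not in table:
        raise ValueError('Unknown kgram ' + kgram)
    followers = table[kgram]
    # keys() and values() of the same dictionary iterate in matching order.
    chars = list(followers.keys())
    weights = list(followers.values())
    # random.choices accepts raw frequencies as weights; no normalization is needed.
    return random.choices(chars, weights=weights, k=1)[0]


def next_state(kgram, c):
    """Append the new character and discard the first one."""
    return (kgram + c)[1:]


c = rand_char(TABLE, 2, 'ga')  # 'g' about 4 times out of 5, 'a' about 1 in 5
print(c, next_state('ga', c))
```

Passing the frequencies directly as weights avoids converting them to probabilities by hand.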

To generate random text from a Markov model of order k, set the initial state to k characters from the input text. Then, simulate a trajectory through the Markov chain by performing T - k transitions, appending the random character selected at each step. For example, if k = 2 and T = 11, the following is a possible trajectory leading to the output gaggcgagaag:

trajectory:  ga --> ag --> gg --> gc --> cg --> ga --> ag --> ga --> aa --> ag
output:      ga     g      g      c      g      a      g      a      a      g
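The generation loop described above might look like the sketch below. It again substitutes the standard library's random.choices for stdrandom, and gen is written as a free function over an explicit table so the block is self-contained; both are assumptions made for illustration.

```python
import random

# Table from the example above (text 'gagggagaggcgagaaa', k = 2).
TABLE = {
    'aa': {'a': 1, 'g': 1},
    'ag': {'a': 3, 'g': 2},
    'cg': {'a': 1},
    'ga': {'a': 1, 'g': 4},
    'gc': {'g': 1},
    'gg': {'a': 1, 'c': 1, 'g': 1},
}


def gen(table, k, kgram, T):
    """Return a string of length T that starts with kgram (assumes T >= k)."""
    if len(kgram) != k:
        raise ValueError('kgram ' + kgram + ' not of length ' + str(k))
    chars = list(kgram)
    state = kgram
    # The first k characters are given, so T - k transitions remain.
    for _ in range(T - k):
        if state not in table:
            raise ValueError('Unknown kgram ' + state)
        followers = table[state]
        c = random.choices(list(followers), weights=list(followers.values()))[0]
        chars.append(c)
        # The next state is the most recent k characters of the output.
        state = (state + c)[1:]
    return ''.join(chars)


print(gen(TABLE, 2, 'ga', 11))
```

Because the table was tallied on the circular text, every state reached from a known kgram is itself a key, so the unknown-kgram check can only fire on the initial argument.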

DIRECTIONS

"""
markov_model.py

A data type that represents a Markov model of order k from a given text string.
"""

import stdio
import stdrandom
import sys


class MarkovModel(object):
    """
    Represents a Markov model of order k from a given text string.
    """

    def __init__(self, text, k):
        """
        Creates a Markov model of order k from given text. Assumes that text
        has length at least k.
        """

        self._k = k
        self._st = {}
        circ_text = text + text[:k]
        for i in range(len(circ_text) - k):
            ...

    def order(self):
        """
        Returns order k of Markov model.
        """

        ...

    def kgram_freq(self, kgram):
        """
        Returns number of occurrences of kgram in text. Raises an error if
        kgram is not of length k.
        """

        if self._k != len(kgram):
            raise ValueError('kgram ' + kgram + ' not of length ' + str(self._k))
        ...

    def char_freq(self, kgram, c):
        """
        Returns number of times character c follows kgram. Raises an error if
        kgram is not of length k.
        """

        if self._k != len(kgram):
            raise ValueError('kgram ' + kgram + ' not of length ' + str(self._k))
        ...

    def rand(self, kgram):
        """
        Returns a random character following kgram. Raises an error if kgram
        is not of length k or if kgram is unknown.
        """

        if self._k != len(kgram):
            raise ValueError('kgram ' + kgram + ' not of length ' + str(self._k))
        if kgram not in self._st:
            raise ValueError('Unknown kgram ' + kgram)
        ...

    def gen(self, kgram, T):
        """
        Generates and returns a string of length T by simulating a trajectory
        through the corresponding Markov chain. The first k characters of the
        generated string is the argument kgram. Assumes that T is at least k.
        """

        ...

    def replace_unknown(self, corrupted):
        """
        Replaces unknown characters (~) in corrupted with most probable
        characters, and returns that string.
        """

        # Given a list a, argmax returns the index of the maximum element in a.
        def argmax(a):
            return a.index(max(a))

        original = ''
        for i in range(len(corrupted)):
            if corrupted[i] == '~':
                ...
            else:
                original += corrupted[i]
        return original


def _main():
    """
    Test client [DO NOT EDIT].
    """

    text, k = sys.argv[1], int(sys.argv[2])
    model = MarkovModel(text, k)
    a = []
    while not stdio.isEmpty():
        kgram = stdio.readString()
        char = stdio.readString()
        a.append((kgram.replace("-", " "), char.replace("-", " ")))
    for kgram, char in a:
        if char == ' ':
            stdio.writef('freq(%s) = %s\n', kgram, model.kgram_freq(kgram))
        else:
            stdio.writef('freq(%s, %s) = %s\n', kgram, char,
                         model.char_freq(kgram, char))


if __name__ == '__main__':
    _main()

Here is my answer so far, but it will not accept char_freq(self, kgram, c):

"""
markov_model.py

A data type that represents a Markov model of order k from a given text string.
"""

import stdio
import stdrandom
import sys


class MarkovModel(object):
    """
    Represents a Markov model of order k from a given text string.
    """

    def __init__(self, text, k):
        """
        Creates a Markov model of order k from given text. Assumes that text
        has length at least k.
        """

        self._k = k
        self._st = {}
        circ_text = text + text[:k]
        for i in range(len(circ_text) - k):
            tt = circ_text[i:i + k]
            tp = circ_text[i + k:i + k + 1]
            if (tt in self._st) == True:
                if (tp in self._st) == True:
                    self._st[tt][tp] = 1 + self._st[tt][tp]
                else:
                    self._st[tt][tp] = 1
            else:
                self._st[tt] = {tp: 1}

    def order(self):
        """
        Returns order k of Markov model.
        """

        return self._k

    def kgram_freq(self, kgram):
        """
        Returns number of occurrences of kgram in text. Raises an error if
        kgram is not of length k.
        """

        if self._k != len(kgram):
            raise ValueError('kgram ' + kgram + ' not of length ' + str(self._k))
        tk = (self._st).setdefault(kgram, 0)
        if tk == 0:
            return 0
        else:
            tpp = sum((self._st[kgram]).values())
            return tpp

    def char_freq(self, kgram, c):
        """
        Returns number of times character c follows kgram. Raises an error if
        kgram is not of length k.
        """

        if self._k != len(kgram):
            raise ValueError('kgram ' + kgram + ' not of length ' + str(self._k))
        tk = (self._st).setdefault(kgram, 0)
        if tk == 0:
            return 0
        else:
            tpp = self._st[kgram][c]
            return tpp

    def rand(self, kgram):
        """
        Returns a random character following kgram. Raises an error if kgram
        is not of length k or if kgram is unknown.
        """

        if self._k != len(kgram):
            raise ValueError('kgram ' + kgram + ' not of length ' + str(self._k))
        if kgram not in self._st:
            raise ValueError('Unknown kgram ' + kgram)
        rd = stdrandom.discrete((self._st[kgram]).values())
        itms = (self._st[kgram]).items()
        return itms[rd][0]

    def gen(self, kgram, T):
        """
        Generates and returns a string of length T by simulating a trajectory
        through the corresponding Markov chain. The first k characters of the
        generated string is the argument kgram. Assumes that T is at least k.
        """

        tpp = kgram
        tt = kgram
        for ii in range(0, T):
            tpp = tpp + markov_model.rand(self, tt)
            if len(tpp) == T:
                return tpp

    def replace_unknown(self, corrupted):
        """
        Replaces unknown characters (~) in corrupted with most probable
        characters, and returns that string.
        """

        # Given a list a, argmax returns the index of the maximum element in a.
        def argmax(a):
            return a.index(max(a))

        original = ''
        for i in range(len(corrupted)):
            if corrupted[i] == '~':
                pass
            else:
                original += corrupted[i]
        return original


def _main():
    """
    Test client [DO NOT EDIT].
    """

    text, k = sys.argv[1], int(sys.argv[2])
    model = MarkovModel(text, k)
    a = []
    while not stdio.isEmpty():
        kgram = stdio.readString()
        char = stdio.readString()
        a.append((kgram.replace("-", " "), char.replace("-", " ")))
    for kgram, char in a:
        if char == ' ':
            stdio.writef('freq(%s) = %s\n', kgram, model.kgram_freq(kgram))
        else:
            stdio.writef('freq(%s, %s) = %s\n', kgram, char,
                         model.char_freq(kgram, char))


if __name__ == '__main__':
    _main()
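The sketch below isolates three behaviours in the posted answer that could plausibly explain wrong or failing char_freq results. Whether these are what the grader actually objects to is an assumption; the snippet only demonstrates how plain Python dictionaries behave. tally is a hypothetical helper that reproduces the constructor's loop with either membership test.

```python
def tally(text, k, check_inner):
    """Tally next-character counts; check_inner selects which membership test is used."""
    st = {}
    circ_text = text + text[:k]
    for i in range(len(text)):
        kgram, nxt = circ_text[i:i + k], circ_text[i + k]
        if kgram in st:
            # The posted code tests the outer dictionary (nxt in st), but the
            # counts live in the inner one (nxt in st[kgram]).
            seen = (nxt in st[kgram]) if check_inner else (nxt in st)
            st[kgram][nxt] = st[kgram][nxt] + 1 if seen else 1
        else:
            st[kgram] = {nxt: 1}
    return st


text = 'gagggagaggcgagaaa'
print(tally(text, 2, check_inner=False)['ga'])  # {'g': 1, 'a': 1}: counts never grow
print(tally(text, 2, check_inner=True)['ga'])   # {'g': 4, 'a': 1}

# setdefault stores the default under the key it was asked about, so a mere
# lookup of an unknown k-gram adds a bogus entry to the table.
st = tally(text, 2, check_inner=True)
st.setdefault('zz', 0)
print('zz' in st)  # True

# Indexing the inner dictionary with a character that never followed the
# k-gram raises KeyError, whereas the specification asks for 0.
try:
    st['ga']['c']
except KeyError as err:
    print('KeyError:', err)
```

Separately from char_freq, gen calls markov_model.rand(self, tt), but no name markov_model is defined in the file (self.rand(...) is presumably what was intended), and tt is never advanced to the most recent k characters. In rand, the result of items() cannot be indexed in Python 3 without first converting it to a list.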

method                      description
MarkovModel(text, k)        create a Markov model model of order k from text
model.order()               order k of Markov model
model.kgram_freq(kgram)     number of occurrences of kgram in text
model.char_freq(kgram, c)   number of times that character c follows kgram
model.rand(kgram)           a random character following the given kgram
model.gen(kgram, T)         a string of length T characters generated by simulating a trajectory through the corresponding Markov chain, the first k characters of which is kgram
