Question

Posted on Sep 30, 2024

You will learn to build a Hidden Markov Model (HMM), decode it with the Viterbi algorithm, and apply it to the task of part-of-speech (POS) tagging. Complete each of the following tasks.

Load the NLTK Treebank tagged sentences using nltk.corpus.treebank.tagged_sents(). Use the first 80% of the sentences for training and the remaining 20% for testing.
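The loading and splitting step could be sketched as follows. The helper names `split_train_test` and `load_treebank_split` are hypothetical (they are not part of the assignment), and the sketch assumes the Treebank sample can be fetched with `nltk.download("treebank")`.

```python
def split_train_test(sentences, train_fraction=0.8):
    """Return (first train_fraction of the sentences, the remainder)."""
    sentences = list(sentences)
    cut = int(len(sentences) * train_fraction)
    return sentences[:cut], sentences[cut:]


def load_treebank_split(train_fraction=0.8):
    """Load the NLTK Treebank sample and split it without shuffling."""
    import nltk  # imported here so split_train_test works without NLTK installed

    nltk.download("treebank", quiet=True)  # assumption: the corpus may be missing locally
    tagged = nltk.corpus.treebank.tagged_sents()
    return split_train_test(tagged, train_fraction)
```

Calling `train_sents, test_sents = load_treebank_split()` would then give two lists of sentences, each sentence being a list of `(word, tag)` pairs. The split is positional rather than random because the task asks for the first 80% of the sentences.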

Extract the word and the tag from each of the sentences and create a vocabulary of all the words and a set of all tags.
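One way to collect the vocabulary and the tag set is a single pass over the tagged sentences. The function name `build_vocab_and_tags` is an assumption made for illustration, and the sketch keeps words case-sensitive because the task does not mention normalisation.

```python
def build_vocab_and_tags(tagged_sentences):
    """Collect the set of distinct words and the set of distinct tags."""
    vocab, tags = set(), set()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            vocab.add(word)
            tags.add(tag)
    return vocab, tags
```

For example, `build_vocab_and_tags([[("the", "DT"), ("dog", "NN")]])` yields the word set `{"the", "dog"}` and the tag set `{"DT", "NN"}`.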

To implement the Viterbi algorithm, you need 2 components:

Tag transition probability matrix A: It represents the probability of a tag occurring given the previous tag, or p(t_{i} | t_{i-1}). We compute the maximum likelihood estimate (MLE) of the probability by counting the occurrences of the tag t_{i-1} followed by the tag t_{i}:

p(t_{i} | t_{i-1}) = \frac{\mathrm{count}(t_{i-1}, t_{i})}{\mathrm{count}(t_{i-1})}

Emission probability matrix B: It represents the probability that a tag t_{i} is associated with a given word w_{i}, that is, the probability of the word given the tag, p(w_{i} | t_{i}). The MLE estimate is:

p(w_{i} | t_{i}) = \frac{\mathrm{count}(t_{i}, w_{i})}{\mathrm{count}(t_{i})}

Since the number of tags is small, creating matrix A is time efficient, whereas generating the full matrix B would be very expensive due to the vocabulary size.

Implement a method compute_tag_trans_probs() to calculate matrix A by parsing the sentences in the training set and counting the occurrences of the tag t_{i-1} followed by t_{i}.
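A minimal sketch of `compute_tag_trans_probs()` is shown here. The signature, the sparse dictionary keyed by `(previous_tag, tag)`, and the `"<s>"` start-of-sentence pseudo-tag are assumptions made for illustration, since the task does not specify them.

```python
from collections import Counter


def compute_tag_trans_probs(tagged_sentences, start_tag="<s>"):
    """Return {(prev_tag, tag): p(tag | prev_tag)} estimated by MLE."""
    bigram_counts = Counter()
    tag_counts = Counter()
    for sentence in tagged_sentences:
        prev = start_tag
        tag_counts[start_tag] += 1  # one start pseudo-tag per sentence
        for _, tag in sentence:
            bigram_counts[(prev, tag)] += 1
            tag_counts[tag] += 1
            prev = tag
    # count(t_{i-1}, t_i) / count(t_{i-1}), following the MLE formula
    return {
        (prev, tag): count / tag_counts[prev]
        for (prev, tag), count in bigram_counts.items()
    }
```

Pairs that never occur in training are simply absent from the dictionary, so a lookup such as `probs.get((prev, tag), 0.0)` treats them as probability zero. Because the denominator counts every occurrence of the previous tag, including sentence-final ones, the probabilities leaving a tag need not sum to 1 unless an end-of-sentence tag is also modelled.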

Implement a method emission_probs() to calculate the emission probability of a given word w_{i} having a tag t_{i}.
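Because a dense matrix B over the whole vocabulary is costly, one option is to store only the observed counts and compute each emission probability on demand. The helper `count_emissions` and the argument order of `emission_probs` below are assumptions for illustration, not a prescribed interface.

```python
from collections import Counter


def count_emissions(tagged_sentences):
    """Count (tag, word) pairs and tag occurrences in one pass."""
    pair_counts, tag_counts = Counter(), Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            pair_counts[(tag, word)] += 1
            tag_counts[tag] += 1
    return pair_counts, tag_counts


def emission_probs(word, tag, pair_counts, tag_counts):
    """Return the MLE of p(word | tag); 0.0 for unseen tags or pairs."""
    if tag_counts[tag] == 0:
        return 0.0
    return pair_counts[(tag, word)] / tag_counts[tag]
```

With this layout the memory use grows with the number of distinct (tag, word) pairs seen in training rather than with the number of tags times the vocabulary size. Unseen words get probability zero under plain MLE; any smoothing would be an extension beyond what the task states.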

The next step in an HMM is decoding, which entails determining the hidden variable sequence that corresponds to a sequence of observations. In POS tagging, decoding means choosing the sequence of tags that is most probable for the given sequence of words. We compute this using the following equation:

\hat{t}_{1:n} = \operatorname{argmax}_{t_{1} \dots t_{n}} \prod_{i=1}^{n} p(w_{i} | t_{i}) \, p(t_{i} | t_{i-1})
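To make the argmax concrete, the sketch below scores every possible tag sequence by brute force on a toy model. The probability tables, the `"<s>"` start tag used for t_{0}, and the function name `brute_force_decode` are all made up for illustration. This enumeration is exponential in the sentence length, which is why it is only useful as a reference on tiny inputs.

```python
from itertools import product


def brute_force_decode(words, tags, trans, emit, start_tag="<s>"):
    """Try every tag sequence and return (best_sequence, best_score)."""
    best_seq, best_score = None, -1.0
    for candidate in product(tags, repeat=len(words)):
        score, prev = 1.0, start_tag
        for word, tag in zip(words, candidate):
            # p(w_i | t_i) * p(t_i | t_{i-1}); missing entries count as 0
            score *= emit.get((tag, word), 0.0) * trans.get((prev, tag), 0.0)
            prev = tag
        if score > best_score:
            best_seq, best_score = list(candidate), score
    return best_seq, best_score


# Toy probabilities (hypothetical, not estimated from Treebank).
toy_trans = {("<s>", "DT"): 0.8, ("<s>", "NN"): 0.2, ("DT", "NN"): 0.9,
             ("DT", "VB"): 0.1, ("NN", "VB"): 0.6, ("NN", "NN"): 0.4}
toy_emit = {("DT", "the"): 1.0, ("NN", "dog"): 0.7, ("VB", "dog"): 0.1,
            ("VB", "barks"): 0.8, ("NN", "barks"): 0.2}
toy_best, toy_score = brute_force_decode(
    ["the", "dog", "barks"], ["DT", "NN", "VB"], toy_trans, toy_emit)
```

On this toy model the highest-scoring sequence is DT, NN, VB with score 0.8 · 1.0 · 0.9 · 0.7 · 0.6 · 0.8 = 0.24192.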

The optimal solution for HMM decoding is given by the Viterbi algorithm, a dynamic programming approach to computing the decoded tags. Implement the algorithm using the two methods implemented above, compute_tag_trans_probs() and emission_probs(), and return the sequence of tags corresponding to the given sequence of words. Refer to section 8.4.5, Fig. 8.10 of the Speech and Language Processing book.
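A compact sketch of Viterbi decoding follows. It is an illustration under stated assumptions rather than the textbook's exact pseudocode: the transition and emission models are passed in as callables `trans_prob(prev_tag, tag)` and `emit_prob(word, tag)` (hypothetical signatures), `"<s>"` is an assumed start tag, and plain probabilities are multiplied without smoothing.

```python
def viterbi(words, tags, trans_prob, emit_prob, start_tag="<s>"):
    """Return the most probable tag sequence for words under the HMM."""
    if not words:
        return []
    tags = list(tags)
    # best[i][t]: probability of the best path that ends in tag t at position i
    best = [{t: trans_prob(start_tag, t) * emit_prob(words[0], t) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Pick the predecessor that maximises the path probability into t.
            prev = max(tags, key=lambda p: best[i - 1][p] * trans_prob(p, t))
            best[i][t] = best[i - 1][prev] * trans_prob(prev, t) * emit_prob(words[i], t)
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path
```

The table has one cell per (position, tag) pair and each cell scans all tags, so the cost is proportional to the sentence length times the square of the number of tags. For long sentences the products can underflow, so summing log-probabilities is a common alternative. A word unseen in training gives zero emission for every tag, in which case the returned path is arbitrary unless some smoothing is added.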

Evaluate the performance of the model in terms of accuracy on the test set.
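Token-level accuracy can be sketched as the fraction of test words whose predicted tag matches the gold tag. The function name `tagging_accuracy` and the `tagger` argument (any callable mapping a list of words to a list of tags) are assumptions for illustration.

```python
def tagging_accuracy(test_sentences, tagger):
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sentence in test_sentences:
        words = [word for word, _ in sentence]
        gold = [tag for _, tag in sentence]
        predicted = tagger(words)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total if total else 0.0
```

In practice the `tagger` would wrap the Viterbi decoder together with the probabilities estimated on the training set. A trivial baseline such as `lambda words: ["NN"] * len(words)` is handy for checking the evaluation code itself.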
