Question
The purpose of this assignment is to gain an understanding of the Viterbi algorithm and its application to part-of-speech (POS) tagging.
You will also get to see the Universal Dependencies treebanks. The main purpose of these treebanks is dependency parsing, but here we mainly use their part-of-speech tags.
You will develop a first-order hidden Markov model (HMM) for POS tagging in Python. This involves:
counting occurrences of one part of speech following another in a training corpus,
counting occurrences of words together with parts of speech in a training corpus,
relative frequency estimation with smoothing,
finding the best sequence of parts of speech for a list of words in the test corpus, according to an HMM with smoothed probabilities,
computing the accuracy, that is, the percentage of parts of speech that are guessed correctly.
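The counting and estimation steps listed above can be sketched as follows. This is a minimal illustration, not the required solution: the boundary markers `<s>` and `</s>`, the function names, and the choice of add-k smoothing are all assumptions, since the assignment does not say which smoothing method to use.

```python
from collections import Counter
import math

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def count_corpus(sentences):
    """Count tag bigrams and (tag, word) pairs.

    sentences: list of sentences, each a list of (word, tag) pairs.
    """
    trans = Counter()        # (prev_tag, tag) -> count
    emit = Counter()         # (tag, word) -> count
    tag_totals = Counter()   # tag -> number of tokens carrying that tag
    prev_totals = Counter()  # prev_tag -> number of transitions leaving it
    vocab = set()
    for sent in sentences:
        prev = START
        for word, tag in sent:
            trans[(prev, tag)] += 1
            prev_totals[prev] += 1
            emit[(tag, word)] += 1
            tag_totals[tag] += 1
            vocab.add(word)
            prev = tag
        # Close the sentence with a transition to the end marker.
        trans[(prev, END)] += 1
        prev_totals[prev] += 1
    return trans, emit, tag_totals, prev_totals, vocab


def smoothed_logprob(count, total, num_outcomes, k=1.0):
    """Add-k smoothed relative frequency, returned as a log probability."""
    return math.log((count + k) / (total + k * num_outcomes))


# Toy corpus, for illustration only.
toy = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
trans, emit, tag_totals, prev_totals, vocab = count_corpus(toy)

# Transition DET -> NOUN: the possible outcomes are every tag plus END.
num_next = len(tag_totals) + 1
lp_trans = smoothed_logprob(trans[("DET", "NOUN")], prev_totals["DET"], num_next)

# Emission NOUN -> "dog": one extra outcome is reserved for unknown words.
lp_emit = smoothed_logprob(emit[("NOUN", "dog")], tag_totals["NOUN"], len(vocab) + 1)
```

Working in log space avoids numerical underflow when many probabilities are multiplied along a long sentence. Smoothing ensures that unseen transitions and unseen words never receive probability zero.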
Run your application on the English (EWT) training and testing corpora. You should get an accuracy above 89%. If your accuracy is much lower, then you are probably doing something wrong.
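Decoding and evaluation might look like the sketch below. It assumes two hypothetical callables, `log_trans(prev_tag, tag)` and `log_emit(tag, word)`, that return smoothed log probabilities, and it reuses the assumed boundary markers `<s>` and `</s>`. The toy probabilities are invented for illustration and are not estimated from any corpus.

```python
import math


def viterbi(words, tags, log_trans, log_emit, start="<s>", end="</s>"):
    """Return the most probable tag sequence for words under a first-order HMM."""
    if not words:
        return []
    # best[i][t]: best log score of any tag sequence for words[:i+1] ending in t.
    best = [{t: log_trans(start, t) + log_emit(t, words[0]) for t in tags}]
    back = [{}]  # back[i][t]: tag at position i-1 on that best sequence
    for i in range(1, len(words)):
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + log_trans(p, t))
            scores[t] = best[i - 1][prev] + log_trans(prev, t) + log_emit(t, words[i])
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Include the transition to the end marker when choosing the final tag.
    last = max(tags, key=lambda t: best[-1][t] + log_trans(t, end))
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


def accuracy(gold_sents, pred_sents):
    """Percentage of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for gold, pred in zip(gold_sents, pred_sents):
        for g, p in zip(gold, pred):
            correct += (g == p)
            total += 1
    return 100.0 * correct / total if total else 0.0


# Toy model with made-up probabilities; unlisted events get a small default.
TRANS = {("<s>", "DET"): 0.8, ("DET", "NOUN"): 0.9,
         ("NOUN", "VERB"): 0.8, ("VERB", "</s>"): 0.9}
EMIT = {("DET", "the"): 0.9, ("NOUN", "dog"): 0.5, ("VERB", "barks"): 0.5}


def toy_log_trans(prev, tag):
    return math.log(TRANS.get((prev, tag), 0.05))


def toy_log_emit(tag, word):
    return math.log(EMIT.get((tag, word), 0.01))


predicted = viterbi(["the", "dog", "barks"], ["DET", "NOUN", "VERB"],
                    toy_log_trans, toy_log_emit)
score = accuracy([["DET", "NOUN", "VERB"]], [predicted])
```

The running time of this version is proportional to the sentence length times the square of the number of tags, which is small for the Universal Dependencies POS tag set.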
Download English (EWT) from universaldependencies.org
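Universal Dependencies treebanks are distributed in the CoNLL-U format: one token per line with ten tab-separated columns, comment lines starting with `#`, and a blank line between sentences. If no reader is supplied with the assignment, a minimal one could be sketched as below. The function name `read_conllu` is hypothetical, and the sample lines are hand-written for illustration rather than taken from the EWT files.

```python
import io


def read_conllu(lines):
    """Yield sentences as lists of (form, upos) pairs from CoNLL-U lines."""
    sent = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:                 # a blank line ends the current sentence
            if sent:
                yield sent
                sent = []
            continue
        if line.startswith("#"):     # comment or metadata line
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "2-3") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        sent.append((cols[1], cols[3]))  # FORM and UPOS columns
    if sent:                         # flush a final sentence with no trailing blank line
        yield sent


def row(*cols):
    """Join column values with tabs, as in a CoNLL-U token line."""
    return "\t".join(cols)


sample = "\n".join([
    "# text = The dog barks.",
    row("1", "The", "the", "DET", "DT", "_", "2", "det", "_", "_"),
    row("2", "dog", "dog", "NOUN", "NN", "_", "3", "nsubj", "_", "_"),
    row("3", "barks", "bark", "VERB", "VBZ", "_", "0", "root", "_", "_"),
    row("4", ".", ".", "PUNCT", ".", "_", "3", "punct", "_", "_"),
    "",
    "# text = I don't",
    row("1", "I", "I", "PRON", "PRP", "_", "2", "nsubj", "_", "_"),
    row("2-3", "don't", "_", "_", "_", "_", "_", "_", "_", "_"),
    row("2", "do", "do", "AUX", "VBP", "_", "0", "root", "_", "_"),
    row("3", "n't", "not", "PART", "RB", "_", "2", "advmod", "_", "_"),
    "",
])
sentences = list(read_conllu(io.StringIO(sample)))
```

With a real treebank file, the same function can be given an open file handle, for example `open(path, encoding="utf-8")`, where `path` points to the downloaded training or test file.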