Question

Get to know the lab corpus (i.e. dataset)

Prepare the data by making all words lowercase to ease the string comparisons needed in subsequent steps:

    # In these two steps, we lowercase every word in the training and test corpus.
    # Each sentence stays a list of (word, tag) tuples, so sentence boundaries are kept.
    train_set = [[(word.lower(), tag) for word, tag in sent] for sent in train_set]
    test_set = [[(word.lower(), tag) for word, tag in sent] for sent in test_set]

    # create flat lists of train and test tagged words
    train_tagged_words = [tup for sent in train_set for tup in sent]
    test_tagged_words = [tup for sent in test_set for tup in sent]
    print(len(train_tagged_words))
    print(len(test_tagged_words))

Write the transition probabilities function
Build the transition probability matrix using the function written in step 4
Build the emission probability matrix using the function written in step 5
Extract all possible POS taggings for a test case
9. Compute the HMM probabilities

Please complete the tasks below to predict the correct tags for a sentence. Your understanding of Part 1 is essential to complete these tasks.

1. Delete the start and end symbols from the corpus provided to you in the first part [0.25 mark]
2. Apply the emission and transition functions to compute the related probabilities on the updated corpus [1 mark]
3. Compute the probability of the possible tags that can be given to the same test sentence provided in the first part, but without the start and end symbols [1 mark]
4. Report the difference [0.25 mark]
5. Now, use the code below to install the Treebank corpus from the nltk library and prepare it:

    import nltk
    from sklearn.model_selection import train_test_split
    import numpy as np
    import pandas as pd
    import random
    import pprint, time
    from itertools import product

    # install the Treebank corpus from the nltk library
    nltk.download('treebank')

    # read the Treebank tagged sentences and split the first 100 into train/test sets
    nl_data = list(nltk.corpus.treebank.tagged_sents())
    tr_set, ts_set = train_test_split(nl_data[:100], train_size=0.75, test_size=0.25)

6. Apply the emission and transition functions on the corpus to compute the related probabilities [1 mark]
7. Compute the probability of the possible tags for the sentence "In July, the agency imposed a ban" [1 mark]
8. Confirm your understanding with the lab instructor [0.5 mark]
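Task 1 can be sketched as a filter over the sentence-level corpus. How the start and end symbols are encoded in Part 1 is not visible here, so the tags `'<s>'` and `'</s>'`, the name `BOUNDARY_TAGS` and the helper `strip_boundaries` are hypothetical placeholders; adjust them to whatever the lab corpus actually uses.

```python
# Hypothetical boundary markers; replace with the symbols used in Part 1.
BOUNDARY_TAGS = {"<s>", "</s>"}


def strip_boundaries(sentences, boundary_tags=BOUNDARY_TAGS):
    """Remove (word, tag) pairs whose tag is a start/end symbol from every sentence."""
    return [[(word, tag) for word, tag in sent if tag not in boundary_tags]
            for sent in sentences]


# Hypothetical sentence-level corpus with a boundary symbol at each end.
corpus = [
    [("<s>", "<s>"), ("the", "DT"), ("agency", "NN"), ("</s>", "</s>")],
    [("<s>", "<s>"), ("a", "DT"), ("ban", "NN"), ("ended", "VBD"), ("</s>", "</s>")],
]

stripped = strip_boundaries(corpus)
# Flatten into the same (word, tag) list shape as train_tagged_words.
stripped_tagged_words = [tup for sent in stripped for tup in sent]
print(len([tup for sent in corpus for tup in sent]), len(stripped_tagged_words))  # 9 5
```

If the transition counts are taken over the flattened tagged-word list, removing the symbols means the last tag of one sentence is counted as directly followed by the first tag of the next, and no tag is ever followed by an end symbol; that is one concrete source of the difference asked for in task 4.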
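The transition and emission steps, and the scoring of candidate taggings in tasks 3 and 7, might look like the minimal sketch below. It is not the lab's reference solution: the toy corpus, the function names (`emission_prob`, `transition_prob`, `initial_prob`, `candidate_taggings`, `sequence_prob`) and the use of a tag's relative frequency as its initial probability when no start symbol is available are all assumptions. Probabilities are plain maximum-likelihood counts over a flat list of (word, tag) tuples, the same shape as `train_tagged_words`.

```python
from itertools import product

# Hypothetical toy stand-in for train_tagged_words: a flat, lowercased list of
# (word, tag) tuples with no start/end symbols.
train_tagged_words = [
    ("the", "DT"), ("agency", "NN"), ("imposed", "VBD"), ("a", "DT"), ("ban", "NN"),
    ("a", "DT"), ("ban", "NN"), ("ended", "VBD"),
    ("they", "PRP"), ("ban", "VBP"), ("the", "DT"), ("agency", "NN"),
]


def emission_prob(word, tag, tagged_words):
    """P(word | tag) = count(word tagged as tag) / count(tag)."""
    tag_count = sum(1 for _, t in tagged_words if t == tag)
    pair_count = sum(1 for w, t in tagged_words if w == word and t == tag)
    return pair_count / tag_count if tag_count else 0.0


def transition_prob(t2, t1, tagged_words):
    """P(t2 | t1) = count(t1 immediately followed by t2) / count(t1)."""
    tags = [t for _, t in tagged_words]
    t1_count = tags.count(t1)
    pair_count = sum(1 for a, b in zip(tags, tags[1:]) if a == t1 and b == t2)
    return pair_count / t1_count if t1_count else 0.0


def initial_prob(tag, tagged_words):
    """Assumed initial probability without a start symbol: relative frequency of tag."""
    return sum(1 for _, t in tagged_words if t == tag) / len(tagged_words)


# Transition matrix as a dict of dicts: transition_matrix[t1][t2] = P(t2 | t1).
tag_set = sorted({t for _, t in train_tagged_words})
transition_matrix = {
    t1: {t2: transition_prob(t2, t1, train_tagged_words) for t2 in tag_set}
    for t1 in tag_set
}


def candidate_taggings(sentence, tagged_words):
    """Every tag sequence in which each word gets a tag it was seen with in training."""
    options = [sorted({t for w, t in tagged_words if w == word}) for word in sentence]
    return list(product(*options))


def sequence_prob(sentence, tag_seq, tagged_words):
    """HMM joint probability P(words, tags) =
    P(t1) * P(w1|t1) * product over i > 1 of P(t_i|t_{i-1}) * P(w_i|t_i)."""
    prob = initial_prob(tag_seq[0], tagged_words)
    prob *= emission_prob(sentence[0], tag_seq[0], tagged_words)
    for i in range(1, len(sentence)):
        prob *= transition_prob(tag_seq[i], tag_seq[i - 1], tagged_words)
        prob *= emission_prob(sentence[i], tag_seq[i], tagged_words)
    return prob


sentence = ["the", "agency", "imposed", "a", "ban"]
for tag_seq in candidate_taggings(sentence, train_tagged_words):
    print(tag_seq, sequence_prob(sentence, tag_seq, train_tagged_words))

# The predicted tagging is the candidate with the highest joint probability.
best = max(candidate_taggings(sentence, train_tagged_words),
           key=lambda seq: sequence_prob(sentence, seq, train_tagged_words))
print("best:", best)
```

In this toy run the tagging ending in VBP scores 0.0 because DT is never followed by VBP in the toy corpus. Enumerating every combination with `itertools.product` is exponential in sentence length, which is why each word is limited to the tags it was observed with; a side effect is that a token never seen in the training split gets no candidate tags, so for the Treebank sentence in task 7 this sketch would return no tagging at all unless some smoothing or unknown-word handling is added.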
