Question
Please use Python NLTK.
Use screenshots please.
Thanks in advance.
Objective: Use n-gram models for text analysis.
Turn in: your Python programs, zipped (just your 2 programs)
In this homework you will create bigram and unigram dictionaries for English, French, and Italian using the provided training data. In each dictionary, the key is the unigram or bigram text and the value is the count of that unigram or bigram in the data. Then, for each line of the test data, calculate a probability under each language, pick the most likely language, and compare your predictions against the true labels.
Instructions:
HINT FOR PART 1 and 2:
Creating the dictionaries in Program 1:
You can use the NLTK ngrams() function to create a bigram generator and a unigram generator. Then iterate over each one to build the dictionary, using Python's .count() string method to extract counts from the text you read in.
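A minimal sketch of the dictionary-building step follows. It is not the assignment's reference solution: it counts with collections.Counter instead of .count() (one pass over the tokens, and no risk of matching substrings inside longer words), tokenizes by lowercasing and splitting on whitespace, and stores each bigram key as the two words joined by a space. The function name build_counts and the sample sentence are hypothetical, and the small fallback for ngrams() is only there so the sketch still runs if NLTK is not installed.

```python
from collections import Counter

try:
    from nltk.util import ngrams
except ImportError:
    # Fallback for this sketch only: yields the same n-gram tuples
    # as nltk.util.ngrams when no padding options are used.
    def ngrams(sequence, n):
        seq = list(sequence)
        return zip(*(seq[i:] for i in range(n)))


def build_counts(text):
    """Return (unigram_counts, bigram_counts) as dicts keyed by text.

    Assumptions (not from the assignment): lowercase the text, split on
    whitespace, and join the two words of a bigram with a single space.
    """
    tokens = text.lower().split()
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(" ".join(pair) for pair in ngrams(tokens, 2))
    return dict(unigram_counts), dict(bigram_counts)


# Toy input standing in for one language's training file.
unigrams, bigrams = build_counts("the cat sat on the mat the cat ran")
print(unigrams["the"])     # 3
print(bigrams["the cat"])  # 2
```

In Program 1 you would call something like this once per language on the text read from that language's training file, then save the resulting dictionaries for Program 2.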
Calculating probabilities in Program 2:
The probabilities will be large enough that you don't need to use logs; simply multiply the probabilities together. Each bigram's probability with Laplace smoothing is (b + 1) / (u + v), where b is the bigram count, u is the unigram count of the first word in the bigram, and v is the total vocabulary size (the sum of the lengths of the 3 unigram dictionaries).
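The formula above can be sketched as follows. This is an illustration under assumptions, not the expected solution: the function name line_probability, the tiny hand-written dictionaries, and the space-joined bigram keys are all hypothetical, and real dictionaries would come from Program 1.

```python
def line_probability(tokens, unigram_counts, bigram_counts, vocab_size):
    """Multiply the Laplace-smoothed bigram probabilities (b + 1) / (u + v)."""
    p = 1.0
    for first, second in zip(tokens, tokens[1:]):
        b = bigram_counts.get(first + " " + second, 0)  # bigram count
        u = unigram_counts.get(first, 0)                # count of first word
        p *= (b + 1) / (u + vocab_size)
    return p


# Toy dictionaries (made up for illustration): language -> (unigrams, bigrams).
models = {
    "English": ({"the": 2, "cat": 1}, {"the cat": 1}),
    "French": ({"le": 2, "chat": 1}, {"le chat": 1}),
    "Italian": ({"il": 2, "gatto": 1}, {"il gatto": 1}),
}

# v = sum of the lengths of the 3 unigram dictionaries (here 2 + 2 + 2 = 6).
v = sum(len(uni) for uni, _ in models.values())

tokens = "the cat".split()
scores = {
    lang: line_probability(tokens, uni, bi, v)
    for lang, (uni, bi) in models.items()
}
best = max(scores, key=scores.get)
print(scores["English"])  # (1 + 1) / (2 + 6) = 0.25
print(best)               # English
```

For the test data, you would compute these scores for every line, record the best language for each, and then compare the predictions with the true labels to get an accuracy figure.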