Question
Write a function score_document(document, lang_counts=default_lang_counts) which takes as input a document name as a string and a dictionary of dictionaries called lang_counts containing normalised language counts. It should return a dictionary of scores, one for each language in lang_counts, obtained by performing a 'dot product' of the trigram counts from the document with the normalised language counts. That is, for each language it should multiply each trigram count from the document by the count for the same trigram in lang_counts and add up all the products. If a trigram from the document is not in the dictionary for a given language, treat its count for that language as zero.
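To make the 'dot product' concrete, here is a minimal sketch for a single language. The helper name sparse_dot and all the numbers are made up for illustration; they are not part of the provided stub or the hidden library.

```python
def sparse_dot(doc_counts, lang_profile):
    """Dot product of two trigram-count dictionaries.

    A trigram missing from lang_profile is treated as having count 0.
    """
    return sum(count * lang_profile.get(trigram, 0)
               for trigram, count in doc_counts.items())


# Illustrative values only (not taken from the real training data).
doc_counts = {"the": 3, "he ": 2, "zzz": 1}
english_profile = {"the": 0.5, "he ": 0.25}  # "zzz" is absent, so it adds 0

score = sparse_dot(doc_counts, english_profile)  # 3*0.5 + 2*0.25 + 1*0
```

Using dict.get with a default of 0 handles the missing-trigram rule without a separate membership check.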
We have provided a stub of code which trains the classifier for you. We have also provided train_classifier(training_set) in a hidden library.
There are also two files included, visible in the tabs at top right. These are en_163083.txt, written in English, and de_1231811.txt, written in German, and can be loaded and used to test your function, which should behave as follows:
>>> test1 = 'en_163083.txt'
>>> d = score_document(test1)
>>> d['Vietnamese']
9.427325768357315
>>> max([(v, n) for (n, v) in d.items()])
(21.428216914833023, 'English')
>>> test2 = 'de_1231811.txt'
>>> d = score_document(test2)
>>> d['Polish']
7.710346556417009
>>> max([(v,n) for (n, v) in d.items()])
(53.12937809633241, 'German')
How can this be coded in Python?
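One possible approach is sketched below. It rests on assumptions the question does not confirm: that a trigram is an overlapping three-character sequence, that the document is counted with raw (unnormalised) counts, and that the files are UTF-8. The helper count_trigrams is hypothetical, and default_lang_counts here is only a tiny placeholder so the sketch runs on its own; in the exercise it comes from the provided stub. If the hidden library builds trigrams differently, the scores will not match the expected values exactly.

```python
from collections import Counter

# Placeholder so this sketch is self-contained. In the exercise,
# default_lang_counts is produced by the provided stub, which calls
# train_classifier(training_set).
default_lang_counts = {
    "English": {"the": 0.5, "he ": 0.25},
    "German": {"der": 0.5, "er ": 0.25},
}


def count_trigrams(text):
    """Hypothetical helper: count overlapping three-character sequences."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def score_document(document, lang_counts=default_lang_counts):
    """Return a {language: score} dictionary for the file named document."""
    # Assumption: the document is a UTF-8 text file.
    with open(document, encoding="utf-8") as f:
        doc_counts = count_trigrams(f.read())

    scores = {}
    for language, profile in lang_counts.items():
        # Dot product; trigrams missing from the profile count as zero.
        scores[language] = sum(count * profile.get(trigram, 0)
                               for trigram, count in doc_counts.items())
    return scores
```

With the real default_lang_counts in place, the best-scoring language can be read off with the same expression used in the question, max([(v, n) for (n, v) in d.items()]).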