
Question


We are finally ready to put all the pieces together! We can now measure documents, train our classifier, and score documents per language. Write a function classify_doc(document, lang_counts=default_lang_counts) which takes a string document and a dictionary lang_counts of normalised counts, and returns the predicted language based on each language's score.

As before, we have provided a hidden implementation of score_document(document, lang_counts) in a hidden module (already imported). It takes a document and a lang_counts dictionary and returns a dictionary mapping each language to its score, as in the previous question. We have also provided a number of sample documents to experiment with.

Your function should return the language with the highest score. In the event of a tie, it should return 'English': English is the most common language in the training set, so if the document comes from the same source (Wikipedia), it is most likely written in English. This is not a perfect assumption, but it is better than nothing when we have no other information.
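Finding the winner only requires the best score, but detecting a tie requires comparing the best score with the runner-up. One way to get both, sketched here with the standard library's heapq.nlargest, is to take the two highest (language, score) pairs. The helper name top_two and the scores below are hypothetical, made-up values for illustration, not output from score_document.

```python
import heapq

def top_two(scores):
    """Return the two highest-scoring (language, score) pairs, best first."""
    # nlargest with a key behaves like sorted(..., reverse=True)[:2]
    return heapq.nlargest(2, scores.items(), key=lambda item: item[1])

# Hypothetical scores, for illustration only
example_scores = {"English": -12.5, "French": -14.0, "German": -13.1}
print(top_two(example_scores))  # [('English', -12.5), ('German', -13.1)]
```

If only one language is present, top_two returns a single pair, so a caller should check the length before looking at a runner-up.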

But how do we determine a tie? If the two top-ranking scores lie within 1e-10 of one another, we call it a tie. (Why do we do this, rather than testing for equality directly?)
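Part of the answer: scores are floating-point numbers, and floating-point addition is not associative, so two scores that would be equal in exact arithmetic can differ in their last bits depending on the order in which terms were added. The small demonstration below uses a hypothetical helper is_tie; treating a difference of exactly 1e-10 as a tie (<= rather than <) is an assumption, since the exercise only says "within".

```python
import math

TIE_TOLERANCE = 1e-10  # threshold given in the exercise

def is_tie(score_a, score_b, tol=TIE_TOLERANCE):
    """Treat two scores as tied if they differ by at most tol (hypothetical helper)."""
    return abs(score_a - score_b) <= tol

# The same three numbers added in a different order
left = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # 0.6
print(left == right)       # False: the rounding depends on evaluation order
print(is_tie(left, right)) # True
# The standard library offers the same absolute-tolerance check
print(math.isclose(left, right, rel_tol=0.0, abs_tol=TIE_TOLERANCE))  # True
```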

Your function should behave as follows:

>>> s = open('en_163083.txt').read() 
>>> classify_doc(s) 
'English' 
>>> classify_doc('asdfhlj') 
'Icelandic' 
>>> s = open('pl_188313.txt').read() 
>>> classify_doc(s) 
'Polish' 
>>> classify_doc('Hello Bob') 
'Italian' 

How can this be coded in Python?
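Putting the pieces together, a minimal sketch of classify_doc might look like the following: rank the scores, return English if the top two are within 1e-10, and otherwise return the top language. In the exercise, score_document and default_lang_counts come from the hidden module, so to keep this block runnable on its own it defines hypothetical toy versions of both (a character-frequency scorer that is not the course's implementation). Remove those two definitions when working inside the exercise. Because the stand-ins are toys, this block will not reproduce the doctest outputs in the question.

```python
TIE_TOLERANCE = 1e-10
FALLBACK_LANGUAGE = "English"

# --- Hypothetical stand-ins so the sketch runs standalone. ---
# The exercise provides the real score_document and default_lang_counts.
default_lang_counts = {
    "English": {"e": 0.5, "t": 0.3, "h": 0.2},
    "Polish": {"z": 0.4, "y": 0.3, "e": 0.3},
}

def score_document(document, lang_counts):
    """Toy scorer: sums each character's normalised frequency per language."""
    return {
        lang: sum(freqs.get(ch, 0.0) for ch in document.lower())
        for lang, freqs in lang_counts.items()
    }

# --- The classifier itself. ---
def classify_doc(document, lang_counts=default_lang_counts):
    """Return the highest-scoring language, or English on a (near-)tie."""
    scores = score_document(document, lang_counts)
    # Rank languages from highest to lowest score
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_lang, best_score = ranked[0]
    # Compare with a tolerance rather than ==, because of float rounding
    if len(ranked) > 1 and abs(best_score - ranked[1][1]) <= TIE_TOLERANCE:
        return FALLBACK_LANGUAGE
    return best_lang

print(classify_doc("the"))   # English
print(classify_doc("zyzy"))  # Polish
near_tie_counts = {"Polish": {"a": 0.1, "b": 0.2, "c": 0.3}, "German": {"a": 0.6}}
print(classify_doc("abc", near_tie_counts))  # English (near-tie fallback)
```

The last call shows why the tolerance matters. The toy Polish score comes out as 0.6000000000000001 and the German one as 0.6, so a strict comparison would pick Polish, while the 1e-10 rule treats them as tied and falls back to English.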
