Question

Posted on Sep 25, 2024

We are finally ready to put all the pieces together! We can now measure documents, train our classifier, and score documents per language. Write a function classify_doc(document, lang_counts=default_lang_counts) which takes a string document and a dictionary of normalised lang_counts, and returns the language chosen from the score of each language.

As before, we have provided an implementation of score_document(document, lang_counts) in a hidden module (already imported), which takes a document and a dictionary of lang_counts and returns a dictionary of scores per language, as in the previous question. We have also provided a number of documents to play with.

Your function should return the language with the highest score. In the event of a tie it should return 'English' since the most common document in the training set is written in English, suggesting that if the document comes from the same source (Wikipedia), it is probably written in English. Obviously not a perfect assumption, but better than nothing given no information.

But how do we determine a tie? If the two top-ranking scores lie within 1e-10 of one another, then we shall say it's a tie (why do we do this, rather than testing equality directly?).
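A short illustration of why direct equality is fragile for floating-point scores: two values that are equal on paper can differ in their last binary digits once computed. The numbers below are arbitrary and are not scores from the exercise; only the 1e-10 tolerance comes from the question.

```python
# Two values that are mathematically equal but computed differently.
a = 0.1 + 0.2
b = 0.3

# Direct equality fails because of binary rounding error.
exactly_equal = (a == b)

# Comparing the absolute difference against a small tolerance succeeds.
within_tolerance = abs(a - b) < 1e-10

print(exactly_equal)     # False
print(within_tolerance)  # True
```

Scores built from many floating-point operations can pick up the same kind of rounding error, so a tolerance-based comparison is the safer way to detect a tie.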

Your function should behave as follows:

>>> s = open('en_163083.txt').read()
>>> classify_doc(s)
'English'
>>> classify_doc('asdfhlj')
'Icelandic'
>>> s = open('pl_188313.txt').read()
>>> classify_doc(s)
'Polish'
>>> classify_doc('Hello Bob')
'Italian'

How can I code this in Python?
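One possible approach, as a minimal sketch rather than the official solution: rank the scores from highest to lowest, compare the top two, and fall back to 'English' when they are within the tolerance. In the exercise, score_document and default_lang_counts come from the hidden module; the versions below are hypothetical stand-ins with made-up scores so that the sketch runs on its own. The sketch also assumes that "within 1e-10" means a difference strictly less than 1e-10 and that at least one language is scored.

```python
# Stand-ins so this runs on its own. In the exercise both names are supplied
# by the hidden module, and their real contents are not known here.
default_lang_counts = {}


def score_document(document, lang_counts):
    """Hypothetical stub returning made-up scores per language."""
    if document == 'near tie':
        return {'Polish': 0.5 + 1e-12, 'English': 0.5, 'Italian': 0.2}
    return {'Polish': 0.9, 'English': 0.5, 'Italian': 0.2}


TIE_TOLERANCE = 1e-10  # tolerance given in the question


def classify_doc(document, lang_counts=default_lang_counts):
    """Return the highest-scoring language, or 'English' on a tie."""
    scores = score_document(document, lang_counts)
    # Sort (language, score) pairs by score, highest first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Tie: the two top-ranking scores differ by less than the tolerance.
    if len(ranked) > 1 and abs(ranked[0][1] - ranked[1][1]) < TIE_TOLERANCE:
        return 'English'
    return ranked[0][0]


print(classify_doc('near tie'))      # English (top two scores differ by about 1e-12)
print(classify_doc('clear winner'))  # Polish
```

With the real score_document in place of the stub, the two stand-in definitions at the top would be removed and classify_doc left unchanged.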
