Question

Here, you will explore n-grams, a key concept in machine processing of text (e.g., automated translation). An n-gram of words is a group of n consecutive words. So the line "Candy is very yummy!" has three 2-grams ("candy is", "is very", and "very yummy"), two 3-grams ("candy is very" and "is very yummy"), and one 4-gram ("candy is very yummy").

In this problem, you will need to create a function called topngram which takes two parameters: the first is a string giving the name of a file, and the second is a value of n giving the size of the n-gram. Your code should return the string that is the most common n-gram of the specified size.

Your code should do the following:
- Ignore the case of any letters ("After" and "after" should count as the same 1-gram)
- Ignore punctuation and numbers ("STOP!" and "stop" should count as the same 1-gram)
- Ignore line breaks ("Hello\nThere?" and "hello, There!" are the same 2-gram)
- Ignore the following common words: of, the, i, he, she, a, it, the, is, was, be, not, my

For the text of Hawthorne's The Scarlet Letter, the most common 2-gram is
>>> topngram("scarlet.txt", 2)
'hester prynne'

For Robert E. Howard's Conan the Barbarian, the most common 2-gram is
>>> topngram("conan.txt", 2)
'his sword'

For the text of William Shakespeare's Macbeth, the most common 3-gram is
>>> topngram("macbeth.txt", 3)
'enter lady macbeth'

For the text of Edgar Allan Poe's The Raven, the most common 3-gram is
>>> topngram("raven.txt", 3)
'and nothing more'

For the text of Charles Dickens's A Christmas Carol, the most common 4-gram is
>>> topngram("christmas.txt", 4)
'good afternoon said scrooge'

Hints and tips
- You will probably find Python's dictionaries useful here.
- First get the program working with 1-grams (single words).
- Even though you will likely read the file in line by line, you still want to treat breaks between lines just like a space between words.
- Try to break the problem down into sub-problems (e.g., remove punctuation, merge lines, build a dictionary, find most common). Code and test each sub-problem separately.
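The sub-problems listed in the hints can be sketched roughly as follows. This is one possible approach, not a verified solution, and it rests on several assumptions the question does not settle: punctuation and digits are deleted outright (so "don't" becomes "dont") rather than replaced with spaces, the common words are removed before n-grams are formed, only the letters a-z count as letters, the file is UTF-8, and ties go to whichever n-gram appears first. The helper names clean_words and count_ngrams are hypothetical; only topngram comes from the question.

```python
import re
from collections import Counter

# Common words the question says to ignore.
STOP_WORDS = {"of", "the", "i", "he", "she", "a", "it",
              "is", "was", "be", "not", "my"}


def clean_words(text):
    """Lowercase the text, delete punctuation and digits, drop stop words."""
    # Assumption: any character that is not a-z or whitespace is deleted,
    # not replaced by a space.
    letters_only = re.sub(r"[^a-z\s]", "", text.lower())
    # split() treats spaces and line breaks alike, which merges the lines.
    return [w for w in letters_only.split() if w not in STOP_WORDS]


def count_ngrams(words, n):
    """Count every run of n consecutive words, joined by single spaces."""
    return Counter(
        " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
    )


def topngram(filename, n):
    """Return the most common n-gram in the file, or "" if there is none."""
    # Assumption: the file is UTF-8 encoded.
    with open(filename, encoding="utf-8") as f:
        words = clean_words(f.read())
    counts = count_ngrams(words, n)
    if not counts:
        return ""
    # most_common(1) returns a list holding one (ngram, count) pair.
    return counts.most_common(1)[0][0]
```

Counter is a dictionary subclass, so this follows the hint about dictionaries while leaving the counting to the standard library. Whether the sketch reproduces the sample outputs above (for example 'hester prynne' for scarlet.txt) depends on the exact text files and on the assumptions listed, so it is worth testing each helper separately, starting with 1-grams.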

