Question
Here, you will explore n-grams, a key concept in machine processing of text (e.g., automated translation). An n-gram of words is a group of n consecutive words. So the line:

Candy is very yummy!

has three 2-grams ("candy is", "is very", and "very yummy"), two 3-grams ("candy is very" and "is very yummy"), and one 4-gram ("candy is very yummy").

In this problem, you will need to create a function called topngram which takes two parameters: the first a string which is the name of a file, the second a value of n telling the size of the n-gram. Your code should return the string that is the most common n-gram of the specified size.

Your code should do the following:
- Ignore the case of any letters ("After" and "after" should count as the same 1-gram)
- Ignore punctuation and numbers ("STOP!" and "stop" should count as the same 1-gram)
- Ignore line breaks ("Hello\nThere?" and "hello, There!" are the same 2-gram)
- Ignore the following common words: of, the, i, he, she, a, it, the, is, was, be, not, my

For the text of Hawthorne's The Scarlet Letter, the most common 2-gram is
>>> topngram("scarlet.txt", 2)
'hester prynne'

For Robert E. Howard's Conan the Barbarian, the most common 2-gram is
>>> topngram("conan.txt", 2)
'his sword'

For the text of William Shakespeare's Macbeth, the most common 3-gram is
>>> topngram("macbeth.txt", 3)
'enter lady macbeth'

For the text of Edgar Allan Poe's The Raven, the most common 3-gram is
>>> topngram("raven.txt", 3)
'and nothing more'

For the text of Charles Dickens's A Christmas Carol, the most common 4-gram is
>>> topngram("christmas.txt", 4)
'good afternoon said scrooge'

Hints and tips:
- You will probably find Python's dictionaries useful here
- First get the program working with 1-grams (single words)
- Even though you will likely read the file in line by line, you still want to treat breaks in between lines just like a space between words
- Try to break the problem down into sub-problems (e.g., remove punctuation, merge lines, build a dictionary, find most common). Code and test each sub-problem separately
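The sub-problems named in the hints (remove punctuation, merge lines, build a dictionary, find most common) could be sketched roughly as follows. This is one possible approach, not a verified solution, and it rests on several assumptions the problem statement does not settle: punctuation and digits are deleted rather than replaced by spaces (so "don't" becomes "dont"), only the letters a-z count as letters, common words are removed before n-grams are formed, and ties go to whichever n-gram appears first. The helper names `tokenize`, `ngrams`, and `top_ngram_from_text` are hypothetical; only `topngram` comes from the problem.

```python
import re
from collections import Counter

# Common words to ignore, copied from the problem statement.
STOP_WORDS = {"of", "the", "i", "he", "she", "a", "it", "is", "was", "be", "not", "my"}


def tokenize(text):
    """Lowercase the text, delete everything except a-z and whitespace,
    split on whitespace (line breaks included), and drop stop words."""
    lowered = text.lower()
    # Assumption: punctuation and digits are deleted, not turned into spaces.
    letters_only = re.sub(r"[^a-z\s]", "", lowered)
    return [word for word in letters_only.split() if word not in STOP_WORDS]


def ngrams(words, n):
    """Return every run of n consecutive words, each joined by single spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def top_ngram_from_text(text, n):
    """Return the most common n-gram in text, or '' if there is none."""
    counts = Counter(ngrams(tokenize(text), n))  # Counter is a dict subclass
    if not counts:
        return ""
    # most_common orders equal counts by first appearance.
    return counts.most_common(1)[0][0]


def topngram(filename, n):
    """Read the whole file and return its most common n-gram of size n."""
    with open(filename, encoding="utf-8") as f:
        # Reading the file in one piece keeps line breaks as plain whitespace,
        # so n-grams can span lines.
        return top_ngram_from_text(f.read(), n)
```

Keeping the text-level logic in `top_ngram_from_text` lets each step be tested on short strings before any file is involved, in line with the hint about testing sub-problems separately. Whether this sketch reproduces the sample outputs above depends on the exact contents of the text files and on the tokenization assumptions, so those results are not claimed here.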