Question

Posted on May 03, 2024

Python code below. It does not meet all of the requirements:

1. Remove all punctuation and non-English words, then count the remaining words in the file.
2. Use the words left after step 1 to build a word dictionary in which every word is unique (e.g. the words "But" and "but" should be treated as the same word).
   - Count the number of distinct words in your dictionary.
   - Display the words in the dictionary in alphabetical order.
3. Select three sentences from the file, then use any POS tagging tool to identify the POS tags in the selected sentences.
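As a point of comparison, the first two requirements (cleaning the text, then building a case-insensitive dictionary) can be sketched with the standard library alone. This is a minimal sketch, not the asker's code: `clean_words`, `build_dictionary`, and the tiny `vocab` set are hypothetical names introduced for illustration, and a real run would substitute a full English word list for `vocab`.

```python
import string


def clean_words(text, vocabulary):
    """Strip punctuation, lowercase, and keep only words found in `vocabulary`."""
    # Map every ASCII punctuation character to None so translate() deletes it.
    table = str.maketrans("", "", string.punctuation)
    tokens = text.translate(table).lower().split()
    return [token for token in tokens if token in vocabulary]


def build_dictionary(words):
    """Return the distinct words in alphabetical order."""
    return sorted(set(words))


# Tiny hypothetical vocabulary standing in for a real English word list.
vocab = {"but", "the", "rose", "might", "never", "die"}
sample = "But the rose, but the ROSE might never die! Quelle surprise."

words = clean_words(sample, vocab)
dictionary = build_dictionary(words)

print("Total processed words:", len(words))
print("Number of distinct words:", len(dictionary))
print("Words in alphabetical order:", dictionary)
```

Lowercasing before the vocabulary check means "But" and "but" collapse into one entry, and a set is enough here because only distinct words are required, not their frequencies.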

Code:

import string
import nltk
from collections import OrderedDict

# Download necessary NLTK data
nltk.download('averaged_perceptron_tagger')
nltk.download('words')

# Define function to check if a word is English
english_vocab = set(w.lower() for w in nltk.corpus.words.words())

def is_english(word):
    return word.lower() in english_vocab

try:
    with open(r'file_path', 'r') as f:
        text = f.read()

    # Preprocess: Remove punctuation and non-English words
    exclude = set(string.punctuation)
    text = ''.join(ch for ch in text if ch not in exclude and ch.isascii())
    words = text.split()
    words = [word for word in words if is_english(word)]

    # Count processed words and print
    total_processed_words = len(words)
    print(f"Total processed words: {total_processed_words}")

    # Build dictionary of unique words
    word_dict = OrderedDict()
    for word in words:
        word_lower = word.lower()
        if word_lower not in word_dict:
            word_dict[word_lower] = 1
        else:
            word_dict[word_lower] += 1

    # Count distinct words and print
    distinct_word_count = len(word_dict)
    print(f"Number of distinct words: {distinct_word_count}")

    # Print words in alphabetical order
    print("\nWords in alphabetical order:")
    for word in sorted(word_dict):
        print(word)

    # Select sentences and POS tag
    # Replace these sentences with ones from your file if necessary
    sentences = [
        "from fairest creatures we desire increase that thereby beautys rose might never die",
        "when forty winters shall besiege thy brow and dig deep trenches in thy beautys field",
        "for where is she so fair whose uneared womb disdains the tillage of thy husbandry"
    ]

    for sentence in sentences:
        pos_tags = nltk.pos_tag(sentence.split())
        print("\nSentence:", sentence)
        print("POS Tags:", pos_tags)

except Exception as e:
    print("An error occurred:", e)
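One likely reason the output falls short of the requirements is that the three sentences are hard-coded rather than selected from the file, and by the time they are needed the text has already lost the punctuation that marks sentence boundaries. A minimal sketch of one way around this is to pick the sentences from the raw text before any cleaning. `select_sentences` and `tag_sentences` are hypothetical helpers, the regular-expression split is a simplification (it will mis-split abbreviations such as "Mr."), and `tag_sentences` assumes NLTK and its tagger data are installed.

```python
import re


def select_sentences(text, count=3):
    """Return the first `count` sentences of `text`, split on '.', '!' or '?'."""
    # Collapse newlines so sentences that wrap across lines stay whole.
    flat = " ".join(text.split())
    # Split at whitespace that follows sentence-ending punctuation.
    parts = re.split(r"(?<=[.!?])\s+", flat)
    return [part for part in parts if part][:count]


def tag_sentences(sentences):
    """POS-tag each sentence with NLTK (imported lazily; needs the tagger data)."""
    import nltk
    return [nltk.pos_tag(sentence.split()) for sentence in sentences]


# Hypothetical raw text standing in for the contents of the input file.
raw = "First line here.\nSecond one, wrapped\nacross lines! Third? Fourth."
picked = select_sentences(raw)
print(picked)
```

In the original program this would mean calling `select_sentences(text)` immediately after `f.read()`, before punctuation is removed, and then passing the result to `tag_sentences`. Splitting each sentence on whitespace leaves punctuation attached to words, so a proper tokenizer may give cleaner tags.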
