Question

The Python code below does not meet all of the following requirements:

  1. Remove all punctuation and non-English words, then count the remaining words in the file.
  2. Use the words left after step 1 to build a word dictionary in which every word is unique (e.g. "But" and "but" should be treated as the same word).
    • Count the number of distinct words in the dictionary.
    • Display the words in the dictionary in alphabetical order.
  3. Select three sentences from the file, then use any POS tagging tool to identify the POS tags in the selected sentences.

 

Code:

import string
import nltk

# Download the NLTK data used below. Newer NLTK releases name the tokenizer
# and tagger resources 'punkt_tab' and 'averaged_perceptron_tagger_eng',
# so both spellings are requested here.
for resource in ('punkt', 'punkt_tab', 'averaged_perceptron_tagger',
                 'averaged_perceptron_tagger_eng', 'words'):
    nltk.download(resource, quiet=True)

# Lowercased English vocabulary used to filter out non-English words.
# Note: inflected forms (plurals, past tenses) may be missing from this
# corpus, so some genuine English words can be filtered out as well.
english_vocab = set(w.lower() for w in nltk.corpus.words.words())

def is_english(word):
    return word.lower() in english_vocab

try:
    with open(r'file_path', 'r') as f:
        raw_text = f.read()

    # Step 1: remove punctuation and non-ASCII characters, then keep only
    # English words. The raw text is kept unchanged for step 3.
    exclude = set(string.punctuation)
    cleaned = ''.join(ch for ch in raw_text if ch not in exclude and ch.isascii())
    words = [word for word in cleaned.split() if is_english(word)]
    print(f"Total processed words: {len(words)}")

    # Step 2: build a case-insensitive dictionary of word counts
    word_dict = {}
    for word in words:
        key = word.lower()
        word_dict[key] = word_dict.get(key, 0) + 1
    print(f"Number of distinct words: {len(word_dict)}")

    print("\nWords in alphabetical order:")
    for word in sorted(word_dict):
        print(word)

    # Step 3: take three sentences from the file itself and POS-tag them
    sentences = nltk.sent_tokenize(raw_text)[:3]
    for sentence in sentences:
        pos_tags = nltk.pos_tag(nltk.word_tokenize(sentence))
        print("\nSentence:", sentence)
        print("POS Tags:", pos_tags)

except OSError as e:
    print("Could not read the file (replace 'file_path' with a real path):", e)
except LookupError as e:
    print("A required NLTK resource is missing:", e)
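
Requirements 1 and 2 can also be checked on a small string without NLTK, which makes it easier to see whether the counts are right. The following is only a minimal sketch: `summarize_words` and `sample_vocab` are hypothetical names introduced for illustration, and the vocabulary is assumed to be any set of lowercase English words (in the code above, `english_vocab` plays that role).

```python
import re
from collections import Counter

def summarize_words(text, vocab):
    """Hypothetical helper: return (total, distinct, sorted_words) for `text`.

    `vocab` is assumed to be a set of lowercase English words.
    """
    # Keep runs of ASCII letters only, so punctuation and digits never
    # become part of a token; lowercase so "But" and "but" match.
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t in vocab]
    counts = Counter(kept)
    return len(kept), len(counts), sorted(counts)

# Tiny made-up vocabulary, used only for this demonstration.
sample_vocab = {"but", "the", "rose", "never", "die"}
total, distinct, ordered = summarize_words("But the rose, but never... die!", sample_vocab)
print(total, distinct, ordered)
# prints: 6 5 ['but', 'die', 'never', 'rose', 'the']
```

Note that this regular expression splits on apostrophes, so "beauty's" becomes "beauty" and "s", whereas the code above glues it into "beautys" before the English-word filter runs. Which behavior is wanted depends on how the assignment defines a word.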
