Question

Using only the imports below, complete the following function:

from tmtoolkit.corpus import Corpus, lemmatize, to_lowercase, remove_chars, filter_clean_tokens
from tmtoolkit.corpus import corpus_num_tokens, corpus_tokens_flattened
from tmtoolkit.corpus import dtm
from tmtoolkit.corpus import vocabulary
from tmtoolkit.topicmod.model_io import print_ldamodel_topic_words
from tmtoolkit.topicmod.tm_lda import compute_models_parallel
from string import punctuation

def build_corpus(texts, lang="en"):
    """Corpus builder which returns a Corpus object built from texts in the
    language specified by lang (defaults to "en").

    Should perform all of the following pre-processing steps:
    - Lemmatize the tokens
    - Convert tokens to lowercase
    - Remove punctuation
    - Remove numbers
    - Remove tokens shorter than 2 characters
    """
    # Here, we just use the index of the text as the label for the corpus item
    corpus = Corpus({i: r for i, r in enumerate(texts)}, language=lang)
    # TODO: Complete the implementation of this function and submit the
    # .py download of this notebook as your assignment submission.

Use this for testing:

example_docs = [  # Feel free to edit this corpus for further testing
    # to be sure that your functions meet specifications.
    "The 3 cats sat on the mats!",
    "1 fish 2 fish Red fish Blue fish",
    "She sells $ea$shells"
]
example_corpus = build_corpus(example_docs)
corpus_tokens_flattened(example_corpus)
