Question
Using the following Python code as an example, write a Term Frequency program in JAVA that uses JAVA STREAMS instead.
Use the Java Stream API to create a simple code-golf version with the fewest lines of code.
NOTE: Do NOT do character by character as in the Python example; just do WORD BY WORD!
Constraints:
1. Data comes to functions in streams, rather than as a complete whole all at once
2. Functions are filters / transformers from one kind of data stream to another
3. The program must run on the command line, take an input text file called pride-and-prejudice.txt, and output only the TOP 25 most frequent words with their counts, in order of most frequent at the top. It MUST write the output to a new text file called output.txt, NOT to the command line. It must FILTER out the STOP WORDS from the list below, taking the stop_words.txt file as input (not a hardcoded string of words).
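Constraints 1 and 2 can be read as "each function takes a stream and returns a stream". Here is a minimal sketch of that reading in Java, not a full solution. The class name `StreamStages` and the helpers `words` and `without` are hypothetical names chosen for illustration, and the split pattern `[^a-z0-9]+` is an assumption meant to approximate the `isalnum()` test in the Python version for plain ASCII text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamStages {

    // Stage 1 (hypothetical helper): stream of lines -> stream of lower-cased words.
    // Lines are split lazily, one at a time, as the downstream stage pulls them.
    public static Stream<String> words(Stream<String> lines) {
        return lines.flatMap(line -> Arrays.stream(line.toLowerCase().split("[^a-z0-9]+")))
                    .filter(w -> !w.isEmpty());
    }

    // Stage 2 factory (hypothetical helper): builds a filter stage that drops stop words.
    public static Function<Stream<String>, Stream<String>> without(Set<String> stop) {
        return ws -> ws.filter(w -> !stop.contains(w));
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "a"));
        Function<Stream<String>, Stream<String>> toWords = StreamStages::words;
        // Compose the two stages into one stream-to-stream transformer.
        Function<Stream<String>, Stream<String>> pipeline = toWords.andThen(without(stop));
        // Prints [cat, saw, dog]
        System.out.println(pipeline.apply(Stream.of("The cat saw a dog."))
                                   .collect(Collectors.toList()));
    }
}
```

Each stage only sees a `Stream<String>`, so the same stages work whether the lines come from `Files.lines(...)` or from an in-memory `Stream.of(...)`.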
stop_words.txt:
a,able,about,across,after,all,almost,also,am,among,an,and,any,are,as,at,be,because,been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,yet,you,your
Correct output looks like this, so ENSURE THE STOP WORDS ARE PROPERLY REMOVED AND YOUR PROGRAM PRODUCES THE FOLLOWING OUTPUT BEFORE POSTING A SOLUTION OR IT WILL BE DOWNVOTED!:
output.txt:
mr - 786
elizabeth - 635
very - 488
darcy - 418
such - 395
mrs - 343
much - 329
more - 327
bennet - 323
bingley - 306
jane - 295
miss - 283
one - 275
know - 239
before - 229
herself - 227
though - 226
well - 224
never - 220
sister - 218
soon - 216
think - 211
now - 209
time - 203
good - 201
**PYTHON CODE VERSION BELOW**:
import sys
import operator
import string

def characters(filename):
    for line in open(filename):
        for c in line:
            yield c

def all_words(filename):
    start_char = True
    for c in characters(filename):
        if start_char == True:
            word = ""
            if c.isalnum():
                # We found the start of a word
                word = c.lower()
                start_char = False
            else:
                pass
        else:
            if c.isalnum():
                word += c.lower()
            else:
                # We found end of word, emit it
                start_char = True
                yield word

def non_stop_words(filename):
    stopwords = set(open(
        '../stop_words.txt').read().strip(' ').split(',') + list(string.ascii_lowercase))
    for w in all_words(filename):
        if not w in stopwords:
            yield w

def count_and_sort(filename):
    freqs, i = {}, 1
    for w in non_stop_words(filename):
        freqs[w] = 1 if w not in freqs else freqs[w]+1
        if i % 5000 == 0:
            yield sorted(freqs.items(), key=operator.itemgetter(1), reverse=True)
        i = i+1
    yield sorted(freqs.items(), key=operator.itemgetter(1), reverse=True)

#
# The main function
#
for word_freqs in count_and_sort(sys.argv[1]):
    print("-----------------------------")
    for (w, c) in word_freqs[0:25]:
        print(w, '-', c)
**THIS IS MY 7th ATTEMPT AT GETTING A WORKING SOLUTION FOR THIS QUESTION, SO MAKE SURE THE PROGRAM WILL FULLY WORK WITHOUT ADJUSTMENTS BEFORE SUBMITTING SOLUTION OR I WILL DOWNVOTE YOUR ANSWER.**
*** BE EXTRA CAREFUL WITH CODE FOR PARSING THE WORDS AND STOP WORDS TO ENSURE THE CORRECT OUTPUT FREQUENCY IS OBTAINED AS ABOVE***
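One possible word-by-word Java Streams sketch of the program described above follows. It is offered under stated assumptions and has not been run against pride-and-prejudice.txt, so check its output.txt against the expected counts before relying on it. The class name `TermFrequency` and the helpers `stopWords` and `topWords` are hypothetical. It assumes both files are readable as UTF-8, that stop_words.txt sits in the working directory unless a second argument is given, and that tokens are split on anything outside `a-z0-9` after lower-casing. Single-character tokens are dropped to mirror the Python version adding `string.ascii_lowercase` to its stop words (dropping single digits too is a simplification), and ties are broken alphabetically, which the question does not specify.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TermFrequency {

    // Stream of comma-separated lines -> set of lower-cased stop words.
    public static Set<String> stopWords(Stream<String> lines) {
        return lines.flatMap(l -> Arrays.stream(l.split(",")))
                    .map(w -> w.trim().toLowerCase())
                    .filter(w -> !w.isEmpty())
                    .collect(Collectors.toSet());
    }

    // Stream of text lines -> "word - count" lines, most frequent first.
    public static List<String> topWords(Stream<String> lines, Set<String> stop, int limit) {
        return lines.flatMap(l -> Arrays.stream(l.toLowerCase().split("[^a-z0-9]+")))
                    // Assumption: drop empty and single-character tokens, then stop words.
                    .filter(w -> w.length() > 1 && !stop.contains(w))
                    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                    .entrySet().stream()
                    // Highest count first; alphabetical order on ties (an assumption).
                    .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                            .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                    .limit(limit)
                    .map(e -> e.getKey() + " - " + e.getValue())
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        String input = args.length > 0 ? args[0] : "pride-and-prejudice.txt";
        String stopFile = args.length > 1 ? args[1] : "stop_words.txt";
        try (Stream<String> stopLines = Files.lines(Paths.get(stopFile));
             Stream<String> text = Files.lines(Paths.get(input))) {
            // Writes the top 25 lines to output.txt; nothing is printed to the console.
            Files.write(Paths.get("output.txt"), topWords(text, stopWords(stopLines), 25));
        }
    }
}
```

To try it, compile with `javac TermFrequency.java` and run `java TermFrequency pride-and-prejudice.txt stop_words.txt` from the directory that holds both text files. The helpers take a `Stream<String>` rather than a file name so they can be checked on a few in-memory lines first. If a shorter code-golf form is wanted, the two helpers can be inlined into `main` at the cost of readability.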