Question


Posted on Sep 29, 2024


(2 PARTS)

(DO NOT USE OTHER EXPERT's SOLUTIONS! THEY ARE WRONG)

MAIN ISSUE: The two term-frequency programs below each follow one of two STYLE constraints and requirements, written in JAVASCRIPT with NODE.JS. Please REVISE the code to ENSURE it works.

RIGHT NOW NEITHER PROGRAM IS GIVING ME THE OUTPUT.TXT FILE WHEN I RUN IT:

THE FIRST ONE COMPLETES BUT THE OUTPUT.TXT FILE IS EMPTY.

THE SECOND ONE, IN HADOOP STYLE, NEVER COMPLETES. SO I DON'T KNOW WHAT THE ISSUE IS WITH EITHER PROGRAM. PLEASE REVISE AND ENSURE THEY WORK.

The program must run on the command line and take an input text file called pride-and-prejudice.txt. It must output only the TOP 25 most frequent words with their counts, in order with the most frequent at the top, and it MUST write them to a new text file called output.txt, NOT to the command line. It must FILTER out the STOP WORDS in the list below, reading them from the stop_words.txt file (not from a hardcoded string of words). Make sure to have the appropriate filters to avoid including words under 2 characters or anything else that would make the output different from what I have included below!

stop_words.txt:

a,able,about,across,after,all,almost,also,am,among,an,and,any,are,as,at,be,because,been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,yet,you,your

*****Correct output will look like this if written correctly so ENSURE THE STOP WORDS ARE PROPERLY REMOVED TO PROVIDE THE FOLLOWING OUTPUT BEFORE POSTING SOLUTION OR IT WILL BE DOWNVOTED!*****

output.txt: (MUST MATCH THE WORDS AND THE FREQUENCY COUNT 100%)

mr - 786
elizabeth - 635
very - 488
darcy - 418
such - 395
mrs - 343
much - 329
more - 327
bennet - 323
bingley - 306
jane - 295
miss - 283
one - 275
know - 239
before - 229
herself - 227
though - 226
well - 224
never - 220
sister - 218
soon - 216
think - 211
now - 209
time - 203
good - 201
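For reference, the requirements above boil down to a small pipeline: tokenize, drop stop words, count, sort, keep 25, write the file. Here is a minimal sketch of just that core logic, independent of either style. The helper names (`countWords`, `topN`, `formatLines`) are made up for illustration, and the `[a-z]{2,}` tokenizer is an assumption about how the expected counts were produced, so the exact numbers are not guaranteed to match.

```javascript
const fs = require("fs");

// Count every lowercase run of 2+ letters that is not a stop word.
// stopWordsCsv is the raw comma-separated content of stop_words.txt;
// trim() keeps a trailing newline from sticking to the last stop word.
function countWords(text, stopWordsCsv) {
  const stopWords = new Set(stopWordsCsv.trim().split(","));
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z]{2,}/g) || []) {
    if (!stopWords.has(word)) counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Return the n most frequent [word, count] pairs, most frequent first.
function topN(counts, n = 25) {
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);
}

// One "word - count" line per entry.
function formatLines(pairs) {
  return pairs.map(([word, count]) => `${word} - ${count}`).join("\n");
}

// Only runs when both input files are present in the working directory.
if (fs.existsSync("pride-and-prejudice.txt") && fs.existsSync("stop_words.txt")) {
  const text = fs.readFileSync("pride-and-prejudice.txt", "utf8");
  const stops = fs.readFileSync("stop_words.txt", "utf8");
  fs.writeFileSync("output.txt", formatLines(topN(countWords(text, stops))) + "\n");
}
```

Everything is synchronous here, so output.txt is written only after counting has finished. Any concurrent version needs some explicit "wait for all units" step before it reaches the same write.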

PART 1: STYLE 1 CONSTRAINTS:

- Existence of one or more units that execute concurrently
- Existence of one or more data spaces where concurrent units store and retrieve data
- No direct data exchanges between the concurrent units, other than via the data spaces

Possible style names: Dataspaces, Linda
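The constraints above can be sketched as follows. One thing to be aware of: Node.js worker threads do not share ordinary module-level arrays (each worker gets its own copy of the module), so this sketch assumes that cooperative async tasks in a single thread are acceptable as the "concurrent units". The names `yieldToOthers` and `runDataspaces` are hypothetical, and the tokenizer is the same assumed `[a-z]{2,}` pattern. Treat it as one possible reading of the style, not the definitive one.

```javascript
const fs = require("fs");

// Two data spaces; the concurrent units only ever touch these.
const wordSpace = []; // words waiting to be processed
const freqSpace = []; // partial frequency maps waiting to be merged

// Hand control back to the event loop so other units get a turn.
const yieldToOthers = () => new Promise((resolve) => setImmediate(resolve));

// Concurrent unit: drain wordSpace, then deposit partial counts in freqSpace.
async function processWords(stopWords) {
  const partial = new Map();
  let taken = 0;
  while (wordSpace.length > 0) {
    const word = wordSpace.pop();
    if (!stopWords.has(word)) partial.set(word, (partial.get(word) || 0) + 1);
    if (++taken % 500 === 0) await yieldToOthers();
  }
  freqSpace.push(partial);
}

// Fill the word space, run the units concurrently, then merge the freq space.
async function runDataspaces(text, stopWords, numUnits = 5) {
  for (const word of text.toLowerCase().match(/[a-z]{2,}/g) || []) {
    wordSpace.push(word);
  }
  const units = [];
  for (let i = 0; i < numUnits; i++) units.push(processWords(stopWords));
  await Promise.all(units); // wait for every unit before reading freqSpace

  const totals = new Map();
  while (freqSpace.length > 0) {
    for (const [word, count] of freqSpace.pop()) {
      totals.set(word, (totals.get(word) || 0) + count);
    }
  }
  return totals;
}

// Only runs when both input files are present in the working directory.
if (fs.existsSync("pride-and-prejudice.txt") && fs.existsSync("stop_words.txt")) {
  const stopWords = new Set(fs.readFileSync("stop_words.txt", "utf8").trim().split(","));
  runDataspaces(fs.readFileSync("pride-and-prejudice.txt", "utf8"), stopWords).then((totals) => {
    const top = [...totals].sort((a, b) => b[1] - a[1]).slice(0, 25);
    fs.writeFileSync("output.txt", top.map(([w, c]) => `${w} - ${c}`).join("\n") + "\n");
  });
}
```

The units never call or message each other. The only contact points are `wordSpace` and `freqSpace`, and the file is written only after `Promise.all` confirms every unit has finished.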

JAVASCRIPT/NODE.JS CODE FOR STYLE 1 (dataspaces):

const fs = require("fs");
const { Worker, isMainThread, parentPort, workerData, threadId } = require("worker_threads");

const stopwords = new Set(fs.readFileSync("stop_words.txt").toString().split(","));

// Define two data spaces
const wordSpace = [];
const freqSpace = new Int32Array(26);

// Worker function that consumes words from the word space and adds their
// frequency counts to the frequency space
function processWords() {
  const wordFreqs = new Int32Array(26);
  while (wordSpace.length > 0) {
    const word = wordSpace.shift();
    if (!stopwords.has(word)) {
      const index = word.charCodeAt(0) - 97;
      wordFreqs[index]++;
    }
  }
  for (let i = 0; i < freqSpace.length; i++) {
    Atomics.add(freqSpace, i, wordFreqs[i]);
  }
}

// Main thread populates the word space and launches the workers
if (isMainThread) {
  const input = fs.readFileSync("pride-and-prejudice.txt").toString().toLowerCase();
  const words = input.match(/[a-z]{2,}/g);
  wordSpace.push(...words);

  const numWorkers = 5;
  const workers = [];
  for (let i = 0; i < numWorkers; i++) {
    workers.push(new Worker(__filename));
  }

  // Sort by frequency and write the top 25 to output.txt
  const sortedFreqs = Object.entries(wordFreqs)
    .filter(([word, count]) => word.length >= 2)
    .sort((a, b) => b[1] - a[1])
    .slice(0, 25);
  const output = sortedFreqs.map(([word, count]) => `${word} - ${count}`).join(" ");
  fs.writeFileSync("output.txt", output);
  }
  });
  });
} else {
  processWords();
}

PART 2: STYLE 2 CONSTRAINTS:

Very similar to style 1, but with an additional twist:

- Input data is divided in chunks, similar to what an inverse multiplexer does to input signals
- A map function applies a given worker function to each chunk of data, potentially in parallel
- The results of the many worker functions are reshuffled in a way that allows for the reduce step to be also parallelized
- The reshuffled chunks of data are given as input to a second map function that takes a reducible function as input

Possible style names: Map-reduce, Hadoop style, Double inverse multiplexer
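The four steps above can be sketched as a plain data flow. This version runs the map and reduce steps sequentially, since the style only says "potentially in parallel", and it chunks by lines rather than by character offsets so that no word is cut in half at a chunk boundary. The function names (`regroup`, `countWords`, `wordFrequencies`) and the 200-line chunk size are assumptions made for illustration, as is the `[a-z]{2,}` tokenizer.

```javascript
// Step 1: divide the input into chunks of nLines lines each.
function partition(text, nLines = 200) {
  const lines = text.split("\n");
  const chunks = [];
  for (let i = 0; i < lines.length; i += nLines) {
    chunks.push(lines.slice(i, i + nLines).join("\n"));
  }
  return chunks;
}

// Step 2 (map worker): turn one chunk into [word, 1] pairs, minus stop words.
function splitWords(chunk, stopWords) {
  const words = chunk.toLowerCase().match(/[a-z]{2,}/g) || [];
  return words.filter((w) => !stopWords.has(w)).map((w) => [w, 1]);
}

// Step 3 (reshuffle): regroup all pairs by word, so that each word's group
// can be reduced independently of every other group.
function regroup(pairLists) {
  const groups = new Map();
  for (const pairs of pairLists) {
    for (const [word, one] of pairs) {
      if (!groups.has(word)) groups.set(word, []);
      groups.get(word).push(one);
    }
  }
  return groups;
}

// Step 4 (reduce worker): collapse one word's group into a single count.
function countWords([word, ones]) {
  return [word, ones.reduce((a, b) => a + b, 0)];
}

// Wire the steps together and sort by descending frequency.
function wordFrequencies(text, stopWords) {
  const mapped = partition(text).map((chunk) => splitWords(chunk, stopWords));
  const reduced = [...regroup(mapped).entries()].map(countWords);
  return reduced.sort((a, b) => b[1] - a[1]);
}
```

To produce the file, something like `wordFrequencies(text, stopWords).slice(0, 25)` could be formatted as `word - count` lines and written with `fs.writeFileSync("output.txt", ...)`. If the map step is moved into separate processes, the write has to happen only after every process has reported its result and been told to exit.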

JAVASCRIPT CODE FOR STYLE 2 (map-reduce/Hadoop):

const fs = require("fs");
const os = require("os");
const cluster = require("cluster");

// Define the number of workers to be used
const numWorkers = os.cpus().length;

// Define the chunk size
const chunkSize = 200;

// Function to partition the data into chunks
function partition(data, chunkSize) {
  const chunks = [];
  for (let i = 0; i < data.length; i += chunkSize) {
    chunks.push(data.slice(i, i + chunkSize));
  }
  return chunks;
}

// Function to split the words and remove stop words
function splitWords(data) {
  const stopWords = fs.readFileSync("stop_words.txt", "utf8").split(",");
  const words = data.toLowerCase().replace(/[^\w\s]|_/g, "").split(/\s+/);
  return words.filter((word) => !stopWords.includes(word) && word.length > 1);
}

// Function to map the words and their frequency
function map(data) {
  const wordFreqs = {};
  for (const word of data) {
    if (wordFreqs[word]) {
      wordFreqs[word]++;
    } else {
      wordFreqs[word] = 1;
    }
  }
  return wordFreqs;
}

// Function to reduce the frequency of words across chunks
function reduce(wordFreqs1, wordFreqs2) {
  const merged = { ...wordFreqs1 };
  for (const [word, freq] of Object.entries(wordFreqs2)) {
    if (merged[word]) {
      merged[word] += freq;
    } else {
      merged[word] = freq;
    }
  }
  return merged;
}

// Main function
function main() {
  // Read the input file
  const input = fs.readFileSync("pride-and-prejudice.txt", "utf8");

  // Partition the input data
  const chunks = partition(input, chunkSize);

  if (cluster.isMaster) {
    // Create a worker for each CPU core
    for (let i = 0; i < numWorkers; i++) {
      const worker = cluster.fork();

      // Send the partitioned data to the workers
      worker.send({ type: "data", data: chunks[i] });
    }

    // Reduce the results from each worker
    const wordFreqs = {};
    for (const id in cluster.workers) {
      cluster.workers[id].on("message", (message) => {
        if (message.type === "result") {
          const partialResult = message.data;
          for (const [word, freq] of Object.entries(partialResult)) {
            if (wordFreqs[word]) {
              wordFreqs[word] += freq;
            } else {
              wordFreqs[word] = freq;
            }
          }
        }
      });
    }

    // Print the top 25 words
    process.on("exit", () => {
      const sortedWords = Object.entries(wordFreqs)
        .sort((a, b) => b[1] - a[1])
        .slice(0, 25);
      const output = sortedWords.map((w) => `${w[0]} - ${w[1]}`).join(" ");
      fs.writeFileSync("output.txt", output, "utf8");
    });
  } else {
    // Worker process
    process.on("message", (message) => {
      if (message.type === "data") {
        const data = message.data;
        const wordFreqs = map(splitWords(data));
        process.send({ type: "result", data: wordFreqs });
      }
    });
  }
}

main(); // Run the main function

**ENSURE BOTH SOLUTIONS ARE WORKING AND FOLLOW THE CORRESPONDING STYLES BEFORE YOU SUBMIT THEM OR I WILL HAVE TO DOWNVOTE! YOU CAN TEST THEM LOCALLY FIRST BY LOOKING UP THE PRIDE-AND-PREJUDICE.TXT FILE AND USING IT SINCE I CAN'T PASTE IT HERE, BUT IT IS WIDELY AVAILABLE.