Question
Given a text corpus, develop a positional index. Process phrase and proximity queries using the positional index.
You will be provided with a text corpus consisting of about 500 research papers published in Learning Analytics & Knowledge (LAK) conferences. The Society for Learning Analytics Research (SoLAR) organizes LAK conferences. LAK conferences focus on research that explores the role and impact of analytics on teaching, learning, training, and development. The corpus will be provided in ASCII file format, with each document in a separate physical file. You are responsible for normalizing the text.
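A minimal Python sketch of reading the corpus and normalizing its text, assuming each file name serves as the docID and that every file in one directory is a document. The `stem` function is a hypothetical, deliberately crude suffix stripper standing in for a real stemmer (for example NLTK's Porter stemmer); stop words are kept, as the assignment requires.

```python
import re
from pathlib import Path

# Hypothetical crude suffix stripper, a stand-in for a real stemmer such as
# Porter's. It removes only the first matching ending from this short list.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(token: str) -> str:
    for suffix in SUFFIXES:
        # Require at least 3 remaining characters so short words like "is" survive.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation by splitting on non-alphanumerics, then stem.

    Stop words are NOT removed. Hyphenated words are split into their parts.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens]

def load_corpus(directory: str) -> dict[str, list[str]]:
    """Read every file in `directory` as one ASCII document, keyed by file name."""
    docs = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            text = path.read_text(encoding="ascii", errors="ignore")
            docs[path.name] = normalize(text)
    return docs
```

For example, `normalize("Learning Analytics, and the LAK conferences!")` yields `["learn", "analytic", "and", "the", "lak", "conferenc"]`; the odd last token shows why a proper stemmer or lemmatizer is preferable in practice.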
You have to follow these steps:
1- Normalize the text: handle punctuation characters, apply stemming/lemmatization, and lowercase all tokens. Do not throw away stop words.
2- Extract tokens and identify vocabulary for the dictionary.
3- Scan the corpus and build an inverted index.
4- Scan the corpus and build a positional index using the inverted index. You may eliminate the inverted index construction in the step above if you can figure out a way to directly construct the positional index.
5- Implement the algorithm for processing phrase/proximity queries. (I attached an image of this algorithm.)
6- Develop a simple interface for users to specify phrase/proximity queries. Assume that the queries are limited to just two terms. The interface can be as simple as prompting the user for a phrase/proximity query (i.e., a text string). You may also read a phrase/proximity query through command line arguments.
7- Design test cases and execute them. Document execution results.
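Steps 2 to 4 can be collapsed into a single scan, as step 4 allows. The sketch below is one possible layout, not a prescribed one: it maps each term to `{docID: [positions]}`, and both the vocabulary and the plain inverted index fall out of that structure. The `tokenize` helper is a minimal stand-in for a full step 1 normalizer, and all function names are illustrative.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Minimal stand-in for the step 1 normalizer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_positional_index(docs):
    """Build term -> {docID: [positions]} directly, in one scan of the corpus."""
    index = defaultdict(dict)
    for doc_id in sorted(docs):  # visiting docIDs in order keeps each postings dict sorted
        for pos, term in enumerate(tokenize(docs[doc_id])):
            # Positions are appended in increasing order, so each list stays sorted.
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)

def vocabulary(index):
    """The dictionary: every distinct term, sorted."""
    return sorted(index)

def inverted_index(index):
    """Drop the positions to recover the plain inverted index: term -> sorted docIDs."""
    return {term: list(postings) for term, postings in index.items()}

def postings_list(index, term):
    """One term's postings as [(docID, positions), ...] sorted by docID."""
    return sorted(index.get(term, {}).items())
```

With the made-up documents `{1: "Learning analytics for learning", 2: "Analytics of learning"}`, `postings_list(index, "learning")` returns `[(1, [0, 3]), (2, [2])]`, which is the shape a positional intersection consumes.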
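For step 6, a small parser is enough to turn a text string into the two terms plus a distance. The query syntax here is an assumption, not something the assignment specifies: two bare terms mean a phrase query, and `term1 /k term2` means the terms must occur within `k` tokens of each other. The names `parse_query` and `main` are illustrative.

```python
import re
import sys

# Hypothetical query syntax (an assumption, not given in the assignment):
#   "learning analytics"     -> phrase query (term 2 immediately follows term 1)
#   "learning /3 analytics"  -> proximity query (terms within 3 tokens, either order)
QUERY_RE = re.compile(r"^\s*(\S+)\s+(?:/(\d+)\s+)?(\S+)\s*$")

def parse_query(text):
    """Return (term1, term2, k, is_phrase) for a two-term query, else raise ValueError."""
    match = QUERY_RE.match(text)
    if match is None:
        raise ValueError("expected 'term1 term2' or 'term1 /k term2'")
    term1, k, term2 = match.groups()
    if k is None:
        return term1.lower(), term2.lower(), 1, True
    return term1.lower(), term2.lower(), int(k), False

def main():
    # Take the query from command-line arguments if any, otherwise prompt for it.
    text = " ".join(sys.argv[1:]) or input("Enter a phrase/proximity query: ")
    print(parse_query(text))
```

Called from a script's `if __name__ == "__main__":` guard, an invocation such as `python query.py learning /3 analytics` (file name hypothetical) would read the query from the command line, and running it with no arguments would fall back to the prompt. Query terms should pass through the same normalizer as the documents before lookup.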
POSITIONALINTERSECT(p1, p2, k)
  answer ← ⟨⟩
  while p1 ≠ NIL and p2 ≠ NIL
  do if docID(p1) = docID(p2)
       then l ← ⟨⟩
            pp1 ← positions(p1)
            pp2 ← positions(p2)
            while pp1 ≠ NIL
            do while pp2 ≠ NIL
               do if |pos(pp1) - pos(pp2)| ≤ k
                     then ADD(l, pos(pp2))
                     else if pos(pp2) > pos(pp1)
                             then break
                  pp2 ← next(pp2)
               while l ≠ ⟨⟩ and |l[0] - pos(pp1)| > k
               do DELETE(l[0])
               for each ps ∈ l
               do ADD(answer, ⟨docID(p1), pos(pp1), ps⟩)
               pp1 ← next(pp1)
            p1 ← next(p1)
            p2 ← next(p2)
       else if docID(p1) < docID(p2)
               then p1 ← next(p1)
               else p2 ← next(p2)
  return answer
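A direct Python translation of POSITIONALINTERSECT, offered as a sketch. It assumes each postings list is `[(docID, sorted positions), ...]` sorted by docID; the deque `window` plays the role of `l`, and the index `b` plays the role of `pp2`, which is never rewound within a document. Because the `|pos(pp1) - pos(pp2)| ≤ k` test is symmetric, it matches the two terms in either order, so the hypothetical `phrase_query` helper keeps only hits where term 2 sits exactly one position after term 1.

```python
from collections import deque

def positional_intersect(p1, p2, k):
    """Return (docID, pos1, pos2) triples with |pos1 - pos2| <= k.

    p1, p2: postings lists sorted by docID, each entry (docID, sorted positions).
    """
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        doc1, positions1 = p1[i]
        doc2, positions2 = p2[j]
        if doc1 == doc2:
            window = deque()  # the list l: term-2 positions near the current pos1
            b = 0             # pp2: cursor into positions2, never rewound
            for pos1 in positions1:  # pp1 walks term 1's positions
                while b < len(positions2):
                    pos2 = positions2[b]
                    if abs(pos1 - pos2) <= k:
                        window.append(pos2)
                    elif pos2 > pos1:
                        break  # beyond pos1 + k: re-examine it for the next pos1
                    b += 1
                # Discard positions that are now more than k behind pos1.
                while window and abs(window[0] - pos1) > k:
                    window.popleft()
                for pos2 in window:
                    answer.append((doc1, pos1, pos2))
            i += 1
            j += 1
        elif doc1 < doc2:
            i += 1
        else:
            j += 1
    return answer

def phrase_query(p1, p2):
    """Phrase "t1 t2": t2 must occur exactly one position after t1."""
    return [(d, a, b) for d, a, b in positional_intersect(p1, p2, 1) if b - a == 1]
```

With toy postings `learning = [(1, [0, 3]), (2, [2])]` and `analytics = [(1, [1]), (2, [0])]`, `phrase_query(learning, analytics)` returns `[(1, 0, 1)]`, while `phrase_query(analytics, learning)` returns `[]`. Pairs like these, covering both orders, adjacent terms, and terms just outside the distance `k`, make natural test cases for step 7.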