
Question


1. Build a 4-gram language model

Each student needs to collect an Arabic corpus of at least 100,000 words; a larger corpus is better. A bonus will be given if the corpus contains Arabic dialects. Students cannot use the same corpus, fully or partially. Write a program to tokenize the corpus into tokens (words), then build a 4-gram model for this corpus. That is, your language model is a table that contains the token, the token count, and the token probability. The language model should be saved in CSV format.

2. Develop a plagiarism detection interface

Develop a program (in Java) that uses your language model to compute a plagiarism score for a given sentence. In other words, the user writes a sentence in Arabic and, on clicking "Go", the program computes the probability of this sentence using the language model. This probability should be tuned to reflect a plagiarism score: the more similar a given sentence is (fully or partially) to sentences in the corpus, the higher the plagiarism score.

Example: [image not transcribed]

Submission: corpus, language model (.csv), source code, and all files used to run the project. During the discussion, students will also be asked theoretical questions related to NLP.
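The two parts of the assignment could be approached along the following lines. This is only a minimal sketch under stated assumptions, not a reference solution: the class name `FourGramModel` and all of its methods are hypothetical, a "token" in the CSV table is taken to be a whole 4-gram, its probability is taken to be its relative frequency among all 4-grams in the corpus, and the plagiarism score is assumed to be the fraction of the sentence's 4-grams that also occur in the corpus. The assignment leaves these choices open, so other definitions (for example a conditional probability of the fourth word given the previous three, with smoothing) may be what the instructor intends.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class: counts 4-grams and scores sentences against them.
final class FourGramModel {
    static final int N = 4;
    private final Map<String, Integer> counts = new HashMap<>();
    private long total = 0; // total number of 4-gram occurrences seen in training

    // Assumption: a token is a run of Unicode letters, combining marks
    // (e.g. Arabic diacritics), or decimal digits; anything else is a separator.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{M}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Slides a window of N tokens over the list; each window is joined with spaces.
    static List<String> fourGrams(List<String> tokens) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + N <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + N)));
        }
        return grams;
    }

    void train(String corpus) {
        for (String gram : fourGrams(tokenize(corpus))) {
            counts.merge(gram, 1, Integer::sum);
            total++;
        }
    }

    int count(String gram) {
        return counts.getOrDefault(gram, 0);
    }

    // Assumption: probability = relative frequency of the 4-gram in the corpus.
    double probability(String gram) {
        return total == 0 ? 0.0 : (double) count(gram) / total;
    }

    // Writes the table as UTF-8 CSV. Tokens contain no commas or quotes
    // because the tokenizer treats them as separators.
    void saveCsv(Path out) throws IOException {
        try (PrintWriter w = new PrintWriter(
                Files.newBufferedWriter(out, StandardCharsets.UTF_8))) {
            w.println("token,count,probability");
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                w.printf(Locale.ROOT, "\"%s\",%d,%.8f%n",
                        e.getKey(), e.getValue(), probability(e.getKey()));
            }
        }
    }

    // Assumed scoring rule: share of the sentence's 4-grams found in the corpus,
    // from 0.0 (no overlap) to 1.0 (every 4-gram appears in the corpus).
    double plagiarismScore(String sentence) {
        List<String> grams = fourGrams(tokenize(sentence));
        if (grams.isEmpty()) {
            return 0.0; // fewer than four tokens: nothing to compare
        }
        int hits = 0;
        for (String gram : grams) {
            if (counts.containsKey(gram)) {
                hits++;
            }
        }
        return (double) hits / grams.size();
    }
}
```

In a GUI, the "Go" button handler would read the text field, call `plagiarismScore` on it, and display the result, for instance as a percentage. An overlap ratio is used here instead of the raw sentence probability because a product of 4-gram probabilities drops to zero as soon as one unseen 4-gram appears, which would hide partial matches.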
