Question

This project must be implemented in C. (I also have a version of this project in Java, but I want this one to be implemented in C.)

Notes

It is okay to use code or other material you find in outside sources (such as the Web), provided that you also give proper citations.

When compiling and linking, you should pass the -pthread argument to the compiler. This takes care of adding the right libraries, etc., for using pthreads.

Background

As part of this project, you are given a large Web search query log, which you will use throughout the project. Below is the Dropbox link to the folder containing the query log:

https://www.dropbox.com/sh/3fugywbb8t7hz67/AABJSKT1GKckU7RhICc3o2Mva?dl=0

Make sure that there are 10 files in this folder, named Data1.txt to Data10.txt. Each of these files contains exactly one query per line. A query can contain letters, numeric characters, and punctuation marks. You are not responsible for cleaning or processing the queries; however, if doing so lets your implementation end up with cleaner and more functional code, you are free to do it.

Note: The files are quite large, so they will take some time to download. Make sure that you start downloading them as soon as possible.

Objective

The objective of the project is actually very simple. We ask you to build a dictionary for these query logs and write the dictionary into a file named "Dictionary.txt". In this file, each line will consist of a unique query followed by the frequency of that query across the Data files. For example, if the query "Winter Gardening" appears 20 times in total across all 10 files, the line representing this query will be "Winter Gardening 20" (the query and the frequency should be separated by a TAB character, i.e., \t).
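
The output line format described above can be sketched as a small helper. This is only an illustration: the function name format_entry and its return convention are assumptions, not part of the assignment.

```c
#include <stdio.h>

/* Hypothetical helper: writes "<query>\t<count>\n" into buf.
 * Returns the number of characters written (excluding the terminating
 * '\0'), or -1 if buf is too small to hold the whole line. */
int format_entry(char *buf, size_t cap, const char *query, unsigned long count)
{
    int n = snprintf(buf, cap, "%s\t%lu\n", query, count);
    if (n < 0 || (size_t)n >= cap)
        return -1;               /* encoding error or truncated output */
    return n;
}
```

When writing straight to "Dictionary.txt", the same format string can be passed to fprintf instead of going through a buffer.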

During the project we will examine different strategies and compare them with each other. To make the work easier to manage, the project is divided into several tasks:

Task 1: Building a Memory-Based Data Structure for Building the Dictionary

Basically, you will read each of the files line by line and build a data structure that keeps track of the queries (whether they have appeared previously and, if so, what their frequencies are).

A suitable data structure for this task is the "TRIE" data structure. You may refer to any outside source to find out how a trie works and how it can be implemented.

As the first task of your project, implement a Trie data structure. You are also allowed to use an already existing implementation from outside sources. However, be warned: a general trie implementation usually does not keep track of frequencies. Also, most tries do not count the space character as a valid character. Thus, you will need to make modifications even if you find an already implemented trie. Possible methods are:

Implement the trie yourself.

Modify a trie implementation so that it suits your goals.

Keep additional data structures alongside the trie so that you can keep track of query frequencies.
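
As an illustration of the first option, here is a minimal sketch of a trie whose nodes carry a frequency counter. It assumes a first-child/next-sibling layout so that any byte, including the space character and punctuation, is a valid edge; the names TrieNode, trie_new_node, trie_insert, and trie_count are hypothetical, and other layouts (for example a fixed child array per node) would work as well.

```c
#include <stdlib.h>

/* One node per byte of a query. Children are kept in a singly linked
 * sibling list, so no per-node table is needed and every byte value
 * (space, digits, punctuation) can appear on an edge. */
typedef struct TrieNode {
    unsigned char ch;          /* byte on the edge leading to this node */
    unsigned long count;       /* times a query ended here; 0 = not a query */
    struct TrieNode *child;    /* first child */
    struct TrieNode *sibling;  /* next node with the same parent */
} TrieNode;

TrieNode *trie_new_node(unsigned char ch)
{
    TrieNode *n = calloc(1, sizeof *n);   /* count and links start at 0 */
    if (n)
        n->ch = ch;
    return n;
}

/* Inserts query, creating missing nodes, and increments its frequency.
 * Returns the new frequency, or 0 if memory allocation failed. */
unsigned long trie_insert(TrieNode *root, const char *query)
{
    TrieNode *cur = root;
    for (const unsigned char *p = (const unsigned char *)query; *p; p++) {
        TrieNode *c = cur->child;
        while (c && c->ch != *p)          /* linear search among siblings */
            c = c->sibling;
        if (!c) {                         /* new branch: prepend to the list */
            c = trie_new_node(*p);
            if (!c)
                return 0;
            c->sibling = cur->child;
            cur->child = c;
        }
        cur = c;
    }
    return ++cur->count;
}

/* Returns the stored frequency of query, or 0 if it was never inserted. */
unsigned long trie_count(const TrieNode *root, const char *query)
{
    const TrieNode *cur = root;
    for (const unsigned char *p = (const unsigned char *)query; *p; p++) {
        const TrieNode *c = cur->child;
        while (c && c->ch != *p)
            c = c->sibling;
        if (!c)
            return 0;
        cur = c;
    }
    return cur->count;
}
```

The sibling list trades lookup speed for memory: each node stays small, but finding a child is linear in the number of siblings. A root created with trie_new_node(0) serves as the empty prefix.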

Task 2: Sequential Execution

The pseudocode for your program will be something like this:

Starting from the first file, read one query (that is, one line) from the Data file.

Insert it into the Trie.

If it is already in the Trie, increment its count.

If it is not in the Trie, create a new branch and set its count to 1.

When all files have been processed, write the resulting Trie into the output file.
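
The steps above can be sketched as two functions, one that loads a file into the trie and one that writes the trie out. This is a sketch under several assumptions: the trie is a compact first-child/next-sibling variant repeated here so the block compiles on its own, lines are assumed to be shorter than MAX_LINE bytes (longer lines would be split into several queries), and empty lines are skipped. All names (MAX_LINE, node_new, trie_insert, load_queries, dump_trie) are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 4096   /* assumed upper bound on query length, newline included */

typedef struct TrieNode {
    unsigned char ch;                   /* byte on the edge into this node */
    unsigned long count;                /* 0 means "not a complete query" */
    struct TrieNode *child, *sibling;
} TrieNode;

TrieNode *node_new(unsigned char ch)
{
    TrieNode *n = calloc(1, sizeof *n);
    if (n)
        n->ch = ch;
    return n;
}

/* Returns the query's new frequency, or 0 on allocation failure. */
unsigned long trie_insert(TrieNode *root, const char *q)
{
    TrieNode *cur = root;
    for (const unsigned char *p = (const unsigned char *)q; *p; p++) {
        TrieNode *c = cur->child;
        while (c && c->ch != *p)
            c = c->sibling;
        if (!c) {
            if (!(c = node_new(*p)))
                return 0;
            c->sibling = cur->child;
            cur->child = c;
        }
        cur = c;
    }
    return ++cur->count;
}

/* Reads one query per line from in and inserts each into the trie.
 * Returns the number of queries inserted, or -1 on allocation failure. */
long load_queries(TrieNode *root, FILE *in)
{
    char line[MAX_LINE];
    long n = 0;
    while (fgets(line, sizeof line, in)) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] == '\0')
            continue;                         /* assumption: ignore empty lines */
        if (trie_insert(root, line) == 0)
            return -1;
        n++;
    }
    return n;
}

/* Depth-first walk that writes "<query>\t<count>\n" for every stored query.
 * prefix must have room for MAX_LINE bytes; call with depth 0 on the root. */
void dump_trie(const TrieNode *node, char *prefix, size_t depth, FILE *out)
{
    if (node->count > 0) {
        prefix[depth] = '\0';
        fprintf(out, "%s\t%lu\n", prefix, node->count);
    }
    for (const TrieNode *c = node->child; c; c = c->sibling) {
        prefix[depth] = (char)c->ch;
        dump_trie(c, prefix, depth + 1, out);
    }
}
```

A driver would create the root with node_new(0), call load_queries once per Data file, and then call dump_trie with a MAX_LINE-byte buffer and a FILE opened on "Dictionary.txt". The recursion depth equals the length of the longest query, so the MAX_LINE bound also limits stack use.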

You should measure the execution time of this program when you are finished.
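
One possible way to take this measurement is to read the wall clock before and after the run. The sketch below assumes a C11 library that provides timespec_get; the helper name wall_seconds is hypothetical. Wall-clock time is used rather than clock(), because clock() reports CPU time, which would be misleading if the result is later compared against a multithreaded version.

```c
#include <time.h>

/* Hypothetical helper: current wall-clock time in seconds as a double,
 * or -1.0 if the clock could not be read. */
double wall_seconds(void)
{
    struct timespec ts;
    if (timespec_get(&ts, TIME_UTC) != TIME_UTC)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

Calling wall_seconds() once before the first file is opened and once after the output file is closed, then subtracting, gives the elapsed time of the whole sequential run.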
