Question

Posted on Sep 24, 2024

Answer in C. How would I make this using a hash table? Also, what would be a good algorithm to help me understand?

Warning: As you will see below, the descriptions of the assignments will be increasingly complex because we are asking you to build increasingly bigger programs. Make sure to read the assignment carefully!

1. Introduction

In this assignment you will practice using the file system API (as well as pointers in different data structures). In particular you will be creating, opening, reading, writing, and deleting files. Your task is to write an indexing program, called an indexer. Given a set of files, an indexer will parse the files and create an inverted index, which maps each token found in the files to the subset of files that contain that token. In your indexer, you will also maintain the frequency with which each token appears in each file. The indexer should tokenize the files and produce an inverted index of how many times the word occurred in each file, sorted by word.
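The tokenizing step might be sketched as follows. The assignment text here does not define what a token is, so this sketch assumes a token is a maximal run of alphanumeric characters; the function name `next_token` and its parameters are hypothetical. Each token is lowercased as it is copied out.

```c
#include <ctype.h>
#include <stddef.h>

/* Copies the next token starting at *cursor into out, lowercased.
 * ASSUMPTION: a token is a maximal run of alphanumeric characters.
 * At most cap - 1 characters are stored (cap must be at least 1);
 * longer tokens are truncated. Advances *cursor past the token.
 * Returns 1 if a token was found, 0 at end of input. */
static int next_token(const char **cursor, char *out, size_t cap)
{
    const char *p = *cursor;
    size_t n = 0;

    /* Skip separator characters. */
    while (*p && !isalnum((unsigned char)*p))
        p++;
    if (*p == '\0') {
        *cursor = p;
        return 0;
    }
    /* Copy the run of alphanumeric characters, lowercasing each one. */
    while (*p && isalnum((unsigned char)*p)) {
        if (n + 1 < cap)
            out[n++] = (char)tolower((unsigned char)*p);
        p++;
    }
    out[n] = '\0';
    *cursor = p;
    return 1;
}
```

A caller would read a file into a buffer, point a cursor at its start, and call `next_token` in a loop until it returns 0, counting each token as it goes.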

Your output should be in the following format:

[Format listing not recoverable; only the placeholders count0, count1, count2, count3, and count4 remain.]

The above depiction gives a logical view of the inverted index. In your program, you have to define data structures to hold the mappings (token to list) and the records (file name, count).

An inverted index is a sequence of mappings where each mapping maps a token (e.g., dog) to a list of records, with each record containing the name of a file whose content contains the token and the frequency with which the token appears in that file.
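One way to hold these mappings is a hash table with separate chaining, which is what the question asks about. The sketch below is one possible layout, not the assignment's required design: the type names (`Record`, `Entry`, `Index`), the function names, the fixed bucket count, and the choice of the djb2 hash are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024            /* ASSUMPTION: fixed size, no resizing */

typedef struct Record {          /* one (file name, count) pair */
    char *path;
    int count;
    struct Record *next;
} Record;

typedef struct Entry {           /* one token and its list of records */
    char *token;
    Record *records;
    struct Entry *next;          /* next entry in the same bucket */
} Entry;

typedef struct {
    Entry *buckets[NBUCKETS];    /* must start out all NULL */
} Index;

/* Heap copy of a string (strdup is POSIX, not ISO C before C23). */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p)
        memcpy(p, s, n);
    return p;
}

/* djb2 string hash. */
static unsigned long hash_token(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Finds the entry for token, or returns NULL if it is absent. */
static Entry *index_find(const Index *ix, const char *token)
{
    Entry *e = ix->buckets[hash_token(token) % NBUCKETS];
    while (e && strcmp(e->token, token) != 0)
        e = e->next;
    return e;
}

/* Records one occurrence of token in the file at path.
 * Returns 0 on success, -1 if memory allocation fails. */
static int index_add(Index *ix, const char *token, const char *path)
{
    Entry *e = index_find(ix, token);
    Record *r;

    if (!e) {                                  /* first time we see this token */
        Entry **slot = &ix->buckets[hash_token(token) % NBUCKETS];
        e = malloc(sizeof *e);
        if (!e)
            return -1;
        e->token = copy_string(token);
        if (!e->token) { free(e); return -1; }
        e->records = NULL;
        e->next = *slot;
        *slot = e;
    }
    for (r = e->records; r; r = r->next) {
        if (strcmp(r->path, path) == 0) {      /* file already listed */
            r->count++;
            return 0;
        }
    }
    r = malloc(sizeof *r);                     /* first occurrence in this file */
    if (!r)
        return -1;
    r->path = copy_string(path);
    if (!r->path) { free(r); return -1; }
    r->count = 1;
    r->next = e->records;
    e->records = r;
    return 0;
}

/* Returns how often token occurs in path, or 0 if it never does. */
static int index_count(const Index *ix, const char *token, const char *path)
{
    const Entry *e = index_find(ix, token);
    const Record *r;
    for (r = e ? e->records : NULL; r; r = r->next)
        if (strcmp(r->path, path) == 0)
            return r->count;
    return 0;
}
```

The overall algorithm is then: for every file, for every token in it, call `index_add(&index, token, path)`. The hash table gives fast lookup while building; because it has no order of its own, the entries still need to be collected into an array and sorted before they are written out.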

Here is an example of how the indexer should work. If you are given the following set of files:

File Path          File Content
/adir/boo          A dog named named Boo
/adir/baa          A cat named Baa
/adir/bdir/baa     Cat cat

Your indexer should output:

The inverted index file that your indexer writes must follow the XML format defined above. Words must be sorted in alphanumeric order. All characters of a word should be first converted to lowercase before the word is counted. Your output should print with the lists arranged in alphanumeric order (a to z, 0 to 9) of the tokens. The filenames in your output should be in descending order by frequency count (highest frequency to lowest frequency). If there is a word with the same frequency in two or more files, order them by path name alphanumerically (a to z, 0 to 9).
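These ordering rules map naturally onto comparator functions for the standard `qsort`. The sketch below assumes that "a to z, 0 to 9" means letters sort before digits, which is the opposite of plain `strcmp` on ASCII, so it uses its own character ranking. It also assumes an ASCII-compatible character set and places every other character (such as `/` in paths) after the digits. The `FileCount` type and all function names are hypothetical.

```c
#include <stdlib.h>   /* qsort, called by the user of these comparators */

typedef struct {
    const char *path;
    int count;
} FileCount;

/* Rank of a character under the ASSUMED order: a-z, then 0-9, then
 * everything else by character code. */
static int rank_char(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return c - 'a';             /* 0..25 */
    if (c >= '0' && c <= '9')
        return 26 + (c - '0');      /* 26..35 */
    return 36 + c;                  /* all other characters */
}

/* Like strcmp, but letters sort before digits; a prefix sorts first. */
static int compare_alnum(const char *s, const char *t)
{
    while (*s && *s == *t) {
        s++;
        t++;
    }
    if (*s == '\0' && *t == '\0')
        return 0;
    if (*s == '\0')
        return -1;
    if (*t == '\0')
        return 1;
    return rank_char((unsigned char)*s) - rank_char((unsigned char)*t);
}

/* qsort comparator for FileCount: higher count first, ties broken by
 * path in ascending alphanumeric order. */
static int compare_records(const void *a, const void *b)
{
    const FileCount *x = a;
    const FileCount *y = b;
    if (x->count != y->count)
        return (x->count < y->count) ? 1 : -1;
    return compare_alnum(x->path, y->path);
}

/* qsort comparator for an array of token strings (const char *). */
static int compare_tokens(const void *a, const void *b)
{
    return compare_alnum(*(const char *const *)a, *(const char *const *)b);
}
```

With the example files, the records for "cat" would sort as /adir/bdir/baa (count 2) and then /adir/baa (count 1), and the two count-1 records for "a" would sort as /adir/baa and then /adir/boo.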

After constructing the entire inverted index in memory, the indexer will save it to a file.

2. Implementation

Your program must implement the following command-line interface:

invertedIndex

The first argument gives the name of a file that you should create to hold your inverted index. The second argument gives the name of the
