Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Computer filesystems are often filled with multiple copies of identical files. These identical files take up unnecessary space, since one copy, plus appropriate file system

Computer filesystems are often filled with multiple copies of identical files. These identical files take up unnecessary space, since one copy, plus appropriate file system links, are sufficient to represent all the copies. A simple way to combat this problem is to compare all pairs of files, removing all duplicates.

Comparing all pairs of n files requires n(n1)/2 comparisons, which can be quite large, since n may be very large. Compounding the problem is the fact that comparing two files is slow, especially if they are large and mostly the same.

One way to reduce this cost is to convert each file to a short string (called a hash) using a deterministic hash function, and then compare these hashes (instead of whole files) to identify potential matches. If two hashes dont match, neither do their corresponding files. If two hashes do match, it may be because their corresponding files are identical, or because there is a hash collision (two different files that produce the same hash). The most straightforward way to determine this is to compare the files directly.

For this problem, write a program which determines file duplicates using hashing (to identify potential duplicates and eliminate impossible ones). The hash function is simple, taking as input the entire file, and producing as output one byte. The output is the exclusive or (XOR) of the ASCII value of every byte in the input.

Input

Input consists of up to 250 test cases. Each test case begins with an integer 1n500 indicating the number of files. This is followed by n lines, each representing one file. Each line has 1 to 50 characters, using only the characters az, AZ, space, and period (.). Input ends when n=0.

Output

For each test case, print the number of unique files and the number of hash collisions between all pairs of files.

Sample Input 1
4 four score and seven years ago score four and seven years ago four score and seven years ago ask not what your country can do for you 4 the quick brown fox jumped over the lazy dog over the lazy dog the quick brown fox jumped the lazy quick fox jumped over the brown dog the quick lazy dog over the brown fox jumped 0 

Sample Output 1

3 2 4 6 

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

PostgreSQL Up And Running A Practical Guide To The Advanced Open Source Database

Authors: Regina Obe, Leo Hsu

3rd Edition

1491963417, 978-1491963418

More Books

Students also viewed these Databases questions

Question

How would you describe your typical day at work?

Answered: 1 week ago