Question

Consider the following: The Government Printing Office (GPO) manages the publication, indexing, and archiving of all US Federal Government public documents. Traditionally, agencies submitted copies of each document to the GPO, together with a form listing key metadata such as title, authors, date of publication, abstract, the agency that sponsored the document, and address or contact information. This data was then keyed in by GPO staff and entered into an index that could be searched for matching documents. The process is labor-intensive, expensive, and error-prone.

In addition, agencies are neglecting to submit increasingly large fractions of their documents, preferring to release them directly to their own websites. This poses a problem for the GPO, which is still charged with maintaining an index of all public government documents.

The GPO has experimented with using search-engine-like web crawlers to periodically scan government agency websites to find and download new documents, rather than relying on agencies to submit those documents. Such documents, however, do not come with the agency-written form listing the metadata.

The GPO has therefore commissioned a prototype system, called Extract, that can process documents (PDF files), locating and extracting these metadata fields (title, author, date, abstract, sponsor, and address). The major challenge for Extract is to cope with the wide variety of layouts or formats in which documents can be produced. Extract must be trained on each new document layout so that it knows how to recognize that layout and where, in that layout, it can find each piece of metadata. On familiar layouts, Extract should be able to simply process a document and generate the metadata. Titles and dates should be extractable from a known layout with 100% accuracy, abstracts with 95% accuracy, and names, addresses, and sponsors with 90% accuracy. This process should be manageable by any of the typical GPO clerical staff.
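The per-field accuracy targets above are the kind of measurable figure an SRS requirement can be tested against. As a minimal sketch only, the Python below checks a batch of extraction results against those targets. The names `REQUIRED_ACCURACY`, `field_accuracy`, and `failing_fields` are hypothetical, "names" is assumed to mean the author field, and accuracy is assumed to mean the fraction of documents whose extracted value exactly matches a hand-verified value; the problem statement does not define how accuracy is measured.

```python
# Accuracy targets from the problem statement (known layouts only).
# Assumption: "names" in the statement refers to the author field.
REQUIRED_ACCURACY = {
    "title": 1.00,
    "date": 1.00,
    "abstract": 0.95,
    "author": 0.90,
    "address": 0.90,
    "sponsor": 0.90,
}


def field_accuracy(extracted, expected):
    """Return, per field, the fraction of documents with an exact match.

    `extracted` and `expected` are parallel lists of dicts, one per document.
    Exact string equality is an assumed definition of "accurate".
    """
    if not expected or len(extracted) != len(expected):
        raise ValueError("need two non-empty lists of equal length")
    correct = {name: 0 for name in REQUIRED_ACCURACY}
    for got, want in zip(extracted, expected):
        for name in REQUIRED_ACCURACY:
            if got.get(name) == want.get(name):
                correct[name] += 1
    return {name: correct[name] / len(expected) for name in REQUIRED_ACCURACY}


def failing_fields(extracted, expected):
    """Return the sorted names of fields that fall below their target."""
    accuracy = field_accuracy(extracted, expected)
    return sorted(
        name for name, value in accuracy.items()
        if value < REQUIRED_ACCURACY[name]
    )
```

A check like this would be run per layout on a hand-verified sample; an empty result from `failing_fields` means every target was met on that sample.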

When presented with a document in a layout on which it has not been trained, Extract must request human intervention. A GPO Librarian would then gather a training set of documents in the new layout and use them to train Extract to find the metadata within the new layout.
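The routing described in this paragraph can be sketched as follows. This is an illustration under stated assumptions, not Extract's actual design: `LayoutRegistry`, `route`, and `train` are hypothetical names, and the sketch assumes some earlier step has already assigned each document a layout identifier, which the problem statement does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class LayoutRegistry:
    """Tracks which layouts are trained and which documents await training."""

    trained: set = field(default_factory=set)
    pending: dict = field(default_factory=dict)  # layout_id -> list of doc ids

    def route(self, doc_id, layout_id):
        """Send a document to extraction or queue it for a librarian."""
        if layout_id in self.trained:
            return "extract"
        # Unknown layout: hold the document until the layout is trained.
        self.pending.setdefault(layout_id, []).append(doc_id)
        return "needs_librarian"

    def train(self, layout_id):
        """Mark a layout as trained and release the documents held for it."""
        released = self.pending.pop(layout_id, [])
        self.trained.add(layout_id)
        return released
```

Documents held for an untrained layout can serve as a starting point for the librarian's training set and can be reprocessed once `train` releases them.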

Write 2 non-functional requirements implied by the above statement, in a form suitable for use in an SRS.

