Background: HTML

HTML, the HyperText Markup Language, is used to describe the content of webpages by annotating that content with tags. HTML has two types of tags: element tags and void tags. An HTML element consists of a start tag, some content, and an end tag. For example,

    <b>This is bold content</b>

illustrates a bold element, which is how content is bolded in HTML. Observe that the start and end tags of an element have the same name, except that the end tag starts with a /. Elements can be nested. An HTML document comprises many nested elements, such as the example below:

[Figure 1: example HTML document; image not available]

There are also void or empty tags, which are singleton tags that do not bracket content and hence do not need to be closed. For example, "hr" in Figure 1 is a void tag. Special void tags, whose names begin with a !, are used to include comments and metadata in the HTML. Tags beginning with ! can be ignored.
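The three kinds of tags described above can be told apart from the tag name alone. The sketch below assumes names arrive lower-cased (as HTMLScanner.nextTag() returns them) and reuses the provided voidTags list, which is already in alphabetical order, so Arrays.binarySearch can test membership. The class TagKinds, the enum Kind, and the method classify are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

public class TagKinds {
    // Same names as HTMLScanner.voidTags; binarySearch requires this sorted order.
    static final String[] VOID_TAGS = { "area", "base", "br", "col", "command", "embed",
        "hr", "img", "input", "keygen", "link", "meta", "param", "source", "track", "wbr" };

    enum Kind { START, END, VOID, IGNORED }

    // Classifies a lower-case tag name such as "p", "/p", "hr" or "!doctype".
    static Kind classify(String tagName) {
        if (tagName.startsWith("!")) {
            return Kind.IGNORED;            // comments and metadata
        }
        if (tagName.startsWith("/")) {
            return Kind.END;                // closes the most recently opened element
        }
        if (Arrays.binarySearch(VOID_TAGS, tagName) >= 0) {
            return Kind.VOID;               // singleton tag, never closed
        }
        return Kind.START;                  // opens a new element
    }

    public static void main(String[] args) {
        for (String t : new String[] { "b", "/b", "hr", "!doctype", "p" }) {
            System.out.println(t + " -> " + classify(t));
        }
    }
}
```

Running main prints one classification per line, for example `hr -> VOID` and `/b -> END`.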

Problem: Analyzing the Structure of HTML

The nesting of tags creates a natural structure in an HTML document, which is useful to visualize. For example, the structure of the previous example can be visualized as an outline:

[Outline of the Figure 1 document; image not available]

In this example, the "html" element contains one element: "body". The "body" element contains four elements: "h1", "p", "hr", and "p". All of these except the last one contain no elements, and the last element, "p", contains a "b" element. Observe that the deeper an element is nested, the further it is indented. Furthermore, every element is followed by an integer indicating how many elements and void tags it directly contains. Your task is to create a program that generates an outline representation of an HTML document based on the requirements below.
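The outline described above can be produced by a pre-order walk of the element tree: print a node at its depth, then visit its children one level deeper. The sketch below assumes a hypothetical Node class holding a name and its direct children, a root printed at depth 0 with one space of indentation per level, and void tags such as "hr" printed with a count of 0; the problem statement does not pin down these details. String.repeat needs Java 11 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class OutlineRender {
    // Hypothetical tree node: a tag name plus the elements and void tags directly inside it.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        Node add(Node child) {
            children.add(child);
            return this;                    // allows chaining add(...) calls
        }
    }

    // Pre-order walk: indent by depth, then the name, a space, and the direct child count.
    static void render(Node node, int depth, List<String> out) {
        out.add(" ".repeat(depth) + node.name + " " + node.children.size());
        for (Node child : node.children) {
            render(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // The structure described in the text: html > body > (h1, p, hr, p > b).
        Node body = new Node("body")
            .add(new Node("h1"))
            .add(new Node("p"))
            .add(new Node("hr"))
            .add(new Node("p").add(new Node("b")));
        Node html = new Node("html").add(body);

        List<String> out = new ArrayList<>();
        render(html, 0, out);
        out.forEach(System.out::println);
    }
}
```

Under these assumptions, main prints `html 1`, ` body 4`, `  h1 0`, `  p 0`, `  hr 0`, `  p 1`, `   b 0`, one per line.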

Write a program called HTMLSummarizer.java that reads in an HTML file and outputs the corresponding outline representation. Your HTMLSummarizer class must implement the provided Tester interface, because your program will be tested through this interface, and must also contain the main() method where your program starts running. The interface contains a single method:

    import java.util.ArrayList;
    import java.util.Scanner;

    public interface Tester {
        public ArrayList<String> compute( Scanner input );
    }

This method must perform the required computation.

Input

The method takes a Scanner object that reads a valid HTML file. You may assume that every start tag is properly matched with an end tag and that the input contains no errors. Hint: Use the provided HTMLScanner class to parse the input easily through its two methods, hasNextTag() and nextTag().

    import java.util.Scanner;

    public class HTMLScanner {
        public static final String [] voidTags = { "area", "base", "br", "col", "command", "embed",
            "hr", "img", "input", "keygen", "link", "meta", "param", "source", "track", "wbr" };

        private Scanner stream;
        private String tagBody;

        public HTMLScanner( Scanner s ) {
            stream = s;
            // Split on '<' or '>': tokens alternate between tag contents and the text between tags.
            stream.useDelimiter( "[<>]" );
        }

        public boolean hasNextTag() {
            return stream.hasNext();
        }

        // Returns the lower-cased tag name (e.g. "p", "/p", "!doctype").
        public String nextTag() {
            Scanner tag = new Scanner( stream.next() );
            String tagName = tag.next();
            if( tag.hasNext() ) {
                tagBody = tag.nextLine();   // the rest of the tag, e.g. its attributes
            } else {
                tagBody = "";
            }

            if( stream.hasNext() ) {
                stream.next();              // discard the text up to the next tag
            }
            tag.close();
            return tagName.toLowerCase();
        }

        public String getTagBody() {
            return tagBody;
        }
    }
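To see what repeated nextTag() calls return, the sketch below mirrors HTMLScanner's tokenizing logic on an in-memory string so that it runs on its own. It assumes the delimiter pattern "[<>]", which splits on either angle bracket; with it, Scanner yields tag contents and inter-tag text in alternation (possibly as an empty token between "><"), and the text token is skipped. TagTokens and tagNames are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TagTokens {
    // Mirrors HTMLScanner: keep the first word of each tag, lower-cased,
    // then skip the text token that follows it.
    static List<String> tagNames(String html) {
        List<String> names = new ArrayList<>();
        Scanner stream = new Scanner(html).useDelimiter("[<>]");
        while (stream.hasNext()) {
            Scanner tag = new Scanner(stream.next());
            names.add(tag.next().toLowerCase());    // attributes are ignored here
            tag.close();
            if (stream.hasNext()) {
                stream.next();                      // text between this tag and the next
            }
        }
        stream.close();
        return names;
    }

    public static void main(String[] args) {
        System.out.println(tagNames("<P class=\"x\">Hi <b>there</b></P>"));
    }
}
```

For the sample string, main prints `[p, b, /b, /p]`: the attribute and the text content are dropped, and only tag names remain.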

Output

The method compute( Scanner input ) should return an ArrayList of Strings denoting the outline representation of the HTML input. Each element or void tag should be a separate String. The order of the elements and void tags should be the same as in the HTML input and should be indented the same number of spaces as their depth. Lastly, each element in the outline is followed by a single space and an integer, denoting the number of children the element has.
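One way to meet these requirements in a single pass is a stack of open elements: a start tag adds a line whose count is not yet known, and the matching end tag fills that count in. The sketch below is an illustration under stated assumptions, not a reference solution. It works on the list of tag names that repeated nextTag() calls would produce, ignores tags starting with !, indents by one space per level with the root at depth 0, and assumes void tags are printed with a count of 0. OutlineBuilder and outline are hypothetical names; String.repeat needs Java 11 or later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class OutlineBuilder {
    // Same sorted names as HTMLScanner.voidTags, so binarySearch can be used.
    static final String[] VOID_TAGS = { "area", "base", "br", "col", "command", "embed",
        "hr", "img", "input", "keygen", "link", "meta", "param", "source", "track", "wbr" };

    // Turns lower-case tag names into outline lines, in input order.
    static ArrayList<String> outline(List<String> tagNames) {
        ArrayList<String> lines = new ArrayList<>();
        // Each open element: {index of its line in `lines`, direct children seen so far}.
        Deque<int[]> open = new ArrayDeque<>();
        for (String name : tagNames) {
            if (name.startsWith("!")) {
                continue;                                   // comments and metadata
            }
            if (name.startsWith("/")) {
                int[] closed = open.pop();                  // input assumed well-formed
                lines.set(closed[0], lines.get(closed[0]) + " " + closed[1]);
                continue;
            }
            if (!open.isEmpty()) {
                open.peek()[1]++;                           // one more child of the parent
            }
            String indented = " ".repeat(open.size()) + name;
            if (Arrays.binarySearch(VOID_TAGS, name) >= 0) {
                lines.add(indented + " 0");                 // void tags have no children
            } else {
                lines.add(indented);                        // count added at the end tag
                open.push(new int[] { lines.size() - 1, 0 });
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("!doctype", "html", "body", "h1", "/h1", "p", "/p",
            "hr", "p", "b", "/b", "/p", "/body", "/html");
        outline(tags).forEach(System.out::println);
    }
}
```

Inside compute(), the tag list could be gathered by wrapping the given Scanner in an HTMLScanner and calling nextTag() while hasNextTag() returns true. The java.util.Stack mentioned in the hints would work in place of ArrayDeque; ArrayDeque is simply the more common choice in current Java code.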


Hints and Suggestions

Java has built-in stacks (java.util.Stack) and lists (java.util.ArrayList), as well as binary search (java.util.Arrays).

