Question


Need help with a Python problem about Parsers:

Write class HeaderParser that is a subclass of the HTMLParser class (no global variables). It will find and collect the contents of all the headings in an HTML file fed to it. The parser works by identifying when a header tag has been encountered and setting a boolean variable in the class to indicate that. When the data handler for the class is called and the boolean in the class indicates that a header is currently open, the data inside the header is added to a list. Finally, when a closing header tag is encountered the boolean variable is unset. To implement this parser you will need to override the following methods of the HTMLParser class:

__init__: The constructor calls the parent class constructor, sets the boolean variable in the object appropriately, and sets the list of headings to the empty list.

handle_starttag: If the tag that resulted in this method being called is a header, the header indicator should be set.

handle_endtag: If the tag that resulted in this method being called is a header, the header indicator should be unset.

handle_data: If the parser is currently inside a header, then the data should be added to the list of header contents. Make sure that you strip any leading or trailing spaces or newlines off the contents of the header before adding it to the list.

getHeadings: The function returns the list of headings gathered by the parser.
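The method descriptions above can be sketched roughly as follows. This is a minimal sketch, not the expected solution: it assumes "header" means the tags h1 through h6, keeps that set as a class attribute (HEADING_TAGS, a hypothetical name) to respect the no-globals rule, and skips data that is empty after stripping, which the problem statement does not specify. The attribute names in_heading and headings are also illustrative.

```python
from html.parser import HTMLParser


class HeaderParser(HTMLParser):
    # Assumption: a "header" is any of h1..h6. Stored on the class, not globally.
    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()        # let HTMLParser set up its own state
        self.in_heading = False   # True while a heading tag is open
        self.headings = []        # collected heading texts

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so <H3> arrives as "h3".
        if tag in self.HEADING_TAGS:
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS:
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            text = data.strip()
            if text:  # assumption: whitespace-only data is not recorded
                self.headings.append(text)

    def getHeadings(self):
        return self.headings


parser = HeaderParser()
parser.feed("<html><body><h1> Heading one </h1><p>text</p>"
            "<H3>\nThird Heading\n</H3></body></html>")
print(parser.getHeadings())  # ['Heading one', 'Third Heading']
```

With a single boolean, a heading that contains inline markup, such as `<h1>Hello <em>World</em></h1>`, produces two separate list entries, because handle_data is called once for each run of text between tags.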

You can find a template for the class and a test function testHParser() in the given template file. The following shows what that test function would display on several sample web pages. Note that your solution must work on any page, not just the ones provided here. Think carefully about what it means to collect headings in a general context:

[Screenshot: sample testHParser() output; a transcription follows the template below.]

--------------------------------------------------------------------------------------------------

This is the template file that is given (Copy/Paste):

from html.parser import HTMLParser
from urllib.request import urlopen

class HeaderParser(HTMLParser):
    '''Subclass of the imported HTMLParser class. Finds and collects the
    contents of headings in an HTML file that is fed to it. Identifies when
    a heading tag has been encountered and sets a boolean variable to
    indicate it. When the data handler is called and the boolean in the
    class indicates the header is currently open, the data inside the
    header is added to a list. Finally, when a closing header tag is
    encountered, the boolean variable is unset. Override the following
    methods.'''

    def __init__(self):
        '''The constructor calls the parent class constructor, sets the
        boolean variable in the object appropriately, and sets the list
        of headings to the empty list'''
        pass

    def handle_starttag(self, tag, attrs):
        '''If the tag that resulted in this method being called is a
        header, the header indicator should be set'''
        pass

    def handle_endtag(self, tag):
        '''If the tag that resulted in this method being called is a
        header, the header indicator should be unset'''
        pass

    def handle_data(self, data):
        '''If the parser is currently inside a header, then the data
        should be added to the list. Strips any leading or trailing
        white space.'''
        pass

    def getHeadings(self):
        '''The function returns the list of headings gathered by the
        parser'''
        pass

def testHParser(url):
    '''Opens the url, reads it, and converts it to a string saved in a
    variable. Creates an object of the HeaderParser class and feeds it the
    string. Returns the result of the getHeadings method'''
    pass

--------------------------------------------------------------------------------------------------------

Sample output of testHParser() (transcribed from a screenshot of a Python 3.4.1 shell; parts of the image are only partly legible):

http://facweb.cdm.depaul.edu/asettle/csc242/web/headings.html
    ['Heading one', 'Small Heading', 'Third Heading']

http://facweb.cdm.depaul.edu/asettle/csc242/test.html
    ['Hello World', 'Bigger heading']

http://facweb.cdm.depaul.edu/asettle/csc242/web/cookie.html
    ['Cookie the cat']

http://www.depaul.edu
    A longer list that includes headings such as Headlines, Academics, Admission, Financial Aid, Resources, and Quick links. Several entries contain \u200b (zero-width space) characters.
