
Question


Using Python 3+

1. Include the following imports at the top of your module (these should be sufficient):

from web import LinkCollector   # make sure you did part 1
from html.parser import HTMLParser
from urllib.request import urlopen
from urllib.parse import urljoin
from urllib.error import URLError

2. Implement a class ImageCrawler that inherits from the Crawler developed in class and will both crawl links and collect images. This is very easy by inheriting from and extending the Crawler class. You will need to collect the images in a set. Hint: what does it mean to extend? Implementation details (a minimal ImageCrawler sketch follows the web.py listing below):

a. You must inherit from Crawler (see below). Make sure that the module web.py is in your working folder and that you import Crawler from the web module.

b. __init__ - extends Crawler's __init__ by adding a set attribute that will be used to store images

c. crawl - extends Crawler's crawl by creating an image collector (sketched after this list), opening the url, and then collecting any images from the url into the set of stored images. I recommend that you collect the images before you call Crawler's crawl method.

d. getImages returns the set of images collected
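
The assignment mentions "an image collector", but web.py (below) only provides LinkCollector, so one plausible reading is that the image collector parallels LinkCollector by subclassing HTMLParser. Here is a minimal sketch under that assumption; the class name ImageCollector and its methods are hypothetical, not given by the assignment:

from html.parser import HTMLParser
from urllib.parse import urljoin

# hypothetical ImageCollector - mirrors the LinkCollector pattern:
# parse the html, grab the src of every <img> tag, and resolve
# relative paths against the page url with urljoin
class ImageCollector(HTMLParser):
    def __init__(self, url):
        super().__init__()
        self.url = url            # base url for urljoin
        self.images = set()

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            for name, value in attrs:
                if name == 'src':
                    self.images.add(urljoin(self.url, value))

    def getImages(self):
        return self.images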

web.py:

class Crawler:   # does not inherit

    # init - create empty sets
    def __init__(self):
        self.crawled = set()   # html read
        self.found = set()     # found
        self.dead = set()      # cant load

    # recursive crawl
    # c.crawl( 'http://www.kli.org/', 2, True )
    def crawl(self, url, depth, relativeOnly=True):
        # read the html found at url
        lc = LinkCollector(url)
        try:
            lc.feed(urlopen(url).read().decode())
        except (UnicodeDecodeError, URLError, TypeError):
            self.dead.add(url)
        self.crawled.add(url)

        # extract links
        if relativeOnly:   # if relativeOnly==True:
            found = lc.getRelatives()
        else:
            found = lc.getLinks()
        self.found.update(found)

        # recursively crawl all the (new) links
        # that were found
        if depth > 0:
            for link in found:
                # dont want to duplicate work
                if link not in self.crawled:
                    self.crawl(link, depth - 1, relativeOnly)

    # easy get methods
    def getCrawled(self):
        return self.crawled

    def getFound(self):
        return self.found

    def getDead(self):
        return self.dead
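
Putting the pieces together, here is a minimal sketch of the requested ImageCrawler, assuming the ImageCollector sketched above (or an equivalent collector from part 1). It extends rather than replaces Crawler: each overridden method does its extra work and then delegates with super():

from urllib.request import urlopen
from urllib.error import URLError
from web import Crawler

class ImageCrawler(Crawler):
    # extend __init__: build Crawler's sets, then add one for images
    def __init__(self):
        super().__init__()
        self.images = set()

    # extend crawl: collect this page's images first (as recommended),
    # then hand off to Crawler's crawl for links and recursion
    def crawl(self, url, depth, relativeOnly=True):
        ic = ImageCollector(url)
        try:
            ic.feed(urlopen(url).read().decode())
            self.images.update(ic.getImages())
        except (UnicodeDecodeError, URLError, TypeError):
            pass   # Crawler's crawl will record the url as dead
        super().crawl(url, depth, relativeOnly)

    # return the set of images collected
    def getImages(self):
        return self.images

# example usage (requires network access):
# ic = ImageCrawler()
# ic.crawl('http://www.kli.org/', 1)
# print(ic.getImages())

Because Crawler's crawl calls self.crawl on every new link, the overridden method runs on each page visited, so images are collected throughout the recursion. Note that this layering fetches each page twice (once here, once inside Crawler's crawl); the hint about extending suggests this simple delegation is what is wanted.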
