
Question


Using Python 3+

1. Include the following imports at the top of your module (these should be sufficient):

from web import LinkCollector   # make sure you did part 1
from html.parser import HTMLParser
from urllib.request import urlopen
from urllib.parse import urljoin
from urllib.error import URLError

2. Implement a class ImageCrawler that inherits from the Crawler developed in class and will both crawl links and collect images. This is very easy by inheriting from and extending the Crawler class. You will need to collect the images in a set. Hint: what does it mean to extend? Implementation details (a minimal ImageCrawler sketch follows the web.py listing below):

a. You must inherit from Crawler (see below). Make sure that the module web.py is in your working folder and that you import Crawler from the web module.

b. __init__ - extends Crawler's __init__ by adding a set attribute that will be used to store images

c. crawl - extends Crawler's crawl by creating an image collector (sketched after this list), opening the url, and then collecting any images from the url into the set of stored images. I recommend that you collect the images before you call Crawler's crawl method.

d. getImages returns the set of images collected
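
The assignment mentions "an image collector", but web.py (below) only provides LinkCollector, so one plausible reading is that the image collector parallels LinkCollector by subclassing HTMLParser. Here is a minimal sketch under that assumption; the class name ImageCollector and its methods are hypothetical, not given by the assignment:

from html.parser import HTMLParser
from urllib.parse import urljoin

# hypothetical ImageCollector - mirrors the LinkCollector pattern:
# parse the html, grab the src of every <img> tag, and resolve
# relative paths against the page url with urljoin
class ImageCollector(HTMLParser):
    def __init__(self, url):
        super().__init__()
        self.url = url            # base url for urljoin
        self.images = set()

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            for name, value in attrs:
                if name == 'src':
                    self.images.add(urljoin(self.url, value))

    def getImages(self):
        return self.images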

web.py:

class Crawler:   # does not inherit

    # init - create empty sets
    def __init__(self):
        self.crawled = set()   # html read
        self.found = set()     # found
        self.dead = set()      # cant load

    # recursive crawl
    # c.crawl( 'http://www.kli.org/', 2, True )
    def crawl(self, url, depth, relativeOnly=True):
        # read the html found at url
        lc = LinkCollector(url)
        try:
            lc.feed(urlopen(url).read().decode())
        except (UnicodeDecodeError, URLError, TypeError):
            self.dead.add(url)
        self.crawled.add(url)

        # extract links
        if relativeOnly:   # if relativeOnly==True:
            found = lc.getRelatives()
        else:
            found = lc.getLinks()
        self.found.update(found)

        # recursively crawl all the (new) links
        # that were found
        if depth > 0:
            for link in found:
                # dont want to duplicate work
                if link not in self.crawled:
                    self.crawl(link, depth - 1, relativeOnly)

    # easy get methods
    def getCrawled(self):
        return self.crawled

    def getFound(self):
        return self.found

    def getDead(self):
        return self.dead
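
Putting the pieces together, here is a minimal sketch of the requested ImageCrawler, assuming the ImageCollector sketched above (or an equivalent collector from part 1). It extends rather than replaces Crawler: each overridden method does its extra work and then delegates with super():

from urllib.request import urlopen
from urllib.error import URLError
from web import Crawler

class ImageCrawler(Crawler):
    # extend __init__: build Crawler's sets, then add one for images
    def __init__(self):
        super().__init__()
        self.images = set()

    # extend crawl: collect this page's images first (as recommended),
    # then hand off to Crawler's crawl for links and recursion
    def crawl(self, url, depth, relativeOnly=True):
        ic = ImageCollector(url)
        try:
            ic.feed(urlopen(url).read().decode())
            self.images.update(ic.getImages())
        except (UnicodeDecodeError, URLError, TypeError):
            pass   # Crawler's crawl will record the url as dead
        super().crawl(url, depth, relativeOnly)

    # return the set of images collected
    def getImages(self):
        return self.images

# example usage (requires network access):
# ic = ImageCrawler()
# ic.crawl('http://www.kli.org/', 1)
# print(ic.getImages())

Because Crawler's crawl calls self.crawl on every new link, the overridden method runs on each page visited, so images are collected throughout the recursion. Note that this layering fetches each page twice (once here, once inside Crawler's crawl); the hint about extending suggests this simple delegation is what is wanted.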
