Question
I want to add another step to the following code: remove duplicate URLs.
from bs4 import BeautifulSoup
from urllib.request import urlopen
from urllib.parse import urljoin
import csv

my_url = 'https://www.census.gov/programs-surveys/popest/about/schedule.html'

# opening up the connection, grabbing the page
page = urlopen(my_url)

# HTML parsing
soup = BeautifulSoup(page, 'html.parser')

# save as a CSV file
with open('index.csv', 'w') as csv_file:
    writer = csv.writer(csv_file)
    for link in soup.find_all('a', href=True):
        url = link.get('href')
        url = urljoin(my_url, url)
        print(url)
        writer.writerow([url])
I am trying to add this part:
# remove duplicate links
file = open('index.csv', 'w')
links = {}
for link in soup.find_all('a', href=True):
    url = link.get('href')
    url = urljoin(my_url, url)
    if url not in links:
        file.write("%s " % url)
        links[url] = True
file.close()
This doesn't seem to work. I want to find all links on the page, convert the relative links to absolute URLs, remove duplicates, and save the result as a CSV file.
Step by Step Solution
There are 3 steps involved:
Step: 1
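Diagnose why the second snippet fails. It opens index.csv in 'w' mode a second time, so it competes with the first block: whichever runs last truncates the file and overwrites the other's output. It also writes the URLs with file.write("%s " % url), which produces one long space-separated line rather than CSV rows. Finally, the dict works as a seen-check, but a set is the idiomatic container for membership tests. The fix is to do the duplicate check inside the original csv.writer loop rather than in a separate pass over the file.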
Step: 2
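Keep a set of URLs already written, and check it before calling writer.writerow. Below is a minimal sketch of the changed loop only; it assumes my_url, soup, and writer are already set up exactly as in the question's code.

seen = set()
for link in soup.find_all('a', href=True):
    # make relative links absolute
    url = urljoin(my_url, link['href'])
    # only write URLs we have not seen before
    if url not in seen:
        seen.add(url)
        writer.writerow([url])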
Step: 3
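Put it together. Here is one complete version of the script with deduplication folded into the original loop; the newline='' argument is an addition, not from the question's code, and prevents csv.writer from producing blank rows on Windows.

from bs4 import BeautifulSoup
from urllib.request import urlopen
from urllib.parse import urljoin
import csv

my_url = 'https://www.census.gov/programs-surveys/popest/about/schedule.html'

# open the connection and grab the page
page = urlopen(my_url)

# parse the HTML
soup = BeautifulSoup(page, 'html.parser')

# track URLs that have already been written
seen = set()

# save as a CSV file; newline='' prevents blank rows on Windows
with open('index.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for link in soup.find_all('a', href=True):
        # make relative links absolute
        url = urljoin(my_url, link['href'])
        # skip duplicates
        if url not in seen:
            seen.add(url)
            print(url)
            writer.writerow([url])

If you would rather collect everything first and deduplicate afterwards, list(dict.fromkeys(urls)) removes duplicates while preserving first-seen order.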