Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Oct 16, 2024

Develop a web links scraper program in Python that extracts all of the unique web links that point out to other web pages from the

Develop a web links scraper program in Python that extractsallof the unique web links that point out to other web pages from the HTML code of the "Current Estimates" web link,bothfrom the "US Census Bureau" website (see web link below) and outside that domain, and that populates them in a comma-separated values (CSV) file as absolute uniform resource indicators (URIs).

A.Explain how the Python program extracts the web links from the HTML code of the "Current Estimates," found in web links section.

B.Explain the criteria you used to determine if a link is a locator to another HTML page. Identify the code segment that executes this action as part of your explanation.

C.Explain how the program ensures that relative links are saved as absolute URIs in the output file. Identify the code segment that executes this action as part of your explanation.

D.Explain how the program ensures that there are no duplicated links in the output file. Identify the code that executes this action as part of your explanation.

Note: Please consider weblinks that point to the same web pages as identical(e.g.,www.commerce.gov and www.commerce.gov/).

E.Provide the Python code you wrote to extractallthe unique web links from the HTML code of the "Current Estimates" (in the web links section), that point out to other HTML pages.

F.Provide the HTML code of the "Current Estimates" web page scrapped at the time when the scraper was run and the CSV file was generated.

G.Provide the CSV file that your script created.

H.Run your script and provide a screenshot of the successfully executed results.

I.Acknowledge sources, using in-text citations and references, for content that is quoted, paraphrased, or summarized.

J.Demonstrate professional communication in the content and presentation of your submission.

I have written the code but i am not sure if it is correct since i got an error towards the end.

from bs4 import BeautifulSoup

import csv

import requests

from urllib.parse import urljoin

URL = 'https://www.census.gov/programs-surveys/popest.html'

census = requests.get(URL)

census.text

soup = BeautifulSoup(census.text,'html.parser')

my_links = soup.find_all('a')

print(len(my_links))

my_set = set()

for link in my_links:

href = str(link.get("href"))

if href.startswith('None'):

elif href.startswith('#https'):

myset.add((hrefs[1:]))

elif hrefs.startswith('#'):

elif hrefs.startswith('/'):

Myset.add('https://www.census.gov/programs-surveys/popest.html' + hrefs)

elif hrefs.endswith('.gov'):

Myset.add(hrefs + '/')

else:

Myset.add(hrefs)

(#I get an error here of)

 File "", line 7 elif href.startswith('#https'):  ^ IndentationError: expected an indented block

with open("FinalAssignment.csv", "w") as csv_file:

writer = csv.writer(f)

for x in MySet:

writer.writerow([x])

Any help would be great thanks!

Step by Step Solution

There are 3 Steps involved in it

Step: 1

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

Step: 3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Financial management theory and practice

Authors: Eugene F. Brigham and Michael C. Ehrhardt

★★★★★

The walls of the ventricle possess thick muscular projections they are known as......?

Answered: 1 week ago

Previous Question Next Question