Question
Develop a web links scraper program in Python that extracts all of the unique web links that point out to other web pages from the
Develop a web links scraper program in Python that extractsallof the unique web links that point out to other web pages from the HTML code of the "Current Estimates" web link,bothfrom the "US Census Bureau" website (see web link below) and outside that domain, and that populates them in a comma-separated values (CSV) file as absolute uniform resource indicators (URIs).
A.Explain how the Python program extracts the web links from the HTML code of the "Current Estimates," found in web links section.
B.Explain the criteria you used to determine if a link is a locator to another HTML page. Identify the code segment that executes this action as part of your explanation.
C.Explain how the program ensures that relative links are saved as absolute URIs in the output file. Identify the code segment that executes this action as part of your explanation.
D.Explain how the program ensures that there are no duplicated links in the output file. Identify the code that executes this action as part of your explanation.
Note: Please consider weblinks that point to the same web pages as identical(e.g.,www.commerce.gov and www.commerce.gov/).
E.Provide the Python code you wrote to extractallthe unique web links from the HTML code of the "Current Estimates" (in the web links section), that point out to other HTML pages.
F.Provide the HTML code of the "Current Estimates" web page scrapped at the time when the scraper was run and the CSV file was generated.
G.Provide the CSV file that your script created.
H.Run your script and provide a screenshot of the successfully executed results.
I.Acknowledge sources, using in-text citations and references, for content that is quoted, paraphrased, or summarized.
J.Demonstrate professional communication in the content and presentation of your submission.
I have written the code but i am not sure if it is correct since i got an error towards the end.
from bs4 import BeautifulSoup
import csv
import requests
from urllib.parse import urljoin
URL = 'https://www.census.gov/programs-surveys/popest.html'
census = requests.get(URL)
census.text
soup = BeautifulSoup(census.text,'html.parser')
my_links = soup.find_all('a')
print(len(my_links))
my_set = set()
for link in my_links:
href = str(link.get("href"))
if href.startswith('None'):
elif href.startswith('#https'):
myset.add((hrefs[1:]))
elif hrefs.startswith('#'):
elif hrefs.startswith('/'):
Myset.add('https://www.census.gov/programs-surveys/popest.html' + hrefs)
elif hrefs.endswith('.gov'):
Myset.add(hrefs + '/')
else:
Myset.add(hrefs)
(#I get an error here of)
File "" , line 7 elif href.startswith('#https'): ^ IndentationError: expected an indented block
with open("FinalAssignment.csv", "w") as csv_file:
writer = csv.writer(f)
for x in MySet:
writer.writerow([x])
Any help would be great thanks!
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started