
Question



Your team must create a Python class called AIWebCrawler that fulfills the following requirements:
Web Crawling:
Your crawler must visit all pages within the given domain.
The crawler must not navigate to external domains.
Handle different types of web pages and links.
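A crawler usually enforces the same-domain rule on normalized links. Relative links need resolving, `#fragment` variants of the same page need collapsing, and `mailto:`, `tel:` and `javascript:` links need skipping. The sketch below uses only `urllib.parse`. The helper names `normalize_link` and `is_same_domain` are hypothetical, and treating subdomains such as `blog.example.com` as external is an assumption you may want to relax.

```python
from typing import Optional
from urllib.parse import urldefrag, urljoin, urlparse

# Only these schemes point at fetchable web pages; mailto:, tel:,
# javascript: and similar links are skipped (assumption).
FETCHABLE_SCHEMES = {"http", "https"}


def normalize_link(base_url: str, href: str) -> Optional[str]:
    """Resolve href against base_url and drop any #fragment.

    Returns None when the link does not point at a fetchable web page.
    """
    href = href.strip()
    if not href:
        return None
    absolute, _fragment = urldefrag(urljoin(base_url, href))
    if urlparse(absolute).scheme not in FETCHABLE_SCHEMES:
        return None
    return absolute


def is_same_domain(url: str, root_url: str) -> bool:
    """True if url is on exactly the same host as root_url.

    urlparse().hostname is lowercased and excludes the port, so
    'https://Example.com:443/x' matches 'https://example.com/'.
    Subdomains count as external here (assumption).
    """
    return urlparse(url).hostname == urlparse(root_url).hostname
```

For example, `normalize_link("https://example.com/a/b.html", "../c.html#top")` yields `https://example.com/c.html`, and a `mailto:` link yields `None`.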
Visiting Strategies:
Implement the visiting strategies: preorder, inorder, and postorder.
The visiting strategy must be specified as a parameter during class instantiation.
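One way to give preorder, inorder and postorder a precise meaning is to define them on the crawl tree, where each page's children are the same-domain links it is the first to discover. The minimal sketch below takes the strategy as a constructor parameter. Several parts are assumptions: the `fetch` callable (returning title, text and absolute links) is a hypothetical hook you would back with requests or Selenium, and "inorder" on a page with many children is taken to mean first child subtree, then the page, then the remaining subtrees, since inorder has no single standard definition for non-binary trees.

```python
from typing import Callable, List, Set, Tuple
from urllib.parse import urlparse

# Hypothetical fetcher: returns (title, text, absolute_links) for a URL.
Fetcher = Callable[[str], Tuple[str, str, List[str]]]

STRATEGIES = ("preorder", "inorder", "postorder")


class AIWebCrawler:
    def __init__(self, root_url: str, fetch: Fetcher, strategy: str = "preorder") -> None:
        if strategy not in STRATEGIES:
            raise ValueError(f"strategy must be one of {STRATEGIES}, got {strategy!r}")
        self.root_url = root_url
        self.domain = urlparse(root_url).hostname
        self.fetch = fetch
        self.strategy = strategy
        self.corpus: List[Tuple[str, str]] = []  # (title, text) in visit order

    def crawl(self) -> List[Tuple[str, str]]:
        """Crawl the whole domain and return the corpus in strategy order."""
        self.corpus = []
        self._traverse(self.root_url, seen={self.root_url})
        return self.corpus

    def _traverse(self, url: str, seen: Set[str]) -> None:
        title, text, links = self.fetch(url)
        # Children in the crawl tree: same-domain links not yet claimed by any page.
        children: List[str] = []
        for link in links:
            if urlparse(link).hostname == self.domain and link not in seen:
                seen.add(link)
                children.append(link)

        def visit() -> None:
            self.corpus.append((title, text))

        if self.strategy == "preorder":      # page first, then its subtrees
            visit()
            for child in children:
                self._traverse(child, seen)
        elif self.strategy == "postorder":   # subtrees first, then the page
            for child in children:
                self._traverse(child, seen)
            visit()
        else:                                # inorder: first subtree, page, rest
            for child in children[:1]:
                self._traverse(child, seen)
            visit()
            for child in children[1:]:
                self._traverse(child, seen)
```

Because children are claimed as soon as a page is fetched, all three strategies walk the same tree and differ only in when each page is added to the corpus. The recursion is limited by Python's default recursion depth (about 1000), so very deep sites may need an explicit stack instead.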
Output:
Generate a corpus of text documents containing the content of each visited page.
Ensure the text is free of HTML tags, JavaScript, menu items, and other non-essential elements.
The title of each text document must be the title of the corresponding page visited during the crawling phase.
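Text extraction can be sketched with the standard-library `html.parser`. The parser below keeps the `<title>` separately for naming the document, drops everything inside script, style and common page-chrome elements, and puts block-level elements on their own lines. The helper names and the `SKIP_TAGS` list are assumptions. Menus built from plain `<div class="menu">` elements would need extra class- or role-based rules, which this sketch does not attempt.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple

# Elements treated as non-essential (scripts, styles, menus, page chrome).
# This tag list is an assumption; tune it per site.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside", "menu", "form", "svg"}
# Elements that start a new line in the extracted text.
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "table", "tr", "section", "article",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0          # > 0 while inside a SKIP_TAGS element
        self.in_title = False
        self.title_parts: List[str] = []
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "title":
            self.in_title = True
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "title":
            self.in_title = False
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)
        elif self.skip_depth == 0:
            self.parts.append(data)


def extract_document(html: str) -> Tuple[str, str]:
    """Return (page_title, visible_text) with tags, scripts and menus removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    lines = (" ".join(line.split()) for line in "".join(parser.parts).split("\n"))
    return title, "\n".join(line for line in lines if line)


def safe_filename(title: str) -> str:
    """Turn a page title into a corpus file name, e.g. 'About Us' -> 'About_Us.txt'."""
    stem = re.sub(r"\s+", "_", re.sub(r"[^\w\- ]+", "", title).strip())
    return (stem or "untitled") + ".txt"
```

Each visited page then becomes one file named with `safe_filename(title)` whose body is the extracted text.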
Handling Dynamic Content:
Use a JavaScript-capable browser, such as Chrome driven by Selenium WebDriver, to crawl and extract content from dynamic pages.
Ensure the crawler can interpret and navigate JavaScript-rendered content.
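For JavaScript-rendered pages, the page can be loaded in a real browser and read back from the rendered DOM rather than the raw HTTP response. The sketch below assumes Selenium 4 with Chrome. It uses `webdriver.Chrome`, `Options.add_argument`, `driver.get`, `driver.execute_script`, `driver.title` and `driver.page_source`. The function names, the `--headless=new` flag (which assumes a recent Chrome) and the fixed `settle` delay for late-running scripts are assumptions. Links are collected from the live DOM, so links injected by scripts are included.

```python
import time
from typing import List, Tuple

# Runs inside the page: every <a href> as an absolute URL
# (the DOM `href` property is already resolved against the document base).
LINKS_JS = "return Array.from(document.querySelectorAll('a[href]'), a => a.href);"


def make_chrome_driver():
    """Create a headless Chrome WebDriver (requires `pip install selenium`)."""
    from selenium import webdriver  # imported lazily so this module loads without Selenium
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


def fetch_rendered(driver, url: str, timeout: float = 10.0,
                   settle: float = 0.5) -> Tuple[str, str, List[str]]:
    """Load url in the browser and return (title, rendered_html, links).

    Polls document.readyState until 'complete', then waits `settle`
    seconds for scripts that inject content after load (assumption).
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while driver.execute_script("return document.readyState") != "complete":
        if time.monotonic() > deadline:
            raise TimeoutError(f"page did not finish loading: {url}")
        time.sleep(0.1)
    time.sleep(settle)
    links = driver.execute_script(LINKS_JS) or []
    return driver.title, driver.page_source, list(links)
```

In use, the driver is created once with `make_chrome_driver()`, `fetch_rendered` is called for each URL, and `driver.quit()` is called when the crawl ends. The rendered HTML can then go through the crawler's text-extraction step, and the returned links through its same-domain filter.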
Integration with AI (ChatGPT or Google Colab):
Utilize AI capabilities in your crawler for tasks such as parsing, text extraction, or decision-making.
Document all the prompts used to generate the web crawler, and keep track of the number of times the generated code did not work and how you solved the issue (new prompt or manual intervention).
Keep track of this information using the following table:
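The prompt history can also be recorded while you work instead of being reconstructed afterwards. The sketch below keeps one entry per prompt, counts the prompts whose generated code failed, and exports the log as CSV. The column names (`step`, `prompt`, `worked`, `fix`) and the class names are assumptions, since the required table layout is not reproduced here. Rename them to match the table your instructor provides.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptLogEntry:
    prompt: str
    worked: bool   # did the generated code work without changes?
    fix: str = ""  # how a failure was solved: a new prompt or a manual edit


@dataclass
class PromptLog:
    entries: List[PromptLogEntry] = field(default_factory=list)

    def record(self, prompt: str, worked: bool, fix: str = "") -> None:
        self.entries.append(PromptLogEntry(prompt, worked, fix))

    def failure_count(self) -> int:
        """Number of prompts whose generated code did not work."""
        return sum(1 for entry in self.entries if not entry.worked)

    def to_csv(self) -> str:
        """Render the log as CSV (column names are assumptions)."""
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        writer.writerow(["step", "prompt", "worked", "fix"])
        for step, entry in enumerate(self.entries, start=1):
            writer.writerow([step, entry.prompt, "yes" if entry.worked else "no", entry.fix])
        return buffer.getvalue()
```

The CSV output opens directly in a spreadsheet, and `failure_count()` gives the number of times the generated code did not work.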
