Question
Your team must create a Python class called AIWebCrawler that fulfills the following requirements:
Web Crawling:
Your crawler must visit all pages within the given domain.
The crawler must not navigate to external domains.
Handle different types of web pages and links.
Visiting Strategies:
Implement the visiting strategies: preorder, inorder, and postorder.
The visiting strategy must be specified as a parameter during class instantiation.
Output:
Generate a corpus of text documents containing the content of each visited page.
Ensure the text is free of HTML tags, JavaScript, menu items, and other non-essential elements.
The title of each textual document is the title of the page visited during the crawling phase.
Handling Dynamic Content:
Use a browser automation tool such as Selenium WebDriver with Chrome to crawl and extract content from dynamic pages.
Ensure the crawler can interpret and navigate JavaScript-rendered content.
Integration with AI (ChatGPT or Google Colab):
Utilize AI capabilities in your crawler for tasks such as parsing, text extraction, or decision-making.
Document all the prompts used to generate the web crawler, and keep track of the number of times the generated code did not work and how you solved the issue (additional prompt or manual intervention).
Keep track of this information using the following table:
Step by Step Solution
This solution involves 3 steps.
Step: 1
Step: 2
Step: 3