Question

/25/22, 1:10 AM Assignment 0

Question 5. The programming part of this assignment is a simple exercise with text data. Here's what you should do. Please note that it's fine to be quick and dirty; the goal here is not to produce an elegant solution. You are welcome to do this entirely in Python, or using a combination of Python and Unix commands.

- Using whatever means you would like (e.g. wget, curl, viewing the page source HTML in a browser and copy/pasting, Python urllib, etc.), get the HTML source for this page.
- Use Beautiful Soup to extract the visible text from the page. (This discussion on StackOverflow might be helpful. Repeat after me: "StackOverflow is my friend.")
- Remove any character in the text that is not in [a-zA-Z] by turning it into whitespace.
- Convert the entire text to lowercase.
- Turn the text into "words" by splitting on whitespace.
- As your response, include:
  - The 20 most frequent words in descending order, in the following format:
    150 the
    124 of
    115 and
    ...etc
  - Your code. Note that if this requires more than one page of your homework PDF in a readable font, you almost certainly did more work than you needed to.

Please don't forget to tell us approximately how much time you spent on this.
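The steps above could be sketched in Python roughly as follows. This is a minimal, quick-and-dirty sketch under several assumptions: the page URL is passed on the command line (the assignment's own URL is not reproduced here), the page is UTF-8, and `beautifulsoup4` is installed. The script name `wordfreq.py` and the helper names `fetch_html`, `visible_text`, `words` and `top_words` are hypothetical. Which tags count as "not visible" is a judgment call; dropping `script`, `style`, `head` and `noscript` is an assumption, not a rule from the assignment.

```python
"""Hypothetical wordfreq.py: quick-and-dirty word counts for a web page (a sketch)."""
from __future__ import annotations

import re
import sys
import urllib.request
from collections import Counter


def fetch_html(url: str) -> str:
    # Download the page source. Assumes UTF-8; undecodable bytes are replaced.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def visible_text(html: str) -> str:
    # Imported here so the pure counting helpers below work even without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    # Assumption: treat these elements as never rendered as page text.
    for tag in soup(["script", "style", "head", "noscript"]):
        tag.decompose()
    return soup.get_text(separator=" ")


def words(text: str) -> list[str]:
    # Turn every character outside [a-zA-Z] into whitespace, lowercase, split on whitespace.
    return re.sub(r"[^a-zA-Z]", " ", text).lower().split()


def top_words(text: str, n: int = 20) -> list[tuple[int, str]]:
    # most_common yields (word, count) pairs in descending count order;
    # flip them to (count, word) to match the requested output format.
    return [(count, word) for word, count in Counter(words(text)).most_common(n)]


def main() -> None:
    if len(sys.argv) != 2:
        print("usage: python wordfreq.py URL", file=sys.stderr)
        return
    for count, word in top_words(visible_text(fetch_html(sys.argv[1]))):
        print(count, word)  # one "count word" pair per line, e.g. "150 the"


if __name__ == "__main__":
    main()
```

Run it as `python wordfreq.py URL`. Keeping the counting in plain functions separate from fetching and parsing makes it easy to check on a small string first. Words with equal counts come out in the order they first appear in the text, because `Counter.most_common` orders ties that way.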
