Question

Please answer 2, 3 and 4
Data is everywhere nowadays. Analyzing data can help us discover patterns in a data set and predict outcomes. It can also help us classify documents automatically. The article at the following link illustrates a document classification approach based on a simple case: https://www.kdnuggets.com/2015/01/text-analysis-101-document-classification.html

The article describes three steps for classifying documents: collecting data, preprocessing, and applying a classification algorithm and strategy. In this part, we will complete the first two steps by answering the questions below.

1) (2 points) Assume you are given a webpage with the following URL:

cnn.com/2018/03/28/news/companies/amazon-stock/index.html

The webpage is a news article about "amazon". Write a single-line shell command that downloads the source code of the webpage and saves it as news.html. (Hint: you can use wget on Ubuntu and curl on macOS.)

2) (2 points) HTML is a markup language whose tags tell the web browser how to format the content. For more details, see: https://www.tutorialspoint.com/html/html_overview.htm

Your task is to extract the actual article from the webpage source code news.html by extracting the text enclosed in <p> tags. Use a single-line shell command to do this and save the article to the file news.txt. Note: the extracted article should look like the sample below:

[sample output not available]

3) (3 points) The first two steps collected data from a web resource. Next, we preprocess the article by removing less important words from the text file, such as "a", "the", and so on. These words carry little meaning and are not used for text mining. Write a C program removeChar.c to do this. To simplify the problem, your C program only needs to read its input from standard input and remove only the word
