[Solved] I have to create an Internet news archive by

Question

Posted on Sep 09, 2024

I have to create an Internet news archive using Python with a GUI. I have a document with downloaded news from WSJ online, a regularly updated source. Under the user's control, the program extracts individual elements from web documents stored in an archive (a folder of previously downloaded HTML/XML documents) and uses these elements to construct an entirely new HTML document which summarises a particular day's top stories in a form which can be viewed in a standard web browser. It also allows the user to download the current version of the source web page and store it in the archive as the latest news. How do I create it?

Development hints

This is a substantial task, so you should not attempt to do it all at once. In particular, you should work on the back end parts first, designing your HTML document and working out how to extract the necessary elements for it from the archived web pages, before attempting the Graphical User Interface front end. If you are unable to complete the whole task, just submit those stages you can get working. You will receive partial marks for incomplete solutions.

It is suggested that you use the following development process:

1. Find a suitable source of online news articles. Keep in mind that the web site you choose must be updated daily, must have at least ten articles online at any time, and each article must have a heading, a short synopsis, a (link to a) photograph, a link to a full description of the story, and a publication date. Some starting points for finding such sites can be found in Appendix A below.

2. Create your Internet Archive by downloading seven copies of the web site, one per day. You should do so using the provided downloader.py program or a similar Python application so that the documents in your archive have the same structure that your application will see when downloading the latest news. (Otherwise you will have to write separate pattern matching code for extracting items from the archive and from the latest download!)

3. Study the HTML/XML source code of the archived documents to determine how the elements you want to extract are marked up. Typically you will want to identify the markup tags, and perhaps other unchanging parts of the document, that uniquely identify the beginning and end of the text and image addresses you want to extract.

4. Using the provided regex_tester.py application, devise regular expressions which extract just the necessary elements from the relevant parts of the archived web documents.

5. You can now develop a simple prototype of your back end function(s) that just extracts and saves the required elements from an archived web document.

6. Design the HTML source code for your extracted news stories, with appropriate placeholders for the downloaded web elements you will insert. Keep the document simple and its source code neat.

7. Develop the necessary Python code to extract the HTML elements from an archived document and create the HTML file. This completes the major back end part of your solution.

8. Develop a function to download the latest news and save it in the archive. (The provided downloader.py application is very helpful at this point!)

9. Develop a function to open a given HTML document in the host computer's default web browser. Importantly, this needs to be done in a way that will work on any computing platform. The provided template file contains hints for how to do this, using Python functions for finding the path to the folder in which the Python script resides (getcwd), for normalising full file names to the local computing environment (normpath), and for opening a file in the host system's default web browser (called webopen in our template).

10. Add the Graphical User Interface front end to your program. Decide whether you want to use push buttons, radio buttons, menus, lists or some other mechanism for choosing, extracting and displaying archived news. Developing the GUI is the messiest step, and is best left to the end.
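As a starting point for the back end (steps 4 to 9), the steps could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the assignment's reference solution: the `<article>` markup that `STORY_PATTERN` matches is hypothetical and will not match the real WSJ pages, so the pattern has to be rewritten after studying your own archived documents. The function names (`extract_stories`, `build_summary`, `download_latest`, `save_and_open`), the folder name `archive` and the file names are likewise made up for illustration. The sketch uses `pathlib` and the standard `webbrowser` module in place of the `getcwd`/`normpath`/`webopen` hints in the provided template, which is not shown here.

```python
import html
import re
import webbrowser
from pathlib import Path
from urllib.request import urlopen

# ASSUMPTION: hypothetical markup. Each archived story is assumed to look like
# <article><h2>..</h2><p class="summary">..</p><img src=".."><a href="..">..</a>
# <time>..</time></article>. Replace this with a pattern for your real source.
STORY_PATTERN = re.compile(
    r'<article>\s*'
    r'<h2>(?P<heading>.*?)</h2>\s*'
    r'<p class="summary">(?P<synopsis>.*?)</p>\s*'
    r'<img src="(?P<photo>[^"]+)"[^>]*>\s*'
    r'<a href="(?P<link>[^"]+)"[^>]*>.*?</a>\s*'
    r'<time>(?P<date>.*?)</time>\s*'
    r'</article>',
    re.DOTALL,
)

# HTML design with placeholders for the extracted elements (step 6).
STORY_TEMPLATE = """<div class="story">
  <h2>{heading}</h2>
  <img src="{photo}" alt="">
  <p>{synopsis}</p>
  <p><a href="{link}">Full story</a> ({date})</p>
</div>"""

PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{title}</title></head>
<body>
<h1>{title}</h1>
{stories}
</body>
</html>
"""


def extract_stories(page_source, limit=10):
    """Return up to `limit` stories, each a dict of the named regex groups."""
    matches = STORY_PATTERN.finditer(page_source)
    return [match.groupdict() for match in matches][:limit]


def build_summary(stories, title):
    """Fill the page template with one story block per extracted story."""
    blocks = [STORY_TEMPLATE.format(**story) for story in stories]
    return PAGE_TEMPLATE.format(title=html.escape(title),
                                stories="\n".join(blocks))


def download_latest(url, archive_dir="archive", file_name="latest.html"):
    """Download the page at `url` and store its raw bytes in the archive."""
    with urlopen(url) as response:
        data = response.read()
    target = Path(archive_dir) / file_name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


def save_and_open(document, file_name="summary.html"):
    """Write the summary next to the current folder and open it in a browser."""
    path = Path.cwd() / file_name
    path.write_text(document, encoding="utf-8")
    webbrowser.open(path.as_uri())  # file:// URI, independent of the platform
    return path
```

A GUI callback (step 10) would then only need to read one archived file, call `extract_stories` on its text, pass the result to `build_summary`, and hand the returned string to `save_and_open`. Keeping these functions free of any GUI code makes it possible to test them on a saved page before the front end exists, which is the order the development hints recommend.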