Question:

This project serves as a means for you to demonstrate the amalgamation of a number of technologies that you have studied (or will learn this semester). Overall, the project is to obtain data from a set of webpages on a website of your choice (except for a few sites listed below); the data should be related either to the academic experience or to some major/discipline taught on campus.

To facilitate site assignment, provide three website URLs: if URL#1 is already taken, you will be requesting URL#2, and if URL#2 is likewise taken, you will be requesting URL#3. Since the form below is written in the context of URL#1, if you are assigned URL#2 or URL#3 because the others are taken, you will be asked to fill in a form in the context of the URL you were assigned.

Requirement1. Online (Search) function. The user should be able to obtain/search for data online from your approved website. This includes the ability to download at least one picture or icon found on the website. We will discuss the code for accessing webpages and downloading images in upcoming lectures. In previous semesters, I have asked students to do this with Amazon, Craigslist, eBay and AbeBooks; as such, those sites are not eligible this semester. Please explain, within the context of URL#1, what data (and to what extent) your user can search for/obtain from the chosen website. GUIs should be provided for this purpose. OPTIONS: If your program implements a search function for data provided by the website, and the site has an advanced search feature, implementing that advanced search via your own GUI would be an acceptable innovation.
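One possible way to connect a search GUI to the website is to turn the user's search terms into a query URL and then fetch that page with the reader classes given later in this handout. The sketch below assumes the site takes its search terms as URL query parameters; the base URL `https://www.example.edu/search` and the parameter names `q` and `dept` are placeholders, since every site defines its own format (run a search in a browser and look at the address bar to find it). `URLEncoder.encode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class SearchUrlBuilder {

    // Builds "base?name1=value1&name2=value2", URL-encoding every value.
    // Parameter names are hypothetical; use the ones your chosen site expects.
    // Assumes baseUrl does not already contain a '?'.
    public static String buildSearchUrl(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator)
               .append(p.getKey())
               .append('=')
               .append(URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // An "advanced search" GUI would simply contribute more parameters.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "data structures");
        params.put("dept", "CS&E");
        System.out.println(buildSearchUrl("https://www.example.edu/search", params));
        // prints https://www.example.edu/search?q=data+structures&dept=CS%26E
    }
}
```

A LinkedHashMap keeps the parameters in insertion order, which makes the generated URLs predictable when you compare them against the ones your browser produces.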

Requirement2. Storage of Data. All data obtained from/for the user should be stored in a storage structure, and the program must provide hash-based storage for this purpose. When data is entered into your storage, the user name and a timestamp should be recorded with each item of data obtained in Requirement#1. The user should have the option to delete data that they previously requested and stored. In addition, the user should be able to modify the data provided for the user profile (Requirement#11). OPTIONS: It would be impressive to additionally provide the option of storing the data in a database (MySQL or Sun Derby only). If chosen, this option should be another command-line parameter that lets the user choose, when reading in the initial set of data, whether it will be stored in hash-based storage or in a database. This would also be acceptable as one of the two innovations mentioned at the end of this document.
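A minimal sketch of the hash-based storage, assuming each item is keyed by a generated integer id in a `HashMap` and carries the user name and a timestamp. The names `UserDataStore`, `Item`, `insert`, `delete` and `itemsOf` are illustrative only; the assignment does not prescribe them. Deletion checks ownership so that a user can only remove data stored on their own behalf.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UserDataStore {

    // One piece of data obtained for a user, tagged with user name and timestamp.
    static class Item {
        final int id;
        final String userName;
        final LocalDateTime timestamp;
        String data; // mutable so a MODIFY operation can update it

        Item(int id, String userName, LocalDateTime timestamp, String data) {
            this.id = id;
            this.userName = userName;
            this.timestamp = timestamp;
            this.data = data;
        }
    }

    // Hash-based storage: item id -> item.
    private final Map<Integer, Item> items = new HashMap<>();
    private int nextId = 1;

    // Stores data on behalf of userName, stamped with the current time.
    public Item insert(String userName, String data) {
        Item item = new Item(nextId++, userName, LocalDateTime.now(), data);
        items.put(item.id, item);
        return item;
    }

    // Removes an item, but only if it belongs to the requesting user.
    public boolean delete(String userName, int id) {
        Item item = items.get(id);
        if (item == null || !item.userName.equals(userName)) {
            return false;
        }
        items.remove(id);
        return true;
    }

    // Returns every item stored on behalf of userName.
    public List<Item> itemsOf(String userName) {
        List<Item> result = new ArrayList<>();
        for (Item item : items.values()) {
            if (item.userName.equals(userName)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        UserDataStore store = new UserDataStore();
        Item a = store.insert("alice", "Course: Data Structures");
        store.insert("bob", "Course: Calculus I");
        System.out.println(store.delete("bob", a.id));   // false: not bob's item
        System.out.println(store.delete("alice", a.id)); // true
        System.out.println(store.itemsOf("bob").size()); // 1
    }
}
```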

Requirement3. Offline queries. (By Requirement#4, the data must be maintained between executions of the program: if you shut down the program and start it up again, the data is not lost.) The user should be able to conduct offline queries of your data storage via a GUI. The extent of these queries will depend on what you have stored in your storage structure; i.e., the queries will target the specific information you have stored about each item and/or the transactional history in your storage. Since each item entered into your storage has a timestamp, the user can ask for data obtained within a certain time range. The user should only be able to retrieve data that they requested and that was entered into the data storage on their behalf. The admin can retrieve data from a specific user(s) or all users. The user should be allowed to print out the results of the queries to a local file (presumably by interacting with a GUI). OPTIONS: A nifty option would be to allow users to email themselves a report of the obtained data. This would count as an innovation.
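The time-range query and the access rule above can be sketched as a single filter over the stored items, plus a method that writes the results to a local file. This is a sketch under assumptions: `Item` is a minimal stand-in for whatever your storage actually holds, the range is treated as inclusive at both ends, and passing `null` for `targetUsers` is a hypothetical convention meaning "all users" for the admin.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class TimeRangeQuery {

    // Minimal stand-in for a stored item (hypothetical; use your own storage type).
    static class Item {
        final String userName;
        final LocalDateTime timestamp;
        final String data;

        Item(String userName, LocalDateTime timestamp, String data) {
            this.userName = userName;
            this.timestamp = timestamp;
            this.data = data;
        }

        @Override
        public String toString() {
            return timestamp + " " + userName + " " + data;
        }
    }

    // Returns items whose timestamp lies in [from, to], inclusive.
    // A regular user sees only their own items. The admin sees the items of the
    // users in targetUsers, or of all users when targetUsers is null.
    public static List<Item> query(List<Item> all, String requester, boolean isAdmin,
                                   Set<String> targetUsers, LocalDateTime from, LocalDateTime to) {
        List<Item> result = new ArrayList<>();
        for (Item item : all) {
            boolean inRange = !item.timestamp.isBefore(from) && !item.timestamp.isAfter(to);
            boolean allowed = isAdmin
                    ? (targetUsers == null || targetUsers.contains(item.userName))
                    : item.userName.equals(requester);
            if (inRange && allowed) {
                result.add(item);
            }
        }
        return result;
    }

    // Writes one line per result to a local report file (the "printout").
    public static void saveReport(List<Item> results, Path reportFile) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Item item : results) {
            lines.add(item.toString());
        }
        Files.write(reportFile, lines);
    }

    public static void main(String[] args) throws IOException {
        List<Item> all = new ArrayList<>();
        all.add(new Item("alice", LocalDateTime.of(2024, 1, 10, 9, 0), "Course: Data Structures"));
        all.add(new Item("alice", LocalDateTime.of(2024, 2, 15, 9, 0), "Course: Algorithms"));
        all.add(new Item("bob", LocalDateTime.of(2024, 2, 20, 9, 0), "Course: Calculus I"));
        LocalDateTime from = LocalDateTime.of(2024, 2, 1, 0, 0);
        LocalDateTime to = LocalDateTime.of(2024, 2, 28, 23, 59);
        List<Item> mine = query(all, "alice", false, null, from, to); // alice's February item only
        saveReport(mine, Path.of("alice-report.txt"));                // hypothetical file name
    }
}
```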

Requirement4. Persistence of Data. Persistence means that the data is preserved even if the system crashes. Only the privileged user (admin), from a GUI, should be able to reconstruct the storage from backup data (possibly the Transactional Log from Requirement#9, assuming the log holds enough information to carry out this task).
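If the transaction log records every change, the storage can be rebuilt by replaying it from the first line: INSERT and MODIFY put a value, DELETE removes it, and read-only entries are skipped. The sketch assumes a hypothetical one-line-per-transaction format `timestamp|user|OPERATION|id|data`; your own log format may differ, and data containing line breaks would need escaping (not shown). Storing `"user|data"` as the map value is a simplification for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

class LogReplayer {

    // Rebuilds an id -> "user|data" map by replaying the log from the start.
    // Assumed (hypothetical) line format: timestamp|user|OPERATION|id|data
    public static Map<Integer, String> rebuild(BufferedReader log) throws IOException {
        Map<Integer, String> store = new HashMap<>();
        String line;
        while ((line = log.readLine()) != null) {
            String[] f = line.split("\\|", 5); // limit 5: the data field may contain '|'
            if (f.length < 4) {
                continue; // skip blank or malformed lines
            }
            int id;
            try {
                id = Integer.parseInt(f[3]);
            } catch (NumberFormatException e) {
                continue; // skip lines whose id field is not a number
            }
            String user = f[1];
            String data = f.length == 5 ? f[4] : "";
            switch (f[2]) {
                case "INSERT":
                case "MODIFY":
                    store.put(id, user + "|" + data);
                    break;
                case "DELETE":
                    store.remove(id);
                    break;
                default:
                    // SEARCH, LOGIN, etc. are kept for auditing but change nothing.
                    break;
            }
        }
        return store;
    }

    public static void main(String[] args) throws IOException {
        String log = "2024-02-15T09:00:00|alice|INSERT|1|Course: Data Structures\n"
                   + "2024-02-15T09:05:00|bob|INSERT|2|Course: Calculus I\n"
                   + "2024-02-15T09:10:00|alice|MODIFY|1|Course: Algorithms\n"
                   + "2024-02-15T09:20:00|bob|SEARCH|0|calculus\n"
                   + "2024-02-15T09:30:00|bob|DELETE|2|\n";
        Map<Integer, String> store = rebuild(new BufferedReader(new StringReader(log)));
        System.out.println(store); // {1=alice|Course: Algorithms}
    }
}
```

With a real log file, pass `Files.newBufferedReader(Paths.get(...))` instead of the `StringReader`; the GUI action that triggers this should be available to the admin account only.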

Requirement5. Processing individual pages. This requirement dovetails with Requirement#1. The data obtained from the website will be embedded within the webpages it provides. These pages are in HTML. You are to temporarily store the HTML in a file and then process it. You must write the code yourself and cannot use a third-party HTML API or parser. We will discuss this code in class in upcoming lectures. You should process at least two different webpages from the site. You will be extracting data (text) to be stored in your data storage, and images, which will be stored locally, with the file names and locations stored in the database.
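Because third-party parsers are not allowed, the extraction can be done with plain `String` searching. The sketch below pulls the page title and the `src` of every `<img>` tag using only `java.lang.String` methods; the class and method names are hypothetical. It is deliberately simple: it does not handle spaces around `=`, a `>` inside an attribute value, or HTML comments, so test it against the actual pages of your site.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleHtmlExtractor {

    // Case-insensitive indexOf: position of target in s at or after from, or -1.
    static int indexOfIgnoreCase(String s, String target, int from) {
        for (int i = Math.max(from, 0); i + target.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, target, 0, target.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns the trimmed text between <title ...> and </title>, or null if absent.
    public static String extractTitle(String html) {
        int open = indexOfIgnoreCase(html, "<title", 0);
        if (open < 0) return null;
        int contentStart = html.indexOf('>', open) + 1;
        if (contentStart == 0) return null; // no '>' after <title
        int close = indexOfIgnoreCase(html, "</title>", contentStart);
        return close < 0 ? null : html.substring(contentStart, close).trim();
    }

    // Returns the value of attribute `name` inside one tag, or null if absent.
    // Handles "double", 'single' and unquoted values, but not spaces around '='.
    static String attributeValue(String tag, String name) {
        String key = name + "=";
        int at = indexOfIgnoreCase(tag, key, 0);
        // Require whitespace before the name so e.g. data-src= is not taken for src=.
        while (at > 0 && !Character.isWhitespace(tag.charAt(at - 1))) {
            at = indexOfIgnoreCase(tag, key, at + 1);
        }
        if (at < 0) return null;
        int valueStart = at + key.length();
        if (valueStart >= tag.length()) return "";
        char first = tag.charAt(valueStart);
        if (first == '"' || first == '\'') {
            int close = tag.indexOf(first, valueStart + 1);
            return close < 0 ? null : tag.substring(valueStart + 1, close);
        }
        int end = valueStart;
        while (end < tag.length() && !Character.isWhitespace(tag.charAt(end))) {
            end++;
        }
        return tag.substring(valueStart, end);
    }

    // Collects the src value of every <img> tag, in document order.
    public static List<String> extractImageSources(String html) {
        List<String> sources = new ArrayList<>();
        int pos = indexOfIgnoreCase(html, "<img", 0);
        while (pos >= 0) {
            int tagEnd = html.indexOf('>', pos);
            if (tagEnd < 0) break; // unterminated tag: stop
            String src = attributeValue(html.substring(pos, tagEnd), "src");
            if (src != null) sources.add(src);
            pos = indexOfIgnoreCase(html, "<img", tagEnd + 1);
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<html><head><TITLE> Course Catalog </TITLE></head><body>"
                + "<img data-src=\"lazy.png\" src=\"logo.png\"><IMG SRC='icons/cs.gif'></body></html>";
        System.out.println(extractTitle(html));        // Course Catalog
        System.out.println(extractImageSources(html)); // [logo.png, icons/cs.gif]
    }
}
```

Image `src` values are often relative; they can be resolved against the page address with the standard `new URL(URL context, String spec)` constructor before being downloaded.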

Requirement6. Input/Output from Command Line. While you may develop your code using a visual editor/debugger, the user cannot be expected to learn that development tool. Typically, the code will be compiled and the program executed from the command line (command.exe/cmd.exe on Windows platforms). You certainly can develop your system in such a tool, but you must make sure that it can be compiled and run from the command line. To this end, you will need to download the Oracle/Sun Java JDK, which includes the Java compiler javac and the Java runtime java.

Source: http://www.oracle.com/technetwork/java/javase/downloads/index.html

When running your program from the command line, you should allow the user to pass information or select options via predefined command-line flags that your program parses. (I emailed the class a website that provides pseudo-code outlining how to accomplish this.) Which flags should you minimally provide?

a) i input_file, where input_file holds an initial set of queries and/or search requests that the user wants the system to process in a batch;
b) o output_file, where output_file will be a printout of the essential data stored in your data store as a result of the queries/requests/searches found in input_file;
c) p, which directs the program to process input_file and print the output to output_file without starting up the GUIs and without storing the data, and then immediately terminate. If you implemented emailing a report, an optional_email, if provided, would have the report emailed to the user;
d) if databases will be one of your innovations, the flag h db/ht/bo will allow the user to choose whether to store information in a database (db), a hashtable (ht), or both (bo).
Feel free to add flags to the command line, such as debug, but these extra flags will not be considered an innovation.
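The minimal flags above can be handled with a small hand-written parser. This sketch assumes each flag is written with a leading dash (`-i`, `-o`, `-p`, `-h`), which the text does not state; it also assumes that `-p` requires both `-i` and `-o`, that the optional e-mail address is the token right after `-p`, and that the storage mode defaults to `ht`. The class name `CommandLineOptions` is hypothetical.

```java
class CommandLineOptions {

    String inputFile;          // -i input_file
    String outputFile;         // -o output_file
    boolean batchOnly;         // -p : no GUI, no storage, terminate when done
    String reportEmail;        // optional address after -p (only if emailing is implemented)
    String storageMode = "ht"; // -h db|ht|bo (assumed default: hashtable)

    // Parses arguments such as: -i queries.txt -o results.txt -p -h bo
    static CommandLineOptions parse(String[] args) {
        CommandLineOptions opts = new CommandLineOptions();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-i":
                    opts.inputFile = requireValue(args, ++i, "-i");
                    break;
                case "-o":
                    opts.outputFile = requireValue(args, ++i, "-o");
                    break;
                case "-p":
                    opts.batchOnly = true;
                    // Optional e-mail: take the next token only if it is not another flag.
                    if (i + 1 < args.length && !args[i + 1].startsWith("-")) {
                        opts.reportEmail = args[++i];
                    }
                    break;
                case "-h":
                    opts.storageMode = requireValue(args, ++i, "-h");
                    if (!opts.storageMode.matches("db|ht|bo")) {
                        throw new IllegalArgumentException("-h expects db, ht or bo");
                    }
                    break;
                default:
                    throw new IllegalArgumentException("Unknown flag: " + args[i]);
            }
        }
        if (opts.batchOnly && (opts.inputFile == null || opts.outputFile == null)) {
            throw new IllegalArgumentException("-p requires both -i and -o");
        }
        return opts;
    }

    // Returns args[index], or fails with a message naming the flag that lacks a value.
    private static String requireValue(String[] args, int index, String flag) {
        if (index >= args.length) {
            throw new IllegalArgumentException(flag + " requires a value");
        }
        return args[index];
    }

    public static void main(String[] args) {
        CommandLineOptions opts = parse(args);
        System.out.println("input=" + opts.inputFile + " output=" + opts.outputFile
                + " batch=" + opts.batchOnly + " storage=" + opts.storageMode);
    }
}
```

Throwing `IllegalArgumentException` keeps the parser testable; the real `main` would typically catch it, print a usage message, and exit with a non-zero status.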

Requirement7. GUIs. Your system should include appropriate GUIs to enhance the user experience. The lines of code for these GUIs will not count toward Requirement#10. You may use JavaFX to enhance the user's experience. Development with GUIs will not be considered an innovation.

Requirement8. Documentation. Every project must contain proper documentation and a reasonable amount of code comments. Follow the protocol discussed in class: meaningful variable names; class and method descriptions; the purpose of each variable, both at its declaration and in method parameter lists; a comment before each block of code; and, when appropriate, line-by-line comments. A user manual (.doc/.docx) should be provided describing the functionality of your project and showing a picture of each GUI, with a description below it explaining how to use it. An installation guide should describe how to compile/install your system and list the system requirements. Feel free to add to this list, but this requirement will not be eligible to count as an innovation.

Requirement9. Transaction Log. Every transaction that interacts with the data storage should be written to a text file (the log) containing the name of the transaction (e.g. INSERT, DELETE, MODIFY) with its parameters (data values), the user who requested it, and the date/timestamp, for security purposes. If a transaction doesn't change the data storage, it need not be logged for persistence, but you still need to log it for security purposes. The admin is the only one who can directly interact with this log.
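A log entry can be appended as one line per transaction, opening the file in append mode so earlier entries are never overwritten. The sketch assumes a hypothetical pipe-separated format `timestamp|user|OPERATION|id|data` with ISO-8601 timestamps; the class and method names are illustrative. Values containing `|` or line breaks would need escaping, which is not shown.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class TransactionLog {

    private final Path logFile;

    TransactionLog(Path logFile) {
        this.logFile = logFile;
    }

    // Formats one entry as timestamp|user|OPERATION|id|data (hypothetical format).
    static String formatEntry(LocalDateTime when, String user, String op, int id, String data) {
        return when.format(DateTimeFormatter.ISO_LOCAL_DATE_TIME)
                + "|" + user + "|" + op + "|" + id + "|" + data;
    }

    // Appends one entry, creating the file if needed. Read-only operations
    // (e.g. SEARCH) are logged too, for security auditing.
    public void append(String user, String op, int id, String data) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(logFile, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(formatEntry(LocalDateTime.now(), user, op, id, data));
            out.newLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("transactions", ".log"); // a real program would use a fixed path
        TransactionLog log = new TransactionLog(file);
        log.append("alice", "INSERT", 1, "Course: Data Structures");
        log.append("alice", "SEARCH", 0, "algorithms");
        Files.readAllLines(file).forEach(System.out::println);
    }
}
```

Opening and closing the file for every entry is slower than keeping a writer open, but it means each entry is flushed to disk before the method returns, which matters if the log is later used to rebuild the storage after a crash.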

Requirement10. Code Length. This project is meant to be a serious project. I have emailed you a link to a free utility you will download to count the number of lines of code (as opposed to comments and blank lines). The project must be at least 1000 lines of code, not including code for GUIs. ONLY YOUR CODE counts: not 3rd-party code, and not code generated for you by a GUI editor/developer tool.

Requirement11. User Registration. Because security is such an important aspect of modern software engineering, you are to implement a user account system, with a special Administrator account that has access to the transaction log and can rebuild the system based on persistence. A Guest account will allow anyone to search but not store data. A User account, in addition, can store data and interact with the data that user has obtained and stored in the storage structure.
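One way to sketch the account system with standard Java only: a `Role` enum encodes what Guest, User and Administrator may do, and a registry keeps, for each user name, a random salt and a PBKDF2 hash of the password rather than the password itself. The permission split (for example, that the admin may also store data) is an interpretation of the requirement, and all names here are hypothetical. `PBKDF2WithHmacSHA256` ships with the standard JDK (`javax.crypto`), so it passes the "no third-party software" rule; the iteration count of 100,000 is an illustrative choice.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

class AccountRegistry {

    // Account types from Requirement#11 (the permission split is an interpretation).
    enum Role {
        GUEST, USER, ADMIN;

        boolean canStore()     { return this != GUEST; } // guests may only search
        boolean canManageLog() { return this == ADMIN; } // view log, rebuild storage
    }

    // Only a random salt and the derived hash are kept, never the plain password.
    private static final class Account {
        final Role role;
        final byte[] salt;
        final byte[] hash;

        Account(Role role, byte[] salt, byte[] hash) {
            this.role = role;
            this.salt = salt;
            this.hash = hash;
        }
    }

    private final Map<String, Account> accounts = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    // Creates an account; returns false if the user name is already taken.
    public boolean register(String userName, char[] password, Role role) {
        if (accounts.containsKey(userName)) {
            return false;
        }
        byte[] salt = new byte[16];
        random.nextBytes(salt);
        accounts.put(userName, new Account(role, salt, hash(password, salt)));
        return true;
    }

    // Returns the account's role if the password is correct, otherwise null.
    public Role login(String userName, char[] password) {
        Account account = accounts.get(userName);
        if (account == null) {
            return null;
        }
        // MessageDigest.isEqual examines all bytes, so timing does not reveal
        // where the first mismatch occurs.
        return MessageDigest.isEqual(account.hash, hash(password, account.salt)) ? account.role : null;
    }

    // Derives a 256-bit key from the password with PBKDF2 (standard JDK crypto).
    private static byte[] hash(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, 100_000, 256);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2WithHmacSHA256 not available", e);
        } finally {
            spec.clearPassword();
        }
    }

    public static void main(String[] args) {
        AccountRegistry registry = new AccountRegistry();
        registry.register("admin", "change-me".toCharArray(), Role.ADMIN);
        Role role = registry.login("admin", "change-me".toCharArray());
        System.out.println(role + " can manage log: " + (role != null && role.canManageLog()));
    }
}
```

A failed login returns `null` rather than silently downgrading to `GUEST`, so the GUI can decide whether to show an error or continue as a guest.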

CHOOSE TWO NEW FEATURES TO IMPLEMENT FOR THE ABOVE SYSTEM ON YOUR OWN.

Requirement12. Innovation#1.

Requirement13. Innovation#2.

Usaragent.html

Find your browser's User Agent - GetRight Download Manager

Find my User Agent

The user agent for your web browser (the one that you are viewing this web page with) is:

GetURLImage.java

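import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import javax.imageio.ImageIO;

public class GetURLImage {

    // Downloads and decodes the image at the given URL.
    // Returns null if the content is not in an image format ImageIO can read.
    // I/O errors are passed to the caller instead of being silently ignored.
    public static BufferedImage fetchImageFromURL(URL url) throws IOException {
        // Example: URL url = new URL("http://hostname.com/image.gif");
        return ImageIO.read(url);
    } // fetchImageFromURL

    // Usage: java GetURLImage <image-url> <output-file>
    // Downloads the image at the URL and saves a copy of it to the output file.
    public static void main(String[] args)
            throws MalformedURLException, IOException {
        if (args.length < 2) {
            System.out.println("Usage: java GetURLImage <image-url> <output-file>");
            System.exit(1);
        } // enough inputs?
        URL url = new URL(args[0]);
        File outputFile = new File(args[1]);
        BufferedImage image = fetchImageFromURL(url);
        if (image == null) {
            System.out.println("No readable image found at " + url);
            System.exit(1);
        } // image decoded?
        // PNG is used because the JPEG writer can fail on images with transparency
        // (common in GIF/PNG icons); write() returns false if no suitable writer exists.
        if (!ImageIO.write(image, "png", outputFile)) {
            System.out.println("Could not write " + outputFile);
        } // written?
    } // main

} // GetURLImage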

GetURLInfo

// This example is based on code from the book _Java in a Nutshell_ by David Flanagan.
// Written by David Flanagan. Copyright (c) 1996 O'Reilly & Associates.

import java.net.*;
import java.io.*;
import java.util.*;

public class GetURLInfo {

    // Display the URL address and the header information reported for it.
    // Note: getContentLength() returns -1, and getLastModified()/getExpiration()
    // return 0, when the server does not send the corresponding header.
    public static void printinfo(URLConnection u) throws IOException {
        System.out.println(u.getURL().toExternalForm() + ":");
        System.out.println(" Content Type: " + u.getContentType());
        System.out.println(" Content Length: " + u.getContentLength());
        System.out.println(" Last Modified: " + new Date(u.getLastModified()));
        System.out.println(" Expiration: " + u.getExpiration());
        System.out.println(" Content Encoding: " + u.getContentEncoding());
        /*
        // Optional: read and print out the first five lines of the URL.
        // (BufferedReader replaces the deprecated DataInputStream.readLine().)
        System.out.println("First five lines:");
        BufferedReader in = new BufferedReader(new InputStreamReader(u.getInputStream()));
        for (int i = 0; i < 5; i++) {
            String line = in.readLine();
            if (line == null) break;
            System.out.println(" " + line);
        } // for
        in.close();
        */
    } // printinfo

    // Create a URL from the specified address, open a connection to it,
    // and then display information about the URL.
    public static void main(String[] args)
            throws MalformedURLException, IOException {
        if (args.length == 0) {
            System.out.println("Usage: java GetURLInfo <url>");
            System.exit(1);
        } // any inputs?
        URL url = new URL(args[0]);
        URLConnection connection = url.openConnection();
        printinfo(connection);
    } // main

} // GetURLInfo

WebpageReaderWithoutAgent

import java.io.*;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class WebpageReaderWithoutAgent {

    private static String webpage = null;

    // Opens the URL and wraps its byte stream in a line reader.
    // Assumes the page is UTF-8 encoded (the platform default may differ).
    public static BufferedReader read(String url) throws Exception {
        return new BufferedReader(
            new InputStreamReader(
                new URL(url).openStream(), StandardCharsets.UTF_8));
    } // read

    public static void main (String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("No URL inputted.");
            System.exit(1);
        } // any inputs?
        webpage = args[0];
        System.out.println("Contents of the following URL: " + webpage + " ");
        // try-with-resources closes the connection's stream when done
        try (BufferedReader reader = read(webpage)) {
            String line = reader.readLine();
            while (line != null) {
                System.out.println(line);
                line = reader.readLine();
            } // while
        }
    } // main

} // WebpageReaderWithoutAgent

WebpageReaderWithAgent

import java.io.*;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class WebpageReaderWithAgent {

    private static String webpage = null;

    // Some sites refuse or alter responses for Java's default User-Agent,
    // so this class identifies itself as a (Firefox) browser instead.
    public static final String USER_AGENT = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6";

    // Opens a connection that sends the browser User-Agent header.
    public static InputStream getURLInputStream(String sURL) throws Exception {
        URLConnection oConnection = (new URL(sURL)).openConnection();
        oConnection.setRequestProperty("User-Agent", USER_AGENT);
        return oConnection.getInputStream();
    } // getURLInputStream

    // Reads the page while identifying as a browser (assumes UTF-8 content).
    public static BufferedReader read(String url) throws Exception {
        InputStream content = getURLInputStream(url);
        return new BufferedReader(new InputStreamReader(content, StandardCharsets.UTF_8));
    } // read

    // For comparison: reads the page with Java's default User-Agent.
    public static BufferedReader read2(String url) throws Exception {
        return new BufferedReader(
            new InputStreamReader(
                new URL(url).openStream(), StandardCharsets.UTF_8));
    } // read2

    public static void main (String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("No URL inputted.");
            System.exit(1);
        } // any inputs?
        webpage = args[0];
        System.out.println("Contents of the following URL: " + webpage + " ");
        // try-with-resources closes the connection's stream when done
        try (BufferedReader reader = read(webpage)) {
            String line = reader.readLine();
            while (line != null) {
                System.out.println(line);
                line = reader.readLine();
            } // while
        }
    } // main

} // WebpageReaderWithAgent
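Requirement#5 asks that the HTML be stored temporarily in a file before it is processed. A minimal way to do that is to copy the stream returned by `getURLInputStream(...)` above into a temporary file and read it back as text. The class name `PageCache` is hypothetical, and reading the file as UTF-8 is an assumption about the site's encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class PageCache {

    // Copies the raw page bytes into a new temporary .html file and returns its path.
    // REPLACE_EXISTING is needed because createTempFile has already created the file.
    public static Path saveToTempFile(InputStream page) throws IOException {
        Path file = Files.createTempFile("page", ".html");
        try (InputStream in = page) {
            Files.copy(in, file, StandardCopyOption.REPLACE_EXISTING);
        }
        return file;
    }

    // Reads the saved page back as text (assumes UTF-8) for the HTML processing step.
    public static String readSavedPage(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Offline demo; with a live site, pass WebpageReaderWithAgent.getURLInputStream(url) instead.
        InputStream page = new ByteArrayInputStream(
                "<html><title>Demo</title></html>".getBytes(StandardCharsets.UTF_8));
        Path saved = saveToTempFile(page);
        System.out.println(readSavedPage(saved));
        Files.deleteIfExists(saved); // "temporarily": remove once processed
    }
}
```

Copying raw bytes (rather than lines from a `BufferedReader`) keeps the saved file identical to what the server sent, so the decoding choice is made only once, when the file is read back.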

I am well aware that different libraries exist to perform various tasks that you will need to accomplish in your project.

I will repeat what I said in class and wrote in a prior email. YOU MAY NOT USE THIRD-PARTY S/W. You may ONLY use standard Java. Here are simple "litmus tests" to see which is which:

If you downloaded the s/w from http://www.oracle.com/technetwork/java/javase/downloads/index.html, it is OK. This includes their Java DB (formerly called Derby).

If you import from a library that has "org" in the path, it is NOT OK.

I want you to write your OWN html parser.

Thank you in advance for your compliance.
