
Question

Phase 2 of project:

The second phase of the project focuses on processing information about, and on, a webpage. For this phase, you may modify and use the code posted below, but before each method you use you must add a comment citing the source of that code.

This phase, like the others, must work from the command line. Your program will accept two file names from the command line: the first is the name of the input file and the second is the name of the output file.
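The two-argument requirement above could be checked along the following lines. This is only a minimal sketch: the class name `Phase2Args` and its `parse` method are hypothetical and not part of the posted code, and it assumes the input file must already exist and be readable.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; the assignment does not prescribe this class or its name.
class Phase2Args {
    final Path input;   // first command-line argument: list of URLs
    final Path output;  // second command-line argument: report file

    private Phase2Args(Path input, Path output) {
        this.input = input;
        this.output = output;
    }

    // Validates the arguments: exactly two, input file first, output file second.
    static Phase2Args parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                    "Usage: java <MainClass> <inputFile> <outputFile>");
        }
        Path input = Paths.get(args[0]);
        // Assumption: the input file must exist and be readable before we start.
        if (!Files.isReadable(input)) {
            throw new IllegalArgumentException("Cannot read input file: " + input);
        }
        return new Phase2Args(input, Paths.get(args[1]));
    }
}
```

A `main` method would typically call `Phase2Args.parse(args)` first, print the exception message, and exit with a non-zero status if the arguments are rejected.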

Your input file will be a list of URLs that you provide. The URLs in the file should point to files with a mix of .htm (or .html), .jpg (or .jpeg), .txt and .pdf endings. If you have trouble finding appropriate examples, ask me after class, but try search engines first!
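Reading the URL list might look like the sketch below. It assumes one URL per line, UTF-8 text, and that blank lines should be skipped; none of these details are stated in the assignment, and the name `UrlListReader` is made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for loading the input file.
class UrlListReader {

    // Returns the non-blank lines of the input file, trimmed, in file order.
    // Assumption: the file holds one URL per line.
    static List<String> readUrls(Path inputFile) throws IOException {
        List<String> urls = new ArrayList<>();
        try (BufferedReader reader =
                     Files.newBufferedReader(inputFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    urls.add(trimmed);
                }
            }
        }
        return urls;
    }
}
```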

For each URL you process, obtain its URL info (see GetURLInfo below). Write the URL and all of its info to the output file, whose name is the second command-line parameter you pass to java (the first is the name of the input file above). The remaining processing depends on the file ending of the URL, i.e. the kind of file you will be reading from.

If the URL points to a .html, .htm or .txt file, read all the lines of the file and save them to a separate file on your system with the same name as the URL file name (i.e. the last portion of the actual URL). Then write the number of lines read and the name of the saved file to the output file.

If the URL points to an image file (.jpeg, .jpg or .gif), save the image to your computer with the same name as the URL file name. Then write the name of the saved file to the output file.

If the URL points to a .pdf file, save the file to your computer with the same name as the URL file name and write the name of the saved file to the output file. For .pdf files you will probably need to copy byte by byte instead of line by line, because parts of a .pdf file are stored in binary encodings rather than ASCII.
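The file-ending dispatch described above could be sketched as follows. The class `UrlFileKind`, the `Kind` enum, and both method names are hypothetical. The sketch assumes the ending is compared case-insensitively and that any query string should be ignored, which the assignment does not specify.

```java
import java.net.URL;
import java.util.Locale;

// Hypothetical helper for deciding how to process a URL.
class UrlFileKind {
    enum Kind { TEXT, IMAGE, PDF, UNKNOWN }

    // Returns the last portion of the URL path, e.g. "report.pdf" for
    // http://example.com/docs/report.pdf?x=1 (getPath() omits the query string).
    static String fileNameOf(URL url) {
        String path = url.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Classifies a file name by its ending, ignoring case (an assumption).
    static Kind kindOf(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".html") || lower.endsWith(".htm") || lower.endsWith(".txt")) {
            return Kind.TEXT;   // read line by line and count the lines
        }
        if (lower.endsWith(".jpg") || lower.endsWith(".jpeg") || lower.endsWith(".gif")) {
            return Kind.IMAGE;  // save the image file
        }
        if (lower.endsWith(".pdf")) {
            return Kind.PDF;    // copy as raw bytes
        }
        return Kind.UNKNOWN;    // the assignment does not say what to do here
    }
}
```

A URL with no file portion (for example one ending in `/`) yields an empty name and `Kind.UNKNOWN`, so the caller still has to decide how to report such entries.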

Usaragent.html

Find your browser's User Agent - GetRight Download Manager

Find my User Agent

The user agent for your web browser (the one that you are viewing this web page with) is:

GetURLImage.java

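import java.awt.image.*;
import javax.imageio.*;
import java.io.*;
import java.net.URL;
import java.net.MalformedURLException;

public class GetURLImage {

    public static BufferedImage image = null;

    // Reads the image at the given URL into the static field `image`.
    // On failure, `image` stays null and the error is reported.
    public static void fetchImageFromURL(URL url) {
        try {
            // Read from a URL, e.g. new URL("http://hostname.com/image.gif")
            image = ImageIO.read(url);
        } catch (IOException e) {
            System.err.println("Could not read image from " + url + ": " + e.getMessage());
        } // catch
    } // fetchImageFromURL

    // Create a URL from the specified address, fetch the image it points to,
    // and write that image to the output file as a JPEG.
    public static void main(String[] args)
            throws MalformedURLException, IOException {
        if (args.length < 2) {
            System.err.println("Usage: java GetURLImage <url> <outputFile>");
            System.exit(1);
        }
        URL url = new URL(args[0]);
        File outputFile = new File(args[1]);
        fetchImageFromURL(url);
        // ImageIO.read returns null when no registered reader can decode the data.
        if (image == null) {
            System.err.println("No image could be decoded from " + url);
            System.exit(1);
        }
        ImageIO.write(image, "jpg", outputFile);
    } // main
} // GetURLImage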
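GetURLImage decodes the image and re-encodes it as a JPEG, so the saved file is not necessarily identical to the original. Since the assignment asks for the file to be saved under its own name, and notes that .pdf files need a byte-level copy, one alternative is to copy the raw stream unchanged. The sketch below is an assumption about how that could be done, not part of the posted code; `BinaryDownloader` and `copyBytes` are hypothetical names. It copies in buffered chunks rather than literally one byte at a time, which produces the same bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for saving binary content (images, PDFs) unchanged.
class BinaryDownloader {

    // Copies every byte from `in` to `target` and returns the number of bytes
    // written. Closes the input stream when done.
    static long copyBytes(InputStream in, Path target) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (InputStream source = in;
             OutputStream out = Files.newOutputStream(target)) {
            int n;
            while ((n = source.read(buffer)) != -1) {
                out.write(buffer, 0, n);  // write only the bytes actually read
                total += n;
            }
        }
        return total;
    }
}
```

With a `URL` object in hand, a call such as `BinaryDownloader.copyBytes(url.openStream(), Paths.get(fileName))` would save the remote file, where `fileName` is the last portion of the URL.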

GetURLInfo.java

// This example is based on code from the book _Java in a Nutshell_ by David Flanagan.
// Written by David Flanagan. Copyright (c) 1996 O'Reilly & Associates.

import java.net.*;
import java.io.*;
import java.util.*;

public class GetURLInfo {

    public static void printinfo(URLConnection u) throws IOException {
        // Display the URL address, and information about it.
        System.out.println(u.getURL().toExternalForm() + ":");
        System.out.println(" Content Type: " + u.getContentType());
        System.out.println(" Content Length: " + u.getContentLength());
        System.out.println(" Last Modified: " + new Date(u.getLastModified()));
        System.out.println(" Expiration: " + u.getExpiration());
        System.out.println(" Content Encoding: " + u.getContentEncoding());

        /*
        // Read and print out the first five lines of the URL.
        System.out.println("First five lines:");
        DataInputStream in = new DataInputStream(u.getInputStream());
        for(int i = 0; i < 5; i++) {
            String line = in.readLine();
            if (line == null) break;
            System.out.println(" " + line);
        } // for
        */
    } // printinfo

    // Create a URL from the specified address, open a connection to it,
    // and then display information about the URL.
    public static void main(String[] args)
            throws MalformedURLException, IOException {
        URL url = new URL(args[0]);
        URLConnection connection = url.openConnection();
        printinfo(connection);
    } // main
} // GetURLInfo
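GetURLInfo prints to the console, whereas the assignment asks for the URL info to go to the output file. One possible adaptation, sketched here under the assumption that the report is written through a `PrintWriter`, is to pass the writer in as a parameter. `UrlInfoWriter` and `writeInfo` are hypothetical names; the fields printed are the same ones as in `printinfo` above.

```java
import java.io.PrintWriter;
import java.net.URLConnection;
import java.util.Date;

// Hypothetical variant of GetURLInfo.printinfo that writes to a PrintWriter.
// Based on the GetURLInfo example above (from _Java in a Nutshell_).
class UrlInfoWriter {

    // Writes the URL and its connection metadata to `out`, one field per line.
    static void writeInfo(URLConnection u, PrintWriter out) {
        out.println(u.getURL().toExternalForm() + ":");
        out.println(" Content Type: " + u.getContentType());
        out.println(" Content Length: " + u.getContentLength());
        out.println(" Last Modified: " + new Date(u.getLastModified()));
        out.println(" Expiration: " + u.getExpiration());
        out.println(" Content Encoding: " + u.getContentEncoding());
    }
}
```

The caller would open the output file once, for example with `new PrintWriter(new FileWriter(outputName))`, call `writeInfo` for each URL, and close the writer at the end so that everything is flushed to disk.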

WebpageReaderWithoutAgent.java

import java.io.*;
import java.net.URL;

public class WebpageReaderWithoutAgent {

    private static String webpage = null;

    // Opens the URL and wraps its content stream in a BufferedReader.
    public static BufferedReader read(String url) throws Exception {
        return new BufferedReader(
                new InputStreamReader(
                        new URL(url).openStream()));
    } // read

    public static void main (String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("No URL inputted.");
            System.exit(1);
        } // any inputs?

        webpage = args[0];
        System.out.println("Contents of the following URL: "+webpage+" ");

        // try-with-resources closes the reader (and the connection stream) when done.
        try (BufferedReader reader = read(webpage)) {
            String line = reader.readLine();
            while (line != null) {
                System.out.println(line);
                line = reader.readLine();
            } // while
        } // try
    } // main
} // WebpageReaderWithoutAgent
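For .html, .htm and .txt URLs the assignment asks for the lines to be saved to a local file and counted, rather than printed. The read loop above could be adapted roughly as in the sketch below. `TextSaver` and `saveLines` are hypothetical names, and writing the copy as UTF-8 with the platform line separator is an assumption.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; adapted from the read loop in WebpageReaderWithoutAgent.
class TextSaver {

    // Copies all lines from `reader` to `target` and returns how many were read.
    // Closes the reader when done.
    static int saveLines(BufferedReader reader, Path target) throws IOException {
        int count = 0;
        try (BufferedReader in = reader;
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();  // platform line separator (an assumption)
                count++;
            }
        }
        return count;
    }
}
```

Combined with the `read` method above, a call such as `TextSaver.saveLines(WebpageReaderWithoutAgent.read(address), Paths.get(fileName))` would return the line count to report in the output file.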

WebpageReaderWithAgent.java

import java.io.*;
import java.net.URL;
import java.net.URLConnection;

public class WebpageReaderWithAgent {

    private static String webpage = null;

    public static final String USER_AGENT = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6";

    // Opens a connection to the URL with the User-Agent header set,
    // and returns the stream of its content.
    public static InputStream getURLInputStream(String sURL) throws Exception {
        URLConnection oConnection = (new URL(sURL)).openConnection();
        oConnection.setRequestProperty("User-Agent", USER_AGENT);
        return oConnection.getInputStream();
    } // getURLInputStream

    // Reads the URL using the User-Agent header above.
    public static BufferedReader read(String url) throws Exception {
        //InputStream content = (InputStream)uc.getInputStream();
        // BufferedReader in = new BufferedReader (new InputStreamReader
        // (content));
        InputStream content = getURLInputStream(url);
        return new BufferedReader (new InputStreamReader(content));
    } // read

    // Reads the URL without setting a User-Agent header.
    public static BufferedReader read2(String url) throws Exception {
        return new BufferedReader(
                new InputStreamReader(
                        new URL(url).openStream()));
    } // read2

    public static void main (String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("No URL inputted.");
            System.exit(1);
        } // any inputs?

        webpage = args[0];
        System.out.println("Contents of the following URL: "+webpage+" ");

        // try-with-resources closes the reader (and the connection stream) when done.
        try (BufferedReader reader = read(webpage)) {
            String line = reader.readLine();
            while (line != null) {
                System.out.println(line);
                line = reader.readLine();
            } // while
        } // try
    } // main
} // WebpageReaderWithAgent
