Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

1. Algorithm Given matrices X and Y, your program will compute (((X^T)X)^1)(X^T)Y in order to learn W. This will require (1) multiplying, (2) transposing, and

1. Algorithm

Given matrices X and Y, your program will compute (((X^T)X)^1)(X^T)Y in order to learn W. This will require (1) multiplying, (2) transposing, and (3) inverting matrices. Transposing an mn matrix produces an nm matrix. Each row of the X becomes a columnof X^T. To nd the inverse of (X^T)X, you will use a simplied form of Gauss-Jordan elimination.

For example, if the training data includes n houses and has k attributes, this data can be represented as an n(k + 1) matrix X, where each row corresponds to a house and each column corresponds to an attribute.

Note that the rst column contains 1 for all rows: this corresponds to the weight w0.

2. Program

You will write a program learn that uses a training data set to learn weights for a set of house attributes, and then applies those weights to a set of input data to calculate prices for those houses. learn takes two arguments, which are the paths to les containing the training data and input data. Training data format - The rst line will be the word train. The second line will contain an integer k, giving the number of attributes. The third line will contain an integer n, giving the number of houses. The next n lines will contain k + 1 oating-point numbers, separated by spaces. Each line gives data for a house. The rst k numbers give the values x1xk for that house, and the last number gives its price y.

For example, a le train.txt might contain: train

4

7

3.000000 1.000000 1180.000000 1955.000000 221900.000000 3.000000 2.250000 2570.000000 1951.000000 538000.000000

2.000000 1.000000 770.000000 1933.000000 180000.000000

4.000000 3.000000 1960.000000 1965.000000 604000.000000

3.000000 2.000000 1680.000000 1987.000000 510000.000000

4.000000 4.500000 5420.000000 2001.000000 1230000.000000

3.000000 2.250000 1715.000000 1995.000000 257500.000000

This le contains data for 7 houses, with 4 attributes and a price for each house. The corresponding matrix X will be 75 and Y will be 71. (Recall that column 0 of X is all ones.)

Input data format - The rst line will be the word data. The second line will be an integer k, giving the number of attributes. The third line will be an ineteger m, giving the number of houses. The next m lines will contain k oating-point numbers, separated by spaces. Each line gives data for a house, not including its price.

For example, a le data.txt might contain: data

4

2

3.000000 2.500000 3560.000000 1965.000000

2.000000 1.000000 1160.000000 1942.000000 This contains data for 2 houses, with 4 attributes for each house. The corresponding matrix X will be 25.

Output format - Your program should output the prices computed for each house in the input data using the weights derived from the training data. Each house price will be printed on a line, rounded to the nearest integer. To print a oating-point number rounded to the nearest integer, use the formatting code %.0f, as in: printf("%.0f ", price); Usage - Assuming the les train.txt and data.txt exist in the same directory as learn:

$ ./learn train.txt data.txt

737861

203060

Implementation notes - You MUST use double to represent the attributes, weights, and prices. Using float may result in incorrect results due to rounding. To read double values from the training and input data les, you can use fscanf with the format code %lf. If learn successfully completes, it MUST return exit code 0. You MAY assume that the training and input data les are correctly formatted. You MAY assume that the rst argument is a training data le and that the second argument is an input data le. However, checking that the training data le begins with train and that the input data le begins with data may be helpful if you accidentally give the wrong arguments to learn while you are testing it. To read a string containing up to 5 non-space characters, you can use the fscanf format code %5s. learn SHOULD check that the training and input data les specify the same value for k. If the training or input les do not exist, are not readable, are incorrectly formatted, or specify dierent values of k, learn MAY print error and return exit code 1. Your code will not be tested with these scenarios.

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Secrets Of Analytical Leaders Insights From Information Insiders

Authors: Wayne Eckerson

1st Edition

1935504347, 9781935504344

More Books

Students also viewed these Databases questions