Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Problem Statement ( in natural language ) . You are given a set of 5 6 9 instances, each with information about a patient s

Problem Statement (in natural language). You are given a set of 569 instances, each with information about a patients cell, that is or is not cancerous. The information includes the patients ID (which should not be relevant to his/her diagnosis) and a diagnosis (malignant or benign), as well as a set of attributes that are usually used for diagnosis. You are asked to design, implement, and train a machine learning model, and assess its quality in terms of (a) accuracy of diagnosis (of instances not used for training) and (b) computational efficiency and scalability of your solution. The machine learning model, which an expert recommended, is called k-nearest neighbour (or k-NN). You are required to use a tree data structure that would optimize both accuracy and efficiency (with accuracy taking precedence). In answer to this problem, submit 1 Java program, 1 report (PDF) and 1 signed expectations of originality form (PDF). The report must include the following (I-III) items.
(10%) I. A 1-pargraph characterization of the problem, in abstract and precise computer science terms, excluding all irrelevant details and superfluous language. Adherence to formatting instructions (see Formatting on page 2) counts towards this part of the grade.
(20%) II. A description of the model you will use to solve the problem: 1) The main data structure(s) as annotated diagram(s); 2) The main algorithm(s) operating on the data structure(s), as high-level flow chart(s), augmented as necessary by lower-level pseudo-codes of the main methods employed.
(40%) III. The results with analysis of the testing of the trained model, presented as 2 figures that show how much running time and testing accuracy change as a function of the number of training instances (N). T is the number of test instances; each figure should have three series of points for the three (3) values of k (see below).
Guidelines. Use N =100,200,300,400,500. The ratio N/T must be maintained at 4/1. An instance used for training, in a given train-and-test run, cannot be used for testing. Also, you must retrain the model prior to each test. Further, the (N+T) instances must be chosen at random from the complete set. Though this is not the case in medical practice*, implement accuracy as the percentage of test instances (T) that are correctly predicted by your trained model (i.e., predicted diagnosis = actual diagnosis). For running time, just use actual execution time (of testing, not
Page 2 of 2

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

SQL Server T-SQL Recipes

Authors: David Dye, Jason Brimhall

4th Edition

1484200616, 9781484200612

More Books

Students also viewed these Databases questions

Question

What do you like to do for fun/to relax?

Answered: 1 week ago