Question:

(a) You are given a data set on cancer detection. After building a classification model which achieves an accuracy of 90%, would you be satisfied with your model performance? What can you do about it? [2]

(b) In the context of a k-NN classifier for multi-class classification, consider the case where k = 3 and the three nearest neighbours of a query have three different class labels. How would you assign the class label to the query example in this case? [2]

(c) In unsupervised learning, if the ground truth about a dataset is unknown, how can we determine the most useful number of clusters? [2]

(d) For a supervised classification problem, how can you determine which features are the most important? [2]

(e) Your machine learning application requires that the client be provided an explanation for the learning decision. Assuming that a deep learning model and a decision tree model achieve similar accuracy for your task, which one would you prefer to use and why? [2]

(f) k-NN and k-means clustering both rely crucially on the distance measure used. What is the difference between these two learning techniques? [2]

(g) A company has built a classifier that gets 100% accuracy on training data. When they deployed this model on the client side, it was found to be highly inaccurate. What might have gone wrong? [2]

(h) Sometimes when building a machine learning model we might prefer to have fewer features rather than many features. Give three reasons why this might be the case. [2]

(i) After spending several hours, you are anxious to build a high accuracy model. You built 5 boosting models, but none of these models performed better than the benchmark score. Finally, you decided to combine those models, as ensemble models are known to provide high accuracy. If your accuracy still doesn't improve, what could potentially be wrong with your ensemble model? [2]

(j) For a given feature, the minimum and maximum values in the training data are 100 and 1000, respectively. The minimum and maximum values of the feature in the test data are 50 and 950, respectively. What is the correct way to do min-max normalisation of this feature for a test instance with value … in order to have a fair validation? [2]

A. (100)/(1000 - 100)
B. (250)/(950 - 50)
C. (50)/(1000 - 50)
D. (100)/(1000 - 50)
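
The last part turns on which statistics the scaler uses. A minimal sketch, assuming the common convention that the minimum and maximum are taken from the training data only and then reused unchanged on test instances, so that no information from the test set leaks into preprocessing. The function name `min_max_scale` and the test value 200 are hypothetical; the question text does not state the test-instance value.

```python
def min_max_scale(x, train_min, train_max):
    """Scale x using the minimum and maximum observed in the TRAINING data."""
    return (x - train_min) / (train_max - train_min)


# Training-set statistics taken from the question.
TRAIN_MIN, TRAIN_MAX = 100.0, 1000.0

# Hypothetical test-instance value (not given in the question).
x_test = 200.0
scaled = min_max_scale(x_test, TRAIN_MIN, TRAIN_MAX)
print(scaled)  # (200 - 100) / (1000 - 100)

# The smallest test value from the question (50) lies below the training minimum,
# so its scaled value falls outside [0, 1]. That is expected under this convention.
scaled_low = min_max_scale(50.0, TRAIN_MIN, TRAIN_MAX)
print(scaled_low)
```

Under this convention the denominator is always the training range, 1000 - 100, whatever the range of the test data happens to be.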

Step by Step Solution


(a) Achieving an accuracy of 90% in a cancer detection model is a good starting point, but it might not be sufficient, depending on the specific requirements and the consequences of misclassification. In medica...
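
To illustrate why accuracy alone can mislead here, the following is a minimal sketch on made-up data, assuming a class imbalance of 10 cancer cases in 100 patients (the original question does not give the class balance). A model that always predicts "no cancer" reaches 90% accuracy while detecting none of the actual cases, which is what recall measures. The helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def recall(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    # Fraction of actual positives that the model found.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical labels: 1 = cancer, 0 = no cancer.
y_true = [1] * 10 + [0] * 90
# A trivial model that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(acc, rec)  # 0.9 0.0
```

Reporting recall, precision, or the full confusion matrix alongside accuracy makes this kind of failure visible.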

