Question:
(a) You are given a data set on cancer detection. After building a classification model which achieves an accuracy of 90%, would you be satisfied with your model performance? What can you do about it? [2]
(b) In the context of a k-NN classifier for multi-class classification, consider the case where k = 3 and the three nearest neighbours of a query have three different class labels. How would you assign the class label to the query example in this case? [2]
(c) In unsupervised learning, if the ground truth about a dataset is unknown, how can we determine the most useful number of clusters? [2]
(d) For a supervised classification problem, how can you determine which features are the most important? [2]
(e) Your machine learning application requires that the client be provided with an explanation for the learning decision. Assuming that a deep learning model and a decision tree model achieve similar accuracy for your task, which one would you prefer to use and why? [2]
(f) k-NN and k-means clustering both rely crucially on the distance measure used. What is the difference between these two learning techniques? [2]
(g) A company has built a classifier that gets 100% accuracy on training data. When they deployed this model on the client side, it was found to be highly inaccurate. What might have gone wrong? [2]
(h) Sometimes when building a machine learning model we might prefer to have fewer features rather than many features. Give three reasons why this might be the case. [2]
(i) After spending several hours, you are anxious to build a high accuracy model. You built 5 boosting models, but none of these models performed better than the benchmark score. Finally, you decided to combine those models, as ensemble models are known to provide high accuracy. If your accuracy still doesn't improve, what could potentially be wrong with your ensemble model? [2]
(j) For a given feature, the minimum and maximum value in the training data is 100 and 1000, respectively. The minimum and the maximum value of the feature in the test data is 50 and 950, respectively. What is the correct way to do min-max normalisation of this feature for a test instance with value in order to have a fair validation? [2]
A. (100)/(1000 - 100)
B. (250)/(950 - 50)
C. (50)/(1000 - 50)
D. (100)/(1000 - 50)
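As a hedged illustration of the min-max normalisation part of the question: the usual convention is to scale test instances with the minimum and maximum learned from the training data, never with statistics taken from the test set. The sketch below uses the training range given in the question (100 to 1000). The function name `min_max_scale` and the test value of 200 are assumptions made for illustration only, since the question text does not state the instance's value.

```python
def min_max_scale(value, train_min, train_max):
    """Scale a value using the training-set range only.

    Test-set statistics are deliberately not used, so that the test
    data stays unseen during preprocessing.
    """
    return (value - train_min) / (train_max - train_min)


# Training range taken from the question.
TRAIN_MIN, TRAIN_MAX = 100, 1000

# Hypothetical test value (not given in the question).
test_value = 200
scaled = min_max_scale(test_value, TRAIN_MIN, TRAIN_MAX)  # (200 - 100) / (1000 - 100)

# The test minimum of 50 lies below the training minimum, so it maps
# to a negative number; this is expected and not an error.
below_range = min_max_scale(50, TRAIN_MIN, TRAIN_MAX)

print(scaled, below_range)
```

Values outside the training range fall outside [0, 1] after scaling. Whether to clip them is a separate design decision that depends on the downstream model.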
Step by Step Solution
(a) Achieving an accuracy of 90% in a cancer detection model is a good starting point, but it might not be sufficient, depending on the specific requirements and the consequences of misclassification. In medica...
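One way to see why 90% accuracy alone may be unsatisfying in part (a) is a small sketch with imbalanced classes. The numbers here are assumptions for illustration, not data from the question: 1000 hypothetical patients, 100 of whom have cancer, and a trivial model that predicts "healthy" for everyone. The helper names `confusion_counts`, `accuracy` and `recall` are likewise hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Fraction of actual positives that the model finds (sensitivity)."""
    return tp / (tp + fn)


# Assumed data: 100 cancer cases (1) and 900 healthy cases (0).
y_true = [1] * 100 + [0] * 900
# Trivial model: always predicts the majority class (healthy).
y_pred = [0] * 1000

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)
rec = recall(tp, fn)

print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

Under these assumptions the model reaches 90% accuracy while missing every cancer case, which is why metrics such as recall, precision or a confusion matrix are commonly inspected alongside accuracy on imbalanced medical data.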
