Question

(a) What issues should be considered when selecting a model for applying machine learning to a given problem?
(4 Marks)
(b)(i) Clearly differentiate between feature selection and feature extraction.
(ii) Describe the forward selection algorithm for implementing the subset selection procedure for dimensionality reduction and specify any requirement(s) and/or limitation(s).
(iii) Given the data in the following table, use PCA to reduce the dimension from 2 to 1:

Feature   Example 1   Example 2   Example 3   Example 4
x1        4           8           13          7
x2        11          4           5           14

(4+6+11 Marks)
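The PCA reduction asked for in (b)(iii) can be sketched in a few lines. This is a minimal sketch, not a model answer: it assumes the sample covariance (divisor n - 1) and picks the sign of the principal direction so that its first component is non-negative. Both choices are conventions, and the function name `pca_2d_to_1d` is made up for illustration.

```python
import math

# Data copied from the table: one list per feature, one entry per example.
x1 = [4, 8, 13, 7]
x2 = [11, 4, 5, 14]

def pca_2d_to_1d(a, b):
    """Project 2-D points onto their first principal component."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [v - mean_a for v in a]          # centred feature 1
    db = [v - mean_b for v in b]          # centred feature 2

    # Sample covariance matrix [[s_aa, s_ab], [s_ab, s_bb]] (assumed divisor n - 1).
    s_aa = sum(u * u for u in da) / (n - 1)
    s_bb = sum(v * v for v in db) / (n - 1)
    s_ab = sum(u * v for u, v in zip(da, db)) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, from the characteristic equation.
    trace, det = s_aa + s_bb, s_aa * s_bb - s_ab ** 2
    lam = trace / 2 + math.sqrt(trace ** 2 / 4 - det)

    # A matching eigenvector; the axis-aligned case is handled separately.
    if s_ab == 0:
        e = (1.0, 0.0) if s_aa >= s_bb else (0.0, 1.0)
    else:
        e = (s_ab, lam - s_aa)
    norm = math.hypot(e[0], e[1])
    e = (e[0] / norm, e[1] / norm)
    if e[0] < 0:                          # sign convention (assumption)
        e = (-e[0], -e[1])

    projections = [e[0] * u + e[1] * v for u, v in zip(da, db)]
    return lam, e, projections

lam, e, z = pca_2d_to_1d(x1, x2)
print("largest eigenvalue:", round(lam, 4))
print("principal direction:", tuple(round(c, 4) for c in e))
print("1-D coordinates:", [round(v, 4) for v in z])
```

Under these assumptions the covariance matrix is [[14, -11], [-11, 23]] and the four examples map to roughly -4.31, 3.74, 5.69 and -5.12. Flipping the sign of the eigenvector flips every coordinate, which is an equally valid answer.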
Question 2: (25 Marks)
(a) Consider the problem of finding a rule for determining days on which one can enjoy water sport. The rule is to depend on a few attributes like "temp", "humidity", etc. Suppose we have the following data to help us devise the rule. In the data, a value of "1" for "enjoy" means "yes" and a value of "0" indicates "no".
Table 1: Attributes Data.
Example   Sky     Temp   Humidity   Wind     Water   Forecast   Enjoy
1         Sunny   Warm   Normal     Strong   Warm    Same       1
2         Sunny   Warm   High       Strong   Warm    Same       1
3         Rainy   Cold   High       Strong   Warm    Change     0
4         Sunny   Warm   High       Strong   Cool    Change     1
Find the hypothesis space and the version space for the problem.
(6 Marks)
(b) Explain cross-validation in machine learning, and then describe the different types of cross-validation along with their limitations, if any.
(c) Given the following data in Table 2, construct the ROC curve of the data. Compute the AUC.
Table 2: Data.
Threshold   TP   TN   FP   FN
1           0    25   0    29
2           7    25   0    22
3           18   24   1    11
4           26   20   5    3
5           29   11   14   0
6           29   0    25   0
7           29   0    25   0

(6+3 Marks)
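For part (c), each row of Table 2 gives one ROC point, with TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The sketch below computes those points and estimates the AUC with the trapezoidal rule. The trapezoidal rule is an assumption here, since the question does not name a method, and the helper names `roc_points` and `trapezoid_auc` are made up for illustration.

```python
# Confusion-matrix counts per threshold, copied from Table 2: (TP, TN, FP, FN).
table2 = [
    (0, 25, 0, 29),
    (7, 25, 0, 22),
    (18, 24, 1, 11),
    (26, 20, 5, 3),
    (29, 11, 14, 0),
    (29, 0, 25, 0),
    (29, 0, 25, 0),
]

def roc_points(rows):
    """Return the distinct (FPR, TPR) points, sorted by FPR and then TPR."""
    points = set()
    for tp, tn, fp, fn in rows:
        tpr = tp / (tp + fn)   # true positive rate (sensitivity)
        fpr = fp / (fp + tn)   # false positive rate (1 - specificity)
        points.add((fpr, tpr))
    return sorted(points)

def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the sorted points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

points = roc_points(table2)
auc = trapezoid_auc(points)
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.4f}")
print(f"AUC (trapezoidal) = {auc:.2f}")
```

With these counts the curve runs from (0, 0) to (1, 1) and the trapezoidal estimate comes to 0.92. Thresholds 6 and 7 give the same point, so the duplicate is dropped before integrating.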
Question 3: (25 Marks)
(a) Write down the naive Bayes algorithm.
(b) How do we use numeric features with naive Bayes?
(6 Marks)
(c) Find the ML estimate for the parameters representing the mean and that representing the variance in the normal probability function.
(7 Marks)
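The derivation asked for in (c) can be outlined as below. This is a sketch under the usual assumption of n independent samples x_1, ..., x_n from a normal density with mean \mu and variance \sigma^2; the symbols n, x_i and \ell are notation introduced here, not taken from the question.

```latex
% Log-likelihood of n i.i.d. samples from N(\mu, \sigma^2)
\begin{align*}
\ell(\mu, \sigma^2)
  &= -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 \\[4pt]
\frac{\partial \ell}{\partial \mu}
  &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0
  \;\Longrightarrow\;
  \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i \\[4pt]
\frac{\partial \ell}{\partial \sigma^2}
  &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0
  \;\Longrightarrow\;
  \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})^2
\end{align*}
```

Note that the ML estimate of the variance divides by n rather than n - 1, so it is a biased estimator of \sigma^2.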
(d) Let S be a set of examples, A a feature having c different values and let the set of values of A be denoted by Values(A). Define Gain(S,A), Split Information (S,A) and Gain Ratio (S,A).
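The three quantities in (d) are usually defined as Gain(S, A) = Entropy(S) minus the sum over v in Values(A) of (|S_v| / |S|) Entropy(S_v), SplitInformation(S, A) = minus the sum over the c subsets S_i of (|S_i| / |S|) log2(|S_i| / |S|), and GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A). The sketch below computes them for a single feature. The function names are made up for illustration, returning 0.0 when the split information is zero is an assumed convention, and the Sky, Humidity and Enjoy columns of Table 1 are reused purely as sample input.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (base 2) of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def gain(values, labels):
    """Gain(S, A): entropy of S minus the weighted entropy of each subset S_v."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def split_information(values):
    """SplitInformation(S, A): entropy of S with respect to the values of A."""
    return entropy(values)

def gain_ratio(values, labels):
    """GainRatio(S, A) = Gain / SplitInformation (0.0 if the split is trivial)."""
    si = split_information(values)
    return gain(values, labels) / si if si > 0 else 0.0

# Sample input: columns of Table 1.
sky = ["Sunny", "Sunny", "Rainy", "Sunny"]
humidity = ["Normal", "High", "High", "High"]
enjoy = [1, 1, 0, 1]

print("Gain(S, Humidity)      =", round(gain(humidity, enjoy), 4))
print("SplitInfo(S, Humidity) =", round(split_information(humidity), 4))
print("GainRatio(S, Humidity) =", round(gain_ratio(humidity, enjoy), 4))
print("GainRatio(S, Sky)      =", round(gain_ratio(sky, enjoy), 4))
```

On this sample, Sky separates the classes perfectly, so its gain equals its split information and the ratio is 1. Humidity gives a much smaller ratio of about 0.15.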