CSE 5810 Fall 2022
Midterm Exam
Name:
1. What is the difference between the criteria for supervised and unsupervised dimensionality reduction? Explain in no more than 4-5 sentences. We have the following six records in two dimensions: [2 2]^T, [3 5]^T, [5 2]^T, [4 5]^T, [7 6]^T, and [3 2]^T. It is desired to obtain a one-dimensional representation of these records using PCA. Perform the necessary calculations to obtain such a representation. Suppose you were to reconstruct the 2-dimensional data from the reduced one-dimensional representation. What will be the approximation error, expressed as the mean square error between the original data points and the reconstructed data points? (5+10+10 Points)
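The PCA part of Question 1 can be checked numerically. The following is a minimal sketch, not a model answer: it assumes the covariance is normalized by 1/n (not 1/(n-1)) and that the mean square error is the average squared Euclidean distance per record. Under a different convention the numbers change. The variable names are illustrative only.

```python
import numpy as np

# The six 2-D records from Question 1, one record per row.
X = np.array([[2, 2], [3, 5], [5, 2], [4, 5], [7, 6], [3, 2]], dtype=float)

# Center the data.
mean = X.mean(axis=0)
Xc = X - mean

# Covariance matrix (assumption: population form, divided by n).
cov = Xc.T @ Xc / len(X)

# eigh returns eigenvalues in ascending order for a symmetric matrix.
eigvals, eigvecs = np.linalg.eigh(cov)
w = eigvecs[:, -1]          # first principal direction (sign is arbitrary)

# One-dimensional representation: projection onto the principal direction.
z = Xc @ w

# Reconstruction back to 2-D and the resulting mean square error.
X_rec = mean + np.outer(z, w)
mse = np.mean(np.sum((X - X_rec) ** 2, axis=1))

print("mean:", mean)
print("principal direction:", w)
print("1-D scores:", z)
print("reconstruction MSE:", mse)
```

With these conventions the reconstruction error equals the discarded (smaller) eigenvalue of the covariance matrix, which gives a quick consistency check on a hand calculation.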
2.(a) The word "bank" occurs in 6,805 documents in a collection of 45,000 documents. In document D1 in this collection, the word "bank" is found to be present 26 times. What will be the weight assigned to the word "bank" for document D1 using the tf-idf scheme?
(b) A system retrieves 10 documents with precision being 70% and recall being 40% for a given query. When 10 additional documents for the same query are retrieved, the precision is found to remain the same. What is the new value of the recall rate? (5+10 Points)
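Both parts of Question 2 are short arithmetic, sketched below. For part (a) the question does not say which tf-idf variant is intended; the sketch assumes raw term frequency times log10(N/df), and other variants (natural log, log-scaled tf) give different weights. For part (b) the counts follow from the definitions of precision and recall. Note that the stated figures imply a non-integer number of relevant documents, so the sketch keeps everything in floating point.

```python
import math

# Part (a): tf-idf weight (assumption: weight = tf * log10(N / df)).
N = 45_000      # documents in the collection
df = 6_805      # documents containing the word
tf = 26         # occurrences of the word in D1
tfidf_weight = tf * math.log10(N / df)

# Part (b): recall after retrieving 10 more documents.
precision = 0.70
recall_before = 0.40
retrieved_before = 10
retrieved_after = 20

relevant_retrieved_before = precision * retrieved_before          # relevant hits in first 10
total_relevant = relevant_retrieved_before / recall_before        # implied by recall
relevant_retrieved_after = precision * retrieved_after            # precision unchanged
recall_after = relevant_retrieved_after / total_relevant

print("tf-idf weight:", tfidf_weight)
print("implied total relevant documents:", total_relevant)
print("new recall:", recall_after)
```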
3. Why is the naive Bayes classifier called naive? Consider the following table of probabilities for a digit recognition classifier where a digit is recognized by the presence/absence of seven segments that make up a digit pattern. The cell at the intersection of the i-th row and j-th column gives the conditional probability that segment j is on if the digit is i.
Suppose the input to the classifier is as follows:
Segment s1=1, Segment s2=0, Segment s3=1, Segment s4=0, Segment s5=1, Segment s6=1, Segment s7 information is missing.
What is the probability that the input to the naive Bayes classifier with the above probability table is 8? Assume all digits are equally likely. (5+10 Points)
4. What are the major strengths of decision tree models? Consider the following eight records; each record is described by two quantitative attributes, x1 and x2:
A=(2,10)^T, B=(2,5)^T, C=(8,4)^T, D=(5,8)^T, E=(7,5)^T, F=(6,4)^T, G=(1,2)^T, H=(4,9)^T.
The first four records are from class C1 and the last four are from class C2. Suppose you split the data into two groups using the test x2<4.5. What will be the amount of information gain from such a split? (5+10 Points)
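The information gain for the split in Question 4 can be computed as below. This is a sketch that assumes entropy in bits (log base 2) as the impurity measure; the helper name `entropy` is illustrative and not from the exam.

```python
import math
from collections import Counter

# Records from Question 4 as (x1, x2, class); A-D are C1, E-H are C2.
records = [
    (2, 10, "C1"), (2, 5, "C1"), (8, 4, "C1"), (5, 8, "C1"),
    (7, 5, "C2"), (6, 4, "C2"), (1, 2, "C2"), (4, 9, "C2"),
]

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Split on the test x2 < 4.5.
left = [cls for _, x2, cls in records if x2 < 4.5]
right = [cls for _, x2, cls in records if x2 >= 4.5]
parent = [cls for _, _, cls in records]

# Information gain = parent entropy minus size-weighted child entropy.
n = len(parent)
weighted_child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
gain = entropy(parent) - weighted_child

print("left branch:", Counter(left))
print("right branch:", Counter(right))
print("information gain (bits):", gain)
```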
5.(a) You used the following four (x,y) pairs to build a regression model: (2,3),(4,6),(6,8), and (8,11). The resulting regression model is given by the following expression:
Y_i^pred = 0.75 + 1.2 X_i
What is the value of R-square (coefficient of Determination) for this model?
(b) You are trying to use a three-layer neural network to build a multiple regression model. The number of predictors is 5. The hidden layer has 3 neurons and the output layer consists of a single neuron. What is the total number of parameters, i.e. weights, including the bias weights that this particular configuration of the neural network has? Suggest a minimum number of training examples that you think are needed. Justify your answer. (10+10 Points)
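The two numerical parts of Question 5 can be sketched as follows. For part (a) the sketch assumes the definition R^2 = 1 - SS_res / SS_tot and uses the model exactly as stated in the question. For part (b) it assumes a fully connected network with one bias per hidden and output neuron; `dense_param_count` is a hypothetical helper, and the question of how many training examples are needed is left to the written justification.

```python
import numpy as np

# Part (a): R-square for the stated model on the four (x, y) pairs.
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([3.0, 6.0, 8.0, 11.0])
y_pred = 0.75 + 1.2 * x

ss_res = np.sum((y - y_pred) ** 2)        # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares
r2 = 1.0 - ss_res / ss_tot

# Part (b): weights plus biases in a fully connected network.
def dense_param_count(layer_sizes):
    """Count weights and biases for consecutive dense layers."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

n_params = dense_param_count([5, 3, 1])   # 5 inputs, 3 hidden, 1 output

print("R-square:", r2)
print("total parameters:", n_params)
```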
6.(a) Suppose you have trained a three-layer feedforward network for a non-linearly separable binary classification problem. While deploying the network in actual use, you decide to make all activation functions linear. What would happen to the decision boundary? (5 Points)
(b) You have two distributions. A friend of yours calculates the KL divergence measure for them and obtains a value close to zero. What conclusion can you draw from this result? (5 Points)
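Both parts of Question 6 can be explored numerically. The sketch below uses made-up random weights and made-up discrete distributions, none of which come from the exam. It illustrates two facts: a stack of layers with linear (identity) activations is equivalent to a single affine map, and the KL divergence of a distribution from itself is zero. The function `kl_divergence` assumes discrete distributions with q > 0 wherever p > 0.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical network: 2 inputs, 3 hidden units, 1 output, identity activations.
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)

def linear_net(x):
    """Forward pass with every activation function replaced by the identity."""
    return W2 @ (W1 @ x + b1) + b2

# The same computation collapsed into one affine map W_eq x + b_eq.
W_eq = W2 @ W1
b_eq = W2 @ b1 + b2

x = rng.normal(size=2)
collapses = np.allclose(linear_net(x), W_eq @ x + b_eq)

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions (assumes q > 0 where p > 0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                      # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

print("linear network equals a single affine map:", collapses)
print("KL of identical distributions:", kl_divergence([0.5, 0.5], [0.5, 0.5]))
print("KL of different distributions:", kl_divergence([0.9, 0.1], [0.5, 0.5]))
```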
