Questions and Answers of
Management And Artificial Intelligence
3.13 Suggest an appropriate definition of operators ('was', 'of', 'the') to be able to write clauses like diana was the secretary of the department. and then ask Prolog: ?- Who was the secretary of
3.12 Assume the operator definitions :- op(300, xfx, plays). :- op( 200, xfy, and). Then the following two terms are syntactically legal objects: Term1 = jimmy plays football and squash Term2 = susan
3.11 Define the relation flatten( List, FlatList) where List can be a list of lists, and FlatList is List 'flattened' so that the elements of List's sublists (or sub-sublists) are reorganized as one
3.10 Define the predicate equal_length(L1, L2) which is true if lists L1 and L2 have an equal number of elements.
3.9 Define the relation dividelist( List, List1, List2) so that the elements of List are partitioned between List1 and List2, and List1 and List2 are of approximately the same length. For example:
3.8 Define the relation subset(Set, Subset) where Set and Subset are two lists representing two sets. We would like to be able to use this relation not only to check for the subset relation, but also
3.7 Define the relation translate(List1, List2) to translate a list of numbers between 0 and 9 to a list of the corresponding words. For example: translate([3,5,1,3], [three, five, one, three]) Use
3.6 Define the relation shift( List1, List2) so that List2 is List1 'shifted rotationally' by one element to the left. For example, ?- shift([1,2,3,4,5], L1), shift(L1, L2). produces: L1 = [2,3,4,5,1]
3.5 Define the predicate palindrome( List). A list is a palindrome if it reads the same in the forward and in the backward direction. For example, [m,a,d,a,m].
3.4 Define the relation reverse( List, ReversedList) that reverses lists. For example, reverse( [a,b,c,d], [d,c,b,a]).
3.3 Define two predicates evenlength(List) and oddlength( List) so that they are true if their argument is a list of even or odd length respectively. For example, the list [a,b,c,d] is 'evenlength'
3.2 Define the relation last(Item, List) so that Item is the last element of a list List. Write two versions: (a) using the conc relation, (b) without conc.
3.1 (a) Write a goal, using conc, to delete the last three elements from a list L producing another list L1. Hint: L is the concatenation of L1 and a three-element list. (b) Write a goal to delete
2.11 What happens if we ask Prolog: ?- X = f(X). Should this request for matching succeed or fail? According to the definition of unification in logic this should fail, but what happens according to
2.10 Prolog implementations usually provide debugging facilities that enable the interactive tracing of the execution of goals. Such tracing gives similar insights as our diagrammatic traces in
2.9 Consider the program in Figure 2.10 and simulate, in the style of Figure 2.10, Prolog's execution of the question: ?- big(X), dark(X). Compare your execution trace with that of Figure 2.10 when
2.8 Rewrite the following program without using the semicolon notation. translate( Number, Word) :- Number = 1, Word = one; Number = 2, Word = two; Number = 3, Word = three.
2.7 The following program says that two people are relatives if (a) one is an ancestor of the other, or (b) they have a common ancestor, or (c) they have a common successor: relatives(X, Y) :-
2.6 Consider the following program: f( 1, one). f(s(1), two). f(s(s(1)), three). f(s(s(s(X))), N) :- f( X, N). How will Prolog answer the following questions? Whenever several answers are possible,
2.5 Assume that a rectangle is represented by the term rectangle( P1, P2, P3, P4) where the P's are the vertices of the rectangle positively ordered. Define the relation: regular(R) which is true if
2.4 Using the representation for line segments as described in this section, write a term that represents any vertical line segment at x = 5.
2.3 Will the following matching operations succeed or fail? If they succeed, what are the resulting instantiations of variables? (a) point(A, B) = point(1, 2) (b) point(A, B) = point(X, Y, Z) (c)
2.2 Suggest a representation for rectangles, squares and circles as structured Prolog objects. Use an approach similar to that in Figure 2.4. For example, a rectangle can be represented by four
2.1 Which of the following are syntactically correct Prolog objects? What kinds of object are they (atom, number, variable, structure)? (a) Diana (b) diana (c) 'Diana' (d) _diana (e) 'Diana goes
1.7 Try to understand how Prolog derives answers to the following questions, using the program of Figure 1.8. Try to draw the corresponding derivation diagrams in the style of Figure 1.9. Will any
1.6 Consider the following alternative definition of the ancestor relation: ancestor(X, Z) :- parent(X, Z). ancestor( X, Z) :- parent( Y, Z), ancestor(X, Y). Does this also seem to be a correct
1.5 Define the relation aunt( X, Y) in terms of the relations parent and sister. As an aid you can first draw a diagram in the style of Figure 1.3 for the aunt relation.
1.4 Define the relation grandchild using the parent relation. Hint: It will be similar to the grandparent relation (see Figure 1.3).
1.3 Translate the following statements into Prolog rules: (a) Everybody who has a child is happy (introduce a one-argument relation happy). (b) For all X, if X has a child who has a sister then X has
1.2 Formulate in Prolog the following questions about the parent relation: (a) Who is Pat's parent? (b) Does Liz have a child? (c) Who is Pat's grandparent?
1.1 Assuming the parent relation as defined in this section (see Figure 1.1), what will be Prolog's answers to the following questions? (a) ?- parent( jim, X). (b) ?- parent( X, jim). (c) ?- parent(pam,
How do the bias and variance of a bagged classifier compare to those of an individual classifier from the bagged set of classifiers?
Suppose that your linear regression model shows similar accuracy on the training and test data. How should you modify the regularization parameter?
Suppose that the split at the top level of the decision tree is chosen using a domain-specific condition by a human expert. The splits at other levels are chosen in a data-driven manner. How does the
Suppose that you modify an inductive rule-based classifier into a two-stage classifier. In the first stage, domain-specific rules are used to decide if the test instance matches these conditions. If
What effect does the use of Laplacian smoothing in the Bayes classifier have on the bias and variance?
Suppose that a model provides extremely poor (but similar) accuracies on both the training data and on the test data. What are the most likely sources of the error (among bias, variance, and noise)?
Does the bias of a decision tree increase or decrease by reducing the height of the tree via pruning? How about the variance?
Does the bias of a κ-nearest neighbor classifier increase or decrease with increasing value of κ? What happens to the variance? What does the classifier do, when one sets the value of κ to the
Show how you can perform the steps of Exercise 5 with the use of stochastic gradient descent rather than gradient descent.
Compute the gradient-descent steps of the optimization model introduced in Section 12.4.5. Show that the gradient-descent steps are as follows: $M \Leftarrow M + \alpha E V$; $U_i \Leftarrow U_i + \alpha \beta_i (D_i - U_i V^T) V$; V ⇐ V
Include a concept hierarchy for movie objects based on the genres of the movies.
Consider the IMDB movie database available at the URL https://www.imdb.com/interfaces/. Implement a program to create a heterogeneous information network discussed in Exercise
Consider a repository of movies appearing in different countries. For each movie, you have a hierarchical classification corresponding to the genre. You want to create a heterogeneous network
You may omit the step involving creation of the concept hierarchy.
Consider the DBLP publication database available at the URL https://dblp.uni-trier.de/xml/. Implement a program to create a heterogeneous information network discussed in Exercise
Consider a repository of scientific articles containing articles published in various types of venues. You want to create a heterogeneous network containing three types of objects corresponding to
Propose an approach for using RBMs for outlier detection.
Implement the contrastive divergence algorithm of a restricted Boltzmann machine. Also implement the inference algorithm for deriving the probability distribution of the hidden units for a given test
This chapter discusses how Boltzmann machines can be used for collaborative filtering. Even though discrete sampling of the contrastive divergence algorithm is used for learning the model, the final
What happens for the case when n = ∞? (c) Propose an off-policy n-step learning algorithm like Q-learning and discuss its advantages/disadvantages with respect to (b).
The two-step TD-error is defined as follows: $\delta_t^{(2)} = r_t + \gamma r_{t+1} + \gamma^2 V(s_{t+2}) - V(s_t)$. (a) Propose a TD-learning algorithm for the 2-step case. (b) Propose an on-policy n-step learning algorithm
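As an illustration of part (a), here is a minimal sketch of a tabular two-step TD update built directly on the error defined above. The function names, the dictionary representation of V, and the default values of the discount factor and learning rate are assumptions made for this example and do not come from the exercise.

```python
def two_step_td_error(V, s_t, r_t, r_next, s_t2, gamma):
    """Two-step TD error: r_t + gamma*r_{t+1} + gamma^2 * V(s_{t+2}) - V(s_t)."""
    return r_t + gamma * r_next + gamma ** 2 * V[s_t2] - V[s_t]


def two_step_td_update(V, s_t, r_t, r_next, s_t2, gamma=0.9, alpha=0.1):
    """Move V(s_t) a fraction alpha toward the two-step target (hypothetical helper)."""
    delta = two_step_td_error(V, s_t, r_t, r_next, s_t2, gamma)
    V[s_t] += alpha * delta
    return delta


# Illustrative values only: three states with hand-picked value estimates.
V = {"a": 0.0, "b": 0.0, "c": 1.0}
delta = two_step_td_update(V, "a", r_t=1.0, r_next=0.0, s_t2="c", gamma=0.5, alpha=0.5)
```

The update for state s_t can only be applied once the reward at time t+1 and the state at time t+2 have been observed, so in an online setting each update lags the current step by one transition.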
Write a Q-learning implementation that learns the value of each state-action pair for a game of tic-tac-toe by repeatedly playing against human opponents. No function approximators are used and
Consider the game of tic-tac-toe in which a reward drawn from {−1, 0, +1} is given at the end of the game. Suppose you learn the values of all states (assuming optimal play from both sides).
Consider the well-known game of rock-paper-scissors. Human players often try to use the history of previous moves to guess the next move. Would you use a Q-learning or a policy-based method to learn
You have two slot machines, each of which has an array of 100 lights. The probability distribution of the reward from playing each machine is an unknown (and possibly machine-specific) function of
Throughout this chapter, a neural network, referred to as the policy network, has been used in order to implement the policy gradient. Discuss the importance of the choice of network architecture in
The chapter gives a proof of the likelihood ratio trick (cf. Equation 10.23) for the case in which the action a is discrete. Generalize this result to continuous-valued actions.
The text of the chapter shows how one can transform any linear classifier into recognizing nonlinear decision boundaries by using a feature engineering phase in which the eigenvectors of an
Suppose that you represent your data set as a graph in which each data point is a node, and the weight of the edge between a pair of nodes is equal to the Gaussian kernel similarity between them.
What is the maximum number of possible clusterings of a data set of n points into k groups? What does this imply about the convergence behavior of algorithms whose objective function is guaranteed
Discuss why the following integer matrix factorization is equivalent to the objective function of the k-means algorithm for an n × d matrix D, in which the rows contain the data points: Minimize U,V
The text of the book discusses gradient descent updates (cf. Equation 9.6) for unconstrained matrix factorization $D \approx UV^T$. Suppose that the matrix D is symmetric, and we want to perform the
Discuss the similarity of this model to that of the addition of bias to classification models. How is gradient descent modified?
Biased matrix factorization: Consider the factorization of an incomplete n × d matrix D into an n × k matrix U and a d × k matrix V: $D \approx UV^T$. Suppose you add the constraint that all entries of
Recommender systems: Let D be an n × d matrix in which only a small subset of the entries are specified. This is commonly the case with recommender systems. Show how you can adapt the algorithm for
Suppose that you are given a truncated SVD $D \approx Q \Sigma P^T$ of rank k. Show how you can use this solution to derive an alternative rank-k decomposition $Q \Sigma P^T$ in which the unit columns of Q (or/and P)
Let D be an n×d data matrix, and y be an n-dimensional column vector containing the dependent variables of linear regression. The regularized solution to linear regression predicts the dependent
Use singular value decomposition to show the push-through identity for any n × d matrix D: $(\lambda I_d + D^T D)^{-1} D^T = D^T (\lambda I_n + D D^T)^{-1}$
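One possible outline of the argument is sketched below. It assumes λ > 0, so that both inverses exist, and a full SVD D = QΣPᵀ with Q an n × n orthogonal matrix, P a d × d orthogonal matrix, and Σ an n × d rectangular diagonal matrix with singular values σ_i. These conventions are assumptions of this sketch rather than part of the exercise statement.

```latex
\begin{align*}
\lambda I_d + D^T D &= P\,(\lambda I_d + \Sigma^T \Sigma)\,P^T, &
\lambda I_n + D D^T &= Q\,(\lambda I_n + \Sigma \Sigma^T)\,Q^T, \\
(\lambda I_d + D^T D)^{-1} D^T
  &= P\,(\lambda I_d + \Sigma^T \Sigma)^{-1}\,\Sigma^T\,Q^T, &
D^T (\lambda I_n + D D^T)^{-1}
  &= P\,\Sigma^T\,(\lambda I_n + \Sigma \Sigma^T)^{-1}\,Q^T .
\end{align*}
% Both middle factors are d x n rectangular diagonal matrices whose
% (i,i) entry is sigma_i / (lambda + sigma_i^2), so the two sides agree.
```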
How would your architecture for the previous question change if you were given a training database in which the mutation positions in each sequence were tagged, and the test database was untagged?
Suppose that you have a large database of biological strings containing sequences of nucleobases drawn from {A,C, T,G}. Some of these strings contain unusual mutations representing changes in the
Propose a neural architecture to perform binary classification of a sequence.
Download the character-level RNN in [222], and train it on the “tiny Shakespeare” data set available at the same location. Create outputs of the language model after training for (i) 5 epochs,
Perform a 4 × 4 pooling at stride 1 of the input volume in the upper-left corner of Figure 8.4.
Compute the convolution of the input volume in the upper-left corner of Figure 8.2 with the horizontal edge detection filter of Figure 8.1(b). Use a stride of 1 without padding.
Download an implementation of the AlexNet architecture from a neural network library of your choice. Train the network on subsets of varying size from the ImageNet data, and plot the top-5 error with
Work out the number of parameters in each spatial layer for column D of Table 8.1.
Work out the sizes of the spatial convolution layers for each of the columns of Table 8.1. In each case, we start with an input image volume of 224 × 224 × 3.
Justify your answer in each case.
Consider an activation volume of size 13×13×64 and a filter of size 3×3×64. Discuss whether it is possible to perform convolutions with strides 2, 3, 4, and
For a one-dimensional time series of length L and a filter of size F, what is the length of the output? How much padding would you need to keep the output size to a constant value?
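The size bookkeeping behind this question can be sketched in a few lines. The helper names below are hypothetical, and the sketch assumes the usual convention that a filter of size F slides over a series of length L at stride 1 unless stated otherwise, with `total_padding` counting the zeros added on both ends combined.

```python
def conv1d_output_length(length, filter_size, total_padding=0, stride=1):
    """Number of positions at which the filter fits inside the padded series."""
    padded = length + total_padding
    if filter_size > padded:
        raise ValueError("filter is longer than the padded series")
    return (padded - filter_size) // stride + 1


def padding_to_preserve_length(filter_size):
    """Total zero padding (both ends combined) that keeps the length fixed at stride 1."""
    return filter_size - 1


# Illustrative sizes only: a series of length 10 and a filter of size 3.
unpadded = conv1d_output_length(10, 3)
same = conv1d_output_length(10, 3, padding_to_preserve_length(3))
```

With no padding the output has L − F + 1 entries. Adding F − 1 zeros in total, split evenly between the two ends when F is odd, brings the output length back to L.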
Perform a convolution with a 1-dimensional filter 1, 0, 1 and zero padding.
Consider a 1-dimensional time-series with values 2, 1, 3, 4,
Multinomial logistic regression with neural networks: Propose a neural network architecture using the softmax activation function and an appropriate loss function that can perform multinomial
Convert the weighted computational graph of Figure 7.2 into an unweighted graph by defining additional nodes containing w1 . . . w5 along with appropriately defined hidden nodes.
Consider the computational graph shown in Figure 7.19(b), in which the local derivative $\partial y(j) / \partial y(i)$ is shown for each edge (i, j), where y(k) denotes the activation of node k. The output o is 0.1,
Consider the computational graph shown in Figure 7.19(a), in which the local derivative $\partial y(j) / \partial y(i)$ is shown for each edge (i, j), where y(k) denotes the activation of node k. The output o is 0.1,
Consider the computational graph of Figure 7.10. The upper node in each layer computes sin(x + y) and the lower node in each layer computes cos(x + y) with respect to its two inputs. For the first
Consider the computational graph of Figure 7.10. For a particular numerical input x = a, you find the unusual situation that the value $\partial y(j) / \partial y(i)$ is 0.3 for each and every edge (i, j) in the
Use the pathwise aggregation lemma to compute the derivative of y(10) with respect to each of y(1), y(2), and y(3) as an algebraic expression (cf. Figure 7.11). You should get the same derivative as
All-pairs node-to-node derivatives: Let y(i) be the variable in node i in a directed acyclic computational graph containing n nodes and m edges. Consider the case where one wants to compute S(i, j) =
Forward Mode Differentiation: The backpropagation algorithm needs to compute node-to-node derivatives of output nodes with respect to all other nodes, and therefore computing gradients in the
Consider a neural network in which a vectored node v feeds into two distinct vectored nodes h1 and h2 computing different functions. The functions computed at the nodes are h1 = ReLU(W1v) and h2 =
For Exercise 11, show the following loss-to-weight derivatives: $\frac{\partial L}{\partial U} = \sum_{p=1}^{t} \frac{\partial L(o_p)}{\partial o_p} h_p^T$, $\frac{\partial L}{\partial W} = \sum_{p=2}^{t} \Delta_{p-1} \frac{\partial L}{\partial h_p} h_{p-1}^T$, $\frac{\partial L}{\partial V} = \sum_{p=1}^{t} \Delta_p \frac{\partial L}{\partial h_p} x_p^T$. What are the sizes and ranks of
Suppose that the output structure of the neural network in Exercise 9 is changed so that there are k-dimensional outputs $o_1 \ldots o_t$ in each layer, and the overall loss is $L = \sum_{i=1}^{t} L(o_i)$. The output
Show that if we use the loss function L(o) in Exercise 9, then the loss-to-node gradient can be computed for the final layer $h_t$ as follows: $\frac{\partial L(o)}{\partial h_t} = U^T \frac{\partial L(o)}{\partial o}$. The updates in earlier layers
Consider a neural network that has hidden layers $h_1 \ldots h_t$, inputs $x_1 \ldots x_t$ into each layer, and outputs o from the final layer $h_t$. The recurrence equation for the pth layer is as follows: o =
Consider the neural architecture with connections between alternate layers, as shown in Figure 7.15(b). Suppose that the recurrence equations of this neural network are as follows: $h_1 = \mathrm{ReLU}(W_1 x)$, h2 =
Discuss why the dynamic programming algorithm for computing the gradients will not work in the case where the computational graph contains cycles.
Consider a computational graph in which you are told that the variables on the edges satisfy k linear equality constraints. Discuss how you would train the weights of such a graph. How would your