Questions and Answers for Artificial Intelligence: A Modern Approach

You would like to train a neural network to classify digits. Your network takes as input an image and outputs probabilities for each of the 10 classes, 0-9. The network’s prediction is the class
In this question we will perform the backward pass algorithm on the formula a. Calculate the following partial derivatives of f. (i) b. Calculate the following partial derivatives of
This exercise asks you to implement the beginnings of a simple deep learning package. a. Implement a data structure for general computation graphs, as described in Section 21.1, and define the node
Implement a passive learning agent in a simple environment, such as the 4 × 3 world. For the case of an initially unknown environment model, compare the learning performance of the direct utility
Consider a text corpus consisting of N tokens of d distinct words and the number of times each distinct word w appears is given by xw. We want to apply a version of Laplace smoothing that estimates a
Implement an exploring reinforcement learning agent that uses direct utility estimation. Make two versions: one with a tabular representation and one using the function approximator in Equation
Create a test set of ten queries, and pose them to two different Web search engines. Evaluate each one for precision at the top 1, 3, and 10 documents. Can you explain the differences between engines?
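A minimal sketch of the precision-at-k measure named in the exercise above, assuming you have already judged each returned document by hand as relevant or not. The function name `precision_at_k` and the sample judgments are hypothetical, chosen only for illustration.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k ranked results that were judged relevant.

    relevance: list of booleans, one per returned document, best-ranked first.
    k: cutoff rank (1, 3, and 10 in the exercise).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Results missing below rank k count as non-relevant, so divide by k.
    return sum(relevance[:k]) / k


# Hypothetical hand-made judgments for one query on one engine.
judged = [True, False, True, True, False, False, False, True, False, False]
for k in (1, 3, 10):
    print(k, precision_at_k(judged, k))
```

Averaging these values over the ten queries gives one number per engine and cutoff, which makes the comparison between engines easier to discuss.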
Estimate how much storage space is necessary for the index to a 100 billion-page corpus of Web pages. Show the assumptions you made.
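One way to organize such an estimate is as a back-of-the-envelope calculation in which every input is an explicit assumption. In the sketch below, only the page count comes from the exercise; the figures for distinct terms per page and bytes per posting are placeholder assumptions, not measurements, and the function names are hypothetical.

```python
import math


def doc_id_bits(num_pages):
    """Minimum number of bits needed to give every page a distinct ID."""
    return math.ceil(math.log2(num_pages))


def index_bytes(num_pages, distinct_terms_per_page, bytes_per_posting):
    """Size of an inverted index with one posting per (term, page) pair."""
    return num_pages * distinct_terms_per_page * bytes_per_posting


NUM_PAGES = 100_000_000_000   # from the exercise: 100 billion pages
TERMS_PER_PAGE = 500          # assumption: distinct words on an average page
BYTES_PER_POSTING = 5         # assumption: uncompressed document ID per posting

print(doc_id_bits(NUM_PAGES), "bits per document ID")
print(index_bytes(NUM_PAGES, TERMS_PER_PAGE, BYTES_PER_POSTING) / 1e12, "TB")
```

Changing any assumption (positional information, compression, stop-word removal) shifts the result proportionally, which is why the exercise asks you to state the assumptions.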
Collect some examples of time expressions, such as “two o’clock,” “midnight,” and “12:46.” Also think up some examples that are ungrammatical, such as “thirteen o’clock” or
In this exercise you will transform E0 into Chomsky Normal Form (CNF). There are five steps: (a) Add a new start symbol, (b) Eliminate ϵ rules, (c) Eliminate multiple words on
Write a regular expression or a short program to extract company names. Test it on a corpus of business news articles. Report your recall and precision.
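As a starting point, the sketch below matches runs of capitalized words that end in a corporate suffix and then scores the matches against a hand-labeled list. The suffix list, the function names, and the sample sentence are assumptions made for illustration; a real news corpus will need a richer pattern.

```python
import re

# Assumption: company names end in one of these suffixes.
SUFFIX = r"(?:Inc|Corp|Corporation|Ltd|LLC|Co|Company)"
# One or more capitalized words, then a suffix, then an optional period.
PATTERN = re.compile(r"\b((?:[A-Z][\w&'-]*\s+)+" + SUFFIX + r")\b\.?")


def extract_companies(text):
    """Return candidate company names found in text (without trailing period)."""
    return [m.group(1) for m in PATTERN.finditer(text)]


def precision_recall(predicted, gold):
    """Precision and recall of a predicted set of names against a labeled set."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


sample = "Shares of Acme Widgets Inc. rose after Globex Corp. announced a deal."
print(extract_companies(sample))
```

A pattern like this misses companies without a suffix and can absorb a capitalized sentence-initial word, so both recall and precision are worth reporting.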
This exercise explores the quality of the n-gram model of language. Find or create a monolingual corpus of 100,000 words or more. Segment it into words, and compute the frequency of each word. How
An HMM grammar is essentially a standard HMM whose state variable is N (nonterminal, with values such as Det, Adjective, Noun and so on) and whose evidence variable is W (word, with values such as
Consider the following PCFG for simple verb phrases: 0.1 : VP → Verb; 0.2 : VP → Copula Adjective; 0.5 : VP → Verb the Noun; 0.2 : VP → VP Adverb; 0.5 : Verb →
Consider the following PCFG: S → NP VP [1.0] NP → Noun [0.6] | Pronoun [0.4] VP → Verb NP [0.8] | Modal Verb [0.2] Noun → can [0.1] | fish [0.3] | . . . Pronoun → I
Select five sentences and submit them to an online translation service. Translate them from English to another language and back to English. Rate the resulting sentences for grammaticality and
Consider the following simple PCFG for noun phrases: 0.6 : NP → Det AdjString Noun 0.4 : NP → Det NounNounCompound 0.5 : AdjString → Adj AdjString 0.5 : AdjString →
Zipf’s law of word distribution states the following: Take a large corpus of text, count the frequency of every word in the corpus, and then rank these frequencies in decreasing order. Let fI be
Without looking back at Exercise 23.TXUN, answer the following questions: a. What are the four steps that are mentioned? b. What step is left out? c. What is “the material” that is
In this exercise you will develop a classifier for authorship: given a text, the classifier predicts which of two candidate authors wrote the text. Obtain samples of text from two different authors.
This exercise concerns the classification of spam email. Create a corpus of spam email and one of non-spam mail. Examine each corpus and decide what features appear to be useful for classification:
Some linguists have argued as follows: Children learning a language hear only positive examples of the language and no negative examples. Therefore, the hypothesis that “every possible sentence is
Run a notebook such as www.tensorflow.org/hub/tutorials/tf2_text_classification that loads a pre-trained text embedding as the first layer and does transfer learning for the domain, which in this
Choose a dataset from paperswithcode.com/task/question-answering and report on the NLP Question-Answering model that performs best on that dataset. It will be easier if you choose a dataset for which
Run a word embedding visualization tool such as projector.tensorflow.org/ and try to get a feel for how the embeddings work: what words are near each other? Are there surprises for unrelated words
So far we’ve concentrated on word embeddings to represent the fundamental units of text. But it is also possible to have a character-level model. Read Andrej Karpathy’s 2015 article The
This exercise is about the difficulty of translation, but does not use a large corpus, nor any complex algorithms–just your own ingenuity. A rare language spoken by only about a hundred people has
Since the publication of the textbook, a new architecture called Perceiver was introduced by Jaegle et al. in their article Perceiver: General Perception with Iterative Attention arxiv.
Run a notebook such as www.tensorflow.org/tutorials/text/word2vec that learns word embeddings from a corpus using the skip-gram approach. Train the model on a corpus of Shakespeare, and separately on
Experiment with a large-scale NLP text generation system. Pretrained online versions come and go; you could try 6b.eleuther.ai/ or transformer.huggingface.co/doc/distil-gpt2 or search for another
Run a notebook to generate a word embedding model such as www.tensorflow.org/text/guide/word_embeddings, which trains an embedding model based on a corpus of IMDB movie reviews. Create the embedding
Which of the following are true or false? a. An RNN is designed specifically for processing word sequences. b. An RNN is designed to process any time-series data. c. An RNN has a limit
Experiment with an online sequence-to-sequence neural machine translation model, such as translate.google.com/ or www.bing.com/translator or www.deepl.com/translator. If you know two languages well,
Run a tutorial notebook such as www.tensorflow.org/text/tutorials/transformer to train a Transformer model on a bilingual corpus to do machine translation. Test it to see how it performs. How does
Read Melanie Mitchell’s account (medium.com/@melaniemitchell.me/can-gpt-3-make-analogies-16436) of trying to replicate her 1980s work on analogy-making with a standard GPT-3 model. Mitchell’s 1980s
Examine how well word embedding models can answer analogy questions of the form “A is to B as C is to [what]?” (e.g. “Athens is to Greece as Oslo is to Norway”) using vector arithmetic.
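The vector-arithmetic test can be sketched as follows: compute B − A + C and return the vocabulary word whose embedding has the highest cosine similarity to that point, excluding the three query words. The three-dimensional vectors below are made-up toy values, not output from any trained model, and `solve_analogy` is a hypothetical helper name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' by nearest neighbor to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Toy embeddings (assumption): invented so the arithmetic is easy to follow.
toy = {
    "athens": [1.0, 0.0, 1.0],
    "greece": [1.0, 0.0, 0.0],
    "oslo":   [0.0, 1.0, 1.0],
    "norway": [0.0, 1.0, 0.0],
    "france": [0.7, 0.7, 0.0],
}
print(solve_analogy(toy, "athens", "greece", "oslo"))
```

With real embeddings the same function applies unchanged; only the dictionary of vectors is replaced by the vectors loaded from a trained model.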
Apply an RNN to the task of part-of-speech tagging. Run a notebook (such as www.kaggle.com/tanyadayanand/pos-tagging-using-rnn or github.com/roemmele/keras-rnn-notebooks/tree/master/pos_tagging),
Besides machine translation, describe some other tasks that can be solved by sequence-to-sequence models.
We have considered recurrent models that work a word at a time, and models that work a character at a time. It is also possible to use a subword representation, in which, say, the word
Apply an RNN to the task of text classification, in particular binary sentiment analysis: classifying movie reviews on IMDB as either positive or negative. Run a notebook such as
Which of the following are true assertions about the variable elimination algorithm, and which are false? a. When changing a Bayes net by removing a parent from a variable, the maximum factor
Exercise 13.MRBL asks you to prove that removing an observed variable Y from a Bayes net has no effect on the posterior distribution of any variable X that is outside Y’s Markov blanket, provided
Alice, Bob, Carol, and Dave are being given some money, but they have to share it in a very particular way: • First, Alice will be given an integer number of dollars A, chosen uniformly at
We are running Gibbs sampling in the Bayes net shown in Figure S13.47 for the query P(B, C | +h, +i, +j). The current state is +a, +b, +c, +d, +e, +f, +g, +h, +i, +j, +k. Write out an expression for
For the graphs in Figure S13.3, what is the minimal set of edges that must be removed such that the corresponding independence relations are guaranteed to be true? [Figure S13.3]
For the following Bayes nets, add the minimal number of arrows such that the resulting structure is able to represent all distributions that satisfy the stated independence and nonindependence
Cheating dealers have become a serious problem at the mini-Blackjack tables. A mini-Blackjack deck has 3 card types (5, 10, 11) and an honest dealer is equally likely to deal each of the 3 cards. When a
In the Bayes net in Figure S13.14, state whether each of the following assertions is necessarily true, necessarily false, or undetermined. a. B is absolutely independent of C. b. B is
In the Bayes net in Figure S13.15, state whether each of the following assertions is necessarily true, necessarily false, or undetermined. a. A is absolutely independent of C. b. A is
In the Bayes net in Figure S13.16, which of the following are necessarily true? a. P(X, Y, Z) = P(X)P(Y | X)P(Z | X, Y). b. P(X, Y, Z) = P(X)P(Y | X)P(Z | Y). c. P(X, Y, Z) = P(X)P(Y
You are given the following conditional distributions that connect the binary variables W, X, Y, Z: Which of the Bayes nets in Figure S13.17 can represent a joint distribution that is
Label the blank nodes in the Bayes net below with the variables {A, B, C, E} such that the following independence assertions are true: • A is conditionally independent of B given D,
a. Consider answering P(H | +f) by variable elimination in the Bayes nets N and N' shown in Figure S13.44, where the elimination order is alphabetical and all variables are binary. How large are the
Consider the Bayes net shown in Figure S13.12. a. Which of the following are asserted by the network structure? (i) P(B, I, M) = P(B)P(I)P(M). (ii) P(J | G) = P(J | G, I). (iii) P(M |
In the Bayes net in Figure S13.13, state whether each of the following assertions is necessarily true, necessarily false, or undetermined. a. A is absolutely independent of E. b. B is
Consider doing inference in an m × n lattice Bayes net, as shown in Figure S13.43. The network consists of mn binary variables Vi,j, and you have observed that Vm,n = +vm,n. You wish to calculate
The probit distribution defined on page 424 describes the probability distribution for a Boolean child, given a single continuous parent. a. How might the definition be extended to cover
a. Consider the Bayes net in Figure S13.19. (i) Given B, what variable(s) is E guaranteed to be independent of? (ii) Given B and F, what variable(s) is G guaranteed to be independent of? b.
Which of the following are true, and which are false? a. Bayes nets are organized into layers with connections only between adjacent layers. b. The topology of a Bayes net can assert that
Consider the Bayes net below in Figure S13.22 with 9 variables: a. Which random variables are independent of X3,1? b. Which random variables are conditionally independent of X3,1 given
Assume we are given the ten Bayes nets in Figure S13.26, labeled G1 to G10. Assume we are also given the three Bayes nets in Figure S13.27, labeled B1 to B3. a. Assume we know that a joint
There has been an outbreak of mumps in your college. You feel fine, but you’re worried that you might already be infected. You decide to use Bayes nets to analyze the probability that you’ve
Suppose that an object is moving according to the following transition model: Here, 0 < p < 1 and 0 < q < 1 are arbitrary probabilities. At time 0, the object is known to be in state
Let P be a probability distribution over random variables A, B, C. Let Q be another probability distribution over the same variables, defined by a Bayes net in which B and C are conditionally
Consider a Markov chain with 3 states and transition probabilities as shown below: Compute the stationary distribution. That is, compute P∞(A), P∞(B), P∞(C).
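A stationary distribution satisfies π = πP and sums to 1, so one way to find it numerically is to apply the transition matrix repeatedly until the distribution stops changing. The sketch below uses power iteration; the matrix shown is a made-up example, since the transition probabilities for this exercise are not reproduced here, and `stationary_distribution` is a hypothetical helper name.

```python
def stationary_distribution(P, iterations=1000, tol=1e-12):
    """Approximate pi with pi = pi P by power iteration.

    P: row-stochastic matrix, P[i][j] = probability of moving from i to j.
    Converges for ergodic chains such as the example below.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical transition matrix over states A, B, C (not the exercise's values).
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
print(stationary_distribution(P))
```

For a 3-state chain the same answer can be obtained by hand by solving the linear equations π = πP together with the constraint that the probabilities sum to 1.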
In which of the Bayes nets in Figure S13.25 does the equation P(A, B)P(C) = P(A)P(B, C) necessarily hold? [Figure S13.25]
Consider the vacuum worlds of Figure 4.18 (perfect sensing) and Figure 14.7 (noisy sensing). Suppose that the robot receives an observation sequence such that, with perfect sensing, there is exactly
For the Bayes net structures in Figure S?? and Figure S?? that are missing a direction on their edges, assign a direction to each edge such that the Bayes net structure implies the stated conditional
Transportation researchers are trying to improve traffic in the city but, in order to do that, they first need to estimate the location of each of the cars in the city. They need our help to model
Assume the elevator of the Disney Tower of Terror E follows a Markovian process and has m floors at which it can stop. In the dead of night, you install a sensor S at the top of the shaft that gives
Computing the evidence likelihood L1:t = P(e1:t) in a temporal sequence can be done using a recursive computation similar to the filtering algorithm. Show that the likelihood message ℓ1:t(Xt) =
Consider two particle filtering implementations: Implementation 1: Initialize particles by sampling from initial state distribution and assigning uniform weights. 1. Propagate particles,
Consider the Bayes net obtained by unrolling the DBN in Figure 14.20 to time step t. Use the conditional independence properties of this network to show that P(Dirt1,0:t, ..., Dirt42,0:t | DirtSensor
In California, whether it rains or not from each day to the next forms a Markov chain (note: this is a terrible model for real weather). However, sometimes California is in a drought and sometimes it
(iv) [true or false] With a deterministic transition model and a stochastic observation model, as time goes to infinity, when running a particle filter we will end up with all identical particles. (v)
In which settings is particle filtering better than exact HMM inference? • Large vs. Small state spaces. • Prioritizing runtime vs. accuracy.
Consider an HMM with state variables {Xi} and emission variables {Yi}. (i) [True or false] Xi is always conditionally independent of Yi+1 given Xi+1. (ii) [True or false] There exists an
Consider a probability model P(X, Y, Z, E), where Z is a single query variable and evidence E = e is given. A basic Monte Carlo algorithm generates N samples (ideally) from P(X, Y, Z | E = e) and
Equation (17.11) shown below states that the Bellman operator is a contraction. a. Show that, for any functions f and g, b. Write out an expression for |(B Ui − B U'i)(s)| and then apply the
In this exercise we explore the application of UCT to Tetris. a. Create an implementation of the Tetris MDP as described in Figure 17.5. Each action simply places the current piece in any reachable
Value iteration: (i) Is a model-free method for finding optimal policies. (ii) Is sensitive to local optima. (iii) Is tedious to do by hand. (iv) Is guaranteed to converge when
a. Please indicate if the following statements are true or false. (i) Let A be the set of all actions and S the set of states for some MDP. Assuming that |A| << |S|, one iteration of value
Pacman finds himself inside the grid world MDP depicted in Figure S17.5. Each rectangle represents a possible state. At each state, Pacman can take actions up, down, left or right. If an action moves
Please indicate whether the following statements are true or false. a. If the only difference between two MDPs is the value of the discount factor then they must have the same optimal policy. b.
Consider an (N + 1) × (N + 1) × (N + 1) cubic gridworld. Luckily, all the cells are empty – there are no walls within the cube. For each cell, there is an action for each adjacent facing open
Suppose we run value iteration in an MDP with only non-negative rewards (that is, R(s, a, s') ≥ 0 for any (s, a, s')). Let the values on the kth iteration be Vk(s) and the optimal values be V∗
Recall that a weighted voting game is a cooperative game defined by a structure G = [q; w1, . . . , wn] where q is the quota, the players are N = {1, . . . , n}, the value wi is the weight of player
Consider the following deterministic MDP with 1-dimensional continuous states and actions and a finite task horizon: State Space S: ℝ; Action Space A: ℝ; Reward Function: R(s, a, s') = −qs² − ra²
Suppose we are given a cooperative game G = ({1, 2}, v) with characteristic function v defined by: Show that weighted voting games cannot capture this “singleton” game: we will not be able to find
Define the following in your own words: a. Multiagent system b. Multibody planning c. Coordination problem d. Agent design e. Mechanism design f. Cooperative game
Give some examples, from movies or literature, of bad guys with a formidable army (robotic or otherwise) that inexplicably is under centralized control rather than more robust multiagent control, so
In the Landowner and Workers game there is a landowner ℓ and n workers w1, . . . , wn. A group of workers may lease the land from the landowner and grow vegetables on it. Their productivity depends
Consider the following scenario: Five pirates wish to divide a loot of 100 gold pieces. They are democratic pirates, in their own way, and it is their custom to make such divisions in the
In the game of football (“soccer” in the US), a player who is awarded a penalty kick scores about 3/4 of the time. Suppose we model a penalty kick as a game between two players, the shooter, S,
Consider the following scenario: Two players (N = {1, 2}) must choose between three outcomes Ω = {a, b, c}. The rule they use is the following: Player 1 goes first, and vetoes one of the
Consider a two-player game in which player 1 can choose A or B. The game ends if she chooses A, while it continues to player 2 if she chooses B. Player 2 can then choose C or D with the game ending if C
Consider the following scenario: There are two pirates operating among three islands A, B, and C. On each island, two treasures are buried: a large one worth 2 and another smaller one worth 1. The
Avi and Bailey are friends, and enjoy a night out together in the pub. They each will independently decide to go either to the Turf or the Rose. Avi mildly prefers the Rose over the Turf and would get
Define the following machine-learning terms in your own words: a. Training set b. Hypothesis c. Bias d. Variance
Indicate which of the answer(s) in parentheses are correct for each question: a. For binary class classification, does logistic regression always produce a linear decision boundary? (Yes;
