Questions and Answers of Management and Artificial Intelligence
Exercise 11.11 In SARSA with linear function approximators, if you use linear regression to minimize r + γQw(s′, a′) − Qw(s, a), you get a different result than we have here. Explain what you get
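For reference, the semi-gradient SARSA update with linear function approximation that this exercise contrasts with a full linear regression can be sketched as follows (a minimal sketch, assuming a feature function phi(s, a) returning a NumPy vector and a weight vector w; the names and interface are illustrative, not from the textbook):

```python
import numpy as np

def sarsa_linear_update(w, phi, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One semi-gradient SARSA step for a linear approximator Q_w(s, a) = w . phi(s, a)."""
    q_sa = np.dot(w, phi(s, a))              # current estimate Q_w(s, a)
    q_next = np.dot(w, phi(s_next, a_next))  # bootstrapped estimate Q_w(s', a')
    td_error = r + gamma * q_next - q_sa     # temporal-difference error
    # Only Q_w(s, a) is differentiated; the bootstrapped target is treated as a constant.
    return w + alpha * td_error * phi(s, a)
```

The exercise turns on how this semi-gradient step differs from a regression that also differentiates through the bootstrapped target.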
Exercise 11.10 Suppose your friend presented you with the following example where SARSA(λ) seems to give unintuitive results. There are two states, A and B. There is a reward of 10 coming into state
Exercise 11.9 Consider four different ways to derive the value of αk from k in Q-learning (note that for Q-learning with varying αk, there must be a different count k for each state–action pair). (i)
Exercise 11.8 Compare the different parameter settings for the game of Example 11.8 (page 464). In particular, compare the following situations: (a) α varies, and the Q-values are initialized to
Exercise 11.7 For the plot of the total reward as a function of time as in Figure 11.12 (page 474), the minimum and zero crossing are only meaningful statistics when balancing positive and negative
Exercise 11.6 Explain how Q-learning fits in with the agent architecture of Section 2.2.1 (page 46). Suppose that the Q-learning agent has discount factor γ, a step size of α, and is carrying out
Exercise 11.5 Explain what happens in reinforcement learning if the agent always chooses the action that maximizes the Q-value. Suggest two ways to force the agent to explore.
Exercise 11.4 Suppose a Q-learning agent, with fixed α and discount γ, was in state 34, did action 7, received reward 3, and ended up in state 65. What value(s) get updated? Give an expression for
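For reference, the generic tabular Q-learning update the exercise refers to can be sketched as follows (an illustrative sketch only; Q is assumed to be a dictionary keyed by (state, action) pairs, and `actions` is the set of available actions):

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Update only the entry for the visited (state, action) pair."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```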
Exercise 11.3 Give an algorithm for EM for unsupervised learning [Figure 11.4 (page 457)] that does not store an A array, but rather recomputes the appropriate value for the M step. Each iteration
Exercise 11.2 Suppose the k-means algorithm is run for an increasing sequence of values for k, and that it is run a number of times for each k to find the assignment with a global minimum error.
Exercise 11.1 Consider the unsupervised data of Figure 11.1 (page 454). (a) How many different stable assignments of examples to classes does the k-means algorithm find when k = 2? [Hint: Try running
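To experiment with Exercises 11.1 and 11.2, a plain k-means loop such as the following can be used (a minimal sketch assuming the data points are rows of a NumPy array; the data of Figure 11.1 is not reproduced here):

```python
import numpy as np

def kmeans(points, init_centres, iters=100):
    """Plain k-means: alternate assignment and mean-update steps until stable."""
    centres = np.asarray(init_centres, dtype=float)
    k = len(centres)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Update step: each centre moves to the mean of its assigned points
        # (empty clusters keep their old centre).
        new_centres = np.array([points[assign == j].mean(axis=0) if np.any(assign == j)
                                else centres[j] for j in range(k)])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return assign, centres
```

Running it from several different initial centres is one way to count the stable assignments Exercise 11.1 asks about.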
• In reinforcement learning, an agent should trade off exploiting its knowledge and exploring to improve its knowledge.
• A Markov decision process is an appropriate formalism for reinforcement learning. A common method is to learn an estimate of the value of doing each action in a state, as represented by the
• Missing values in examples are often not missing at random. Why they are missing is often important to determine.
• The probabilities and the structure of belief networks can be learned from complete data. The probabilities can be derived from counts. The structure can be learned by searching for the best
• EM is an iterative method to learn the parameters of models with hidden variables (including the case in which the classification is hidden).
• By introducing payments, it is possible to design a mechanism that is dominant-strategy truthful and economically efficient.
• Agents can learn to coordinate by playing the same game repeatedly, but it is difficult to learn a randomized strategy.
• A Nash equilibrium is a strategy profile for each agent such that no agent can increase its utility by unilaterally deviating from the strategy profile.
• In partially observable domains, sometimes it is optimal to act stochastically.
• Perfect information games can be solved by backing up values in game trees or searching the game tree using minimax with α-β pruning.
• A multiagent decision network models probabilistic dependency and information availability.
• The extensive form of a game models agents’ actions and information through time in terms of game trees.
• The strategic form of a game specifies the expected outcome given controllers for each agent.
• A multiagent system consists of multiple agents who can act autonomously and have their own utility over outcomes. The outcomes depend on the actions of all agents. Agents can compete, cooperate,
• A dynamic decision network allows for the representation of an MDP in terms of features.
• A fully observable MDP can be solved with value iteration or policy iteration.
• An MDP can represent an infinite stage or indefinite stage sequential decision problem in terms of states.
• A decision network can represent a finite stage partially observable sequential decision problem in terms of features.
• Utility is a measure of preference that combines with probability.
• Different planning algorithms can be used to convert a planning problem into a search problem.
• An action is a function from a state to a state. A number of representations exploit structure in the representation of states. In particular, the feature-based representation of actions represents
• Planning is the process of choosing a sequence of actions to achieve a goal.
• An agent can choose the best hypothesis given the training examples, delineate all of the hypotheses that are consistent with the data, or compute the posterior probability of the hypotheses
• Linear classifiers, decision trees, and Bayesian classifiers are all simple representations that are the basis for more sophisticated models.
• Given some training examples, an agent builds a representation that can be used for new predictions.
• Supervised learning is the problem of predicting the output for a new input, given a set of input–output pairs.
• Learning is the ability of an agent to improve its behavior based on experience.
• A hidden Markov model or a dynamic belief network can be used for probabilistic reasoning in time, such as for localization.
• Stochastic simulation can be used for approximate inference.
• Exact inference can be carried out for sparse graphs (with low treewidth).
• A Bayesian belief network can be used to represent independence in a domain.
• The posterior probability is used to update an agent’s beliefs based on evidence.
• Probability can be used to make decisions under uncertainty.
• A causal model predicts the effect of an intervention.
• Abduction can be used to explain observations.
• Negation as failure can be used when the knowledge is complete (i.e., under the complete knowledge assumption).
• Proof by contradiction can be used to make inference from a Horn clause knowledge base.
• A sound and complete proof procedure can be used to determine the logical consequences of a knowledge base.
• Given a set of facts about a domain, the logical consequences characterize what else must be true.
• A definite clause knowledge base can be used to specify atomic clauses and rules about a domain when there is no uncertainty or ambiguity.
• Optimization can use systematic methods when the constraint graph is sparse. Local search can also be used, but it has the added problem of not knowing whether the search is at a global optimum.
• Stochastic local search can be used to find satisfying assignments, but not to show there are no satisfying assignments. The efficiency depends on the trade-off between the time taken for each
• Arc consistency and search can often be combined to find assignments that satisfy some constraints or to show that there is no assignment.
• Many problems can be represented as a set of variables, corresponding to the set of features, domains of possible values for the variables, and a set of hard and/or soft constraints. A solution
• Instead of reasoning explicitly in terms of states, it is almost always much more efficient for an agent solving realistic problems to reason in terms of a set of features that characterize a
• When graphs are small, dynamic programming can be used to record the actual cost of a least-cost path from each node to the goal, which can be used to find the next arc in an optimal path.
• Iterative deepening and depth-first branch-and-bound searches can be used to find least-cost paths with less memory than methods such as A∗, which store multiple paths.
• A∗ search can use a heuristic function that estimates the cost from a node to a goal. If this estimate underestimates the actual cost, A∗ is guaranteed to find a least-cost path first.
• Breadth-first and depth-first searches can find paths in graphs without any extra knowledge beyond the graph.
• Many problems can be abstracted as the problem of finding paths in graphs.
• An intelligent agent requires knowledge that is acquired at design time, offline or online.
• Complex agents are built modularly in terms of interacting hierarchical layers.
• An agent has direct access not to its history, but to what it has remembered (its belief state) and what it has just observed. At each point in time, an agent decides what to do and what to
• Agents are situated in time and must make decisions of what to do based on their history of interaction with the environment.
• An agent is composed of a body and interacting controllers.
• Agents have sensors and actuators to interact with the environment.
• An agent system is composed of an agent and an environment.
• In choosing a representation, you should find a representation that is as close as possible to the problem, so that it is easy to determine what it is representing and so it can be checked for
• To know when you have solved a problem, an agent must have a definition of what constitutes an adequate solution, such as whether it has to be optimal, approximately optimal, or almost always
• To solve a problem by computer, the computer must have an effective representation with which to reason.
• A designer of an intelligent agent should be concerned about modularity, how to describe the world, how far ahead to plan, uncertainty in both perception and the effects of actions, the structure
• An intelligent agent is a physical symbol system that manipulates symbols to determine what to do.
• An agent acts in an environment and only has access to its prior knowledge, its history of observations, and its goals and preferences.
• Artificial intelligence is the study of computational agents that act intelligently.
Exercise 10.3 In the sequential prisoner’s dilemma (page 438), suppose there is a discount factor of γ, which means there is a probability 1 − γ of stopping at each stage. Is tit-for-tat a Nash
Exercise 10.2 In Example 10.12 (page 437), what is the Nash equilibrium with randomized strategies? What is the expected value for each agent in this equilibrium?
Exercise 10.1 For the hawk–dove game of Example 10.11 (page 436), where D > 0 and R > 0, each agent is trying to maximize its utility. Is there a Nash equilibrium with a randomized strategy? What
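For a two-action game such as hawk–dove, a randomized (mixed) equilibrium can be found from the indifference conditions: each agent mixes so that the other agent is indifferent between its two actions. A generic sketch follows (the payoff matrices are placeholders to be filled in from the example, not the book's values; it assumes an interior mixed equilibrium exists):

```python
def mixed_equilibrium_2x2(A, B):
    """Return (p, q) where p = P(row plays action 0) and q = P(column plays action 0),
    for a 2x2 game with row payoffs A[i][j] and column payoffs B[i][j]."""
    # q makes the row player indifferent between its two actions.
    q = (A[1][1] - A[0][1]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])
    # p makes the column player indifferent between its two actions.
    p = (B[1][1] - B[1][0]) / (B[0][0] - B[0][1] - B[1][0] + B[1][1])
    return p, q
```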
• The leaves represent final outcomes and are labeled with a utility for each agent.
• Each internal node labeled with nature has a probability distribution over its children.
• Each arc out of a node labeled with agent i corresponds to an action for agent i.
• Each internal node is labeled with an agent (or with nature). The agent is said to control the node.
Exercise 9.17 Consider a grid world where the action “up” has the following dynamics: it goes up with probability 0.8, up-left with probability 0.1, and up-right with probability 0.1.
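The stochastic “up” action described above can be written as a small transition model, for example (an illustrative sketch; grid-boundary handling is omitted):

```python
def up_dynamics(x, y):
    """Successor distribution for action "up" from cell (x, y):
    intended move with probability 0.8, diagonal slips with probability 0.1 each."""
    return {(x, y + 1): 0.8, (x - 1, y + 1): 0.1, (x + 1, y + 1): 0.1}
```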
Exercise 9.16 What is the main difference between asynchronous value iteration and standard value iteration? Why does asynchronous value iteration often work better than standard value iteration?
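The contrast this question asks about can be seen in a short sketch (assuming an MDP given by `states`, `actions`, transition probabilities `P[s][a]` as successor→probability dictionaries, rewards `R[s][a]`, and a discount `gamma`; these names are illustrative):

```python
def value_iteration(states, actions, P, R, gamma, sweeps):
    """Standard (synchronous) value iteration: every state is updated
    from the previous sweep's value function."""
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        V = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                    for a in actions)
             for s in states}
    return V

def asynchronous_value_iteration(states, actions, P, R, gamma, updates, pick_state):
    """Asynchronous value iteration: states are updated one at a time, in place,
    immediately reusing the freshest values of the other states."""
    V = {s: 0.0 for s in states}
    for _ in range(updates):
        s = pick_state()
        V[s] = max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                   for a in actions)
    return V
```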
Exercise 9.15 Consider the following decision network: (a) What are the initial factors? (You do not have to give the tables; just give what variables they depend on.) (b) Show what factors are created
Exercise 9.14 One of the decisions we must make in real life is whether to accept an invitation even though we are not sure we can or want to go to an event. The following figure represents a
Exercise 9.13 This is a continuation of Exercise 6.8 (page 278). (a) When an alarm is observed, a decision is made whether to shut down the reactor. Shutting down the reactor has a cost cs associated
Exercise 9.12 How can variable elimination for decision networks, shown in Figure 9.11 (page 393), be modified to include additive discounted rewards? That is, there can be multiple utility (reward)
Exercise 9.11 In a decision network, suppose that there are multiple utility nodes, where the values must be added. This lets us represent a generalized additive utility function. How can the VE for
Exercise 9.10 Consider a 5 × 5 grid game similar to the game of the previous question. The agent can be at one of the 25 locations, and there can be a treasure at one of the corners or no
Exercise 9.9 Consider a game world: The robot can be at one of the 25 locations on the grid. There can be a treasure on one of the circles at the corners. When the robot reaches the corner where the