Question

Posted on Sep 25, 2024

I'm really struggling with value iteration and the discount factor on these problems. Please help me solve them step by step so that I can learn how to work them. Thank you!

Consider Pacman, who uses MDPs to maximize his expected utility. In each environment:
- Pacman has the standard actions (North, East, South, West) unless blocked by an outer wall.
- There is a reward of 1 point when eating the dot (for example, in the grid below, R(A, South, B) = 1).
- The game ends when the dot (blue circle) is eaten.

(a) Consider the following grid, where there is a single food pellet in the bottom right corner (B). The discount factor is 0.2. There is no living reward. The states are simply the grid locations.

a) What is the optimal policy for each state?
b) What is the optimal value for the state of being in the upper left corner (E)? Reminder: the discount factor is 0.2.
c) Using value iteration with the value of all states equal to zero at k = 0, for which iteration k will V_k(F) = V*(F)? Explain.

Consider the following grid world MDP for the rest of this question. Shaded cells represent walls. In all states, the agent has available actions ↑, ↓, ←, →. Performing an action that would transition to an invalid state (outside the grid or into a wall) results in the agent remaining in its original state. In states with an arrow coming out, the agent has an additional action EXIT. If the EXIT action is taken, the agent receives the labeled reward and ends the game in the terminal state T. Unless otherwise stated, all other transitions receive no reward, and all transitions are deterministic.

For all parts of the problem, assume that value iteration begins with all states initialized to zero, i.e., V_0(s) = 0 for all s. Let the discount factor be γ = 0.25 for all following parts.

a) Suppose that we are performing value iteration on the grid world MDP below. What are the optimal values V*(A) and V*(B)? Explain your answer (show how you compute them).
b) After how many iterations k will we have V_k(s) = V*(s) for all states s? If it never occurs, write "never". Show your computation.
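To see how value iteration and the discount factor interact in the Pacman setup, here is a minimal Python sketch. It does not use the grids from the question: it assumes a hypothetical three-cell corridor (cells 0, 1, 2) with the dot in cell 2, only East/West moves, a reward of 1 for the move that eats the dot, no living reward, and a discount factor of 0.2. The function names are made up for illustration.

```python
# Hypothetical 1x3 corridor, NOT the grid from the question:
#   cell 0 -- cell 1 -- cell 2 (dot, terminal)
GAMMA = 0.2      # discount factor, as in the Pacman part of the question
TERMINAL = 2     # assumed location of the dot


def successors(s):
    """Return (next_state, reward) for each legal move from cell s.

    Moves blocked by an outer wall are simply not available.
    Entering the dot cell yields reward 1; everything else yields 0.
    """
    moves = []
    for step in (-1, +1):  # West, East
        nxt = s + step
        if 0 <= nxt <= TERMINAL:
            moves.append((nxt, 1.0 if nxt == TERMINAL else 0.0))
    return moves


def value_iteration(gamma=GAMMA, max_iters=50):
    """Run synchronous value iteration from V_0 = 0; return [V_0, V_1, ...]."""
    values = [0.0] * (TERMINAL + 1)
    history = [values[:]]
    for _ in range(max_iters):
        new_values = values[:]
        for s in range(TERMINAL):  # the terminal state keeps value 0
            # Bellman update for deterministic transitions
            new_values[s] = max(r + gamma * values[n] for n, r in successors(s))
        history.append(new_values)
        if new_values == values:   # nothing changed, so we have converged
            break
        values = new_values
    return history


def first_converged_iteration(history):
    """Smallest k such that V_k already equals the final values."""
    final = history[-1]
    return next(k for k, v in enumerate(history) if v == final)


if __name__ == "__main__":
    hist = value_iteration()
    for k, v in enumerate(hist):
        print(k, v)
    print("converged at k =", first_converged_iteration(hist))
```

In this toy corridor the cell next to the dot ends up with value 1 and the cell two moves away with value 0.2 (one discount step), and the values stop changing at k = 2, which equals the number of moves from the farthest cell to the dot. The same reasoning (count the moves to the reward, multiply by the discount once per move that comes before the rewarding move) applies to the actual grids, but the specific numbers depend on their layouts.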
