Question
Consider the following grid environment. Starting from any unshaded square, you can move up, down, left, or right. Actions are deterministic and always succeed (e.g., going left from state 1 goes to state 0) unless they would cause the agent to run into a wall. The thicker edges indicate walls, and attempting to move in the direction of a wall results in staying in the same square. Taking any action from the green target square (no. 5) earns a reward of +5 and ends the episode. Otherwise, each move is associated with some reward r ∈ {−1, 0, +1}. Assume the discount factor γ = 1 unless otherwise specified.

[Figure: a 5 × 5 grid with squares numbered 0–24 left to right, top to bottom; square 5 is the green target, and thicker edges mark walls.]
a. Define the reward r for all states (except state 5, whose reward is specified above) that would cause the optimal policy to return the shortest path to the green target square (no. 5).
b. Using r from part (a), find the optimal value function for each square (a value-iteration sketch for checking this appears after the question).
c. Does setting γ = 0.8 change the optimal policy?
d. Define the reward r(s) = 0 for all states (except state 5, whose reward is specified above). Assume γ = 0.8 as in part (c). How would the value function change? How would the policy change? Explain why.
e. All transitions are even better now: each transition earns an extra reward of +1 in addition to the reward you defined in part (a). Assume γ = 0.8 as in part (c). How would the value function change? How would the policy change? Explain why.
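Parts (b) through (e) can be checked numerically with value iteration. Below is a minimal sketch, assuming a 5 × 5 grid with states 0–24 in row-major order; the wall positions and shaded (blocked) squares come only from the figure and are not given in the text, so WALLS is an empty placeholder, and the names next_state and value_iteration are illustrative rather than part of the original problem.

```python
WALLS = set()          # placeholder: (state, action) pairs blocked by walls in the figure
N = 5                  # the grid is 5 x 5, states 0..24 in row-major order
TARGET = 5             # green target square
TARGET_REWARD = 5.0    # any action from the target earns +5 and ends the episode

ACTIONS = {"up": -N, "down": N, "left": -1, "right": 1}

def next_state(s, a):
    """Deterministic move; hitting a wall or the grid boundary keeps the agent in place."""
    if (s, a) in WALLS:
        return s
    row, col = divmod(s, N)
    if a == "up" and row == 0: return s
    if a == "down" and row == N - 1: return s
    if a == "left" and col == 0: return s
    if a == "right" and col == N - 1: return s
    return s + ACTIONS[a]

def value_iteration(step_reward, gamma, tol=1e-9):
    """Optimal state values under a constant per-step reward and discount gamma."""
    V = [0.0] * (N * N)
    while True:
        delta = 0.0
        for s in range(N * N):
            if s == TARGET:
                new_v = TARGET_REWARD  # the episode ends here, so no discounted continuation
            else:
                new_v = max(step_reward + gamma * V[next_state(s, a)] for a in ACTIONS)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V

# e.g., with r = -1 per step and gamma = 1, each square's value is 5 minus its
# shortest-path length to square 5; rerun with gamma = 0.8 and/or a different
# step_reward to see how parts (c)-(e) affect the values and the greedy policy.
print(value_iteration(step_reward=-1.0, gamma=1.0))
print(value_iteration(step_reward=0.0, gamma=0.8))
```

Rerunning the sketch with different step_reward and gamma values shows how the value function and the resulting greedy policy respond to the changes described in parts (c) through (e); any real check would also need the figure's walls and shaded squares filled in.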