In a Markov decision problem, another criterion often used, different than the expected average return per unit

Question:

In a Markov decision problem, another criterion often used, different than the expected average return per unit time, is that of the expected discounted return. In this criterion we choose a number

a, 0 < a < 1, and try to choose a policy so as to maximize E[o a'R(X,, a)]. (That is, rewards at time n are discounted at rate a".) Suppose that the initial state is chosen according to the probabilities

b. That is, P(X = i) =

b, i = 1,..., n

Fantastic news! We've Found the answer you've been seeking!

Step by Step Answer:

Answer rating: 100% (1 review)

Answered By

Somshukla Chakraborty

I have a teaching experience of more than 4 years by now in diverse subjects like History,Geography,Political Science,Sociology,Business Enterprise,Economics,Environmental Management etc.I teach students from classes 9-12 and undergraduate students.I boards I handle are IB,IGCSE, state boards,ICSE, CBSE.I am passionate about teaching.Full satisfaction of the students is my main goal. I have completed my graduation and master's in history from Jadavpur University Kolkata,India in 2012 and I have completed my B.Ed from the same University in 2013. I have taught in a reputed school of Kolkata (subjects-History,Geography,Civics,Political Science) from 2014-2016.I worked as a guest lecturer of history in a college of Kolkata for 2 years teaching students of 1st ,2nd and 3rd year. I taught Ancient and Modern Indian history there.I have taught in another school in Mohali,Punjab teaching students from classes 9-12.Presently I am working as an online tutor with concept tutors,Bangalore,India(Carve Niche Pvt.Ltd.) for the last 1year and also have been appointed as an online history tutor by Course Hero(California,U.S) and Vidyalai.com(Chennai,India).

4.00+ 2+ Reviews 10+ Question Solved

Related Book For book-img-for-question