Question

1 Approved Answer

Posted on Sep 13, 2024

5. Programming Project: In this programming assignment you will write a Python script to model the ski-shop inventory problem, and find the optimal decisions using

image text in transcribed

5. Programming Project: In this programming assignment you will write a Python script to model the ski-shop inventory problem, and find the optimal decisions using various methods. Description: A sports shop can buy and stock skis from the beginning of October to the beginning of March. The shop has capacity to hold up to 100 skis in its storage. Each ski bought costs $500 wholesale value, and sells for $700 retail. At the beginning of each month the owner must decide how many skis to buy (making sure the amount he buys and the amount already left in the storage, together not to exceed the capacity of 100.) At the end of March, the shop owner runs a fire sale and sells all unsold skis for $300. Here are more detailed information: - The demand for skis each month follows the Poisson distribution with the probability mass function f(d)=d!dexp(), where d is the demand in the month (an integer from 0 to ), and is the average number of skis sold in each month. The values of these averages are as follows: - The set of states is indexed by month-inventory pairs, so, for instance, (Dec, 53) is a state indicating that at the end of the month of December we have 53 skis in the storage. All states of the month of March are terminal states. We also add a single state of ( Sep, 0) as our start state, and which indicates the end of September/beginning of October which no skis are yet available. - The set of actions for each month consists of the amount of skis purchased at the beginning of the month. Note, however, that these skis purchased are not part of the skis inherited from the previous month. For instance, if at the end of November we have 53 skis, and we decide on the first day of December to buy another twenty, the state entered is (Dec, 53); the new twenty skis is accounted for January. Since there cannot be more than 100 skis in the storage, this actions includes {0,1,,100C} where C is the number of skis in the storage at the beginning of the month. (For example if at the beginning of November we have 60 skis in the storage, the actions available to us at that time is to buy, zero, or one, or ..., up to 40 new skis.) - The demand for skis is a random variable following the Poisson distribution as given above. If the demand exceeds what is available in the storage, the remaining demand in that month is left unanswered. - The reward in each month is the difference between the money paid to buy the skis at the beginning of the month, and the discounted money for the skis sold buy the end of the month. To make matters easy, we assume that if customers buy skis at some time during a month, their payment is deposited only at the end of the month. Also for sales of the next month a fixed discount factor of must be applied. Assume =0.90. Thus, if we decide to buy A skis at the beginning of a month, and sell D skis during the month, our reward will be 700D500A dollars. (Of course provided D is not larger than what we have in the inventory. Otherwise, the smaller of demand D and available inventory is what is sold. For example, if we have 20 skis at the beginning of January and order another 20, and the demand was D=45, our reward for January would be only 7004050020). Note that at the end of March, whatever is left in the inventory will be sold at the price of $300. These skis are also assumed to be sold on the last day of March and, thus, are not discounted. - Transitions, obviously, can be made only from states in one month to the states in the next, that is, from one month-inventory pair to the next month-inventory pair based on how many skis we purchased (at the beginning of the month) and how many we sold during that month. For instance, if at the end of December we have 53 skis, and we order 20 new skis, and sell 25 during January, then we transition form state (Dec, 53) to (Jan, 48). Your task is to write a script to encode these states and actions and compute the optimal action(s) for each state based on iterative dynamic programming techniques discussed in class. (It is better, though not mandatory to define a class for the ski inventory problem, and encode actions and probability transitions the way examples posted are done.) 5a) Suppose the (possibly suboptimal) policy followed by the store owner is to order skis at the beginning of each month M randomly and uniformly chosen in the interval [5,21M] at the beginning of the month M. Compute the value of this particular policy for the beginning of season on first of October (that is for the start state (Sep, 0)). 5b) Implement and find the optimal policy for each state and plot it as function of the number of skis available at the beginning of that month. For each month M, the x-axis should be the number of items inherited from the previous month, and the y-axis should be the optimal policy for the state (M,x), the number of skis to order. (So, you need six separate graphs, one each for October through March.) Note: It may be easier to compute the state-action pairs and their optimal values or their values under a given policy. Also, notice that the structure of this MDP is acyclic. For instance, there is no state to transition to at the end of March. Suppose you add the set of states (Mar+, c) indicating c skis left at the end of the March. Then these states have a deterministic value and no actions are available. For each state (M,c) we can compute its state-action value by examining the state values of (M+1,c), since from any state with month M we can only transition to a state with month M+1. So we can compute values of states v(M,c), or value of state-action q((M,c),a) from the values of (M+1,c). Therefore, you can work backward from the ( Mar+, c) states and find the values of states in each month from the month following it, until you reach (Sep,0). c) Simulate 1000 runs following the optimal policy and report the average and standard deviation of the actual rewards collected. Compare this average to the optimal value of the start state your algorithm predicted