You have two slot machines, each of which has an array of 100 lights. The probability distribution
Question:
You have two slot machines, each of which has an array of 100 lights. The probability distribution of the reward from playing each machine is an unknown (and possibly machine-specific) function of the pattern of lights that are currently lit up. Playing a slot machine changes its light pattern in some well-defined but unknown way. Discuss why this problem is more difficult than the multi-armed bandit problem. Design a deep learning solution to optimally choose machines in each trial that will maximize the average reward per trial at steady-state.
Fantastic news! We've Found the answer you've been seeking!
Step by Step Answer:
Related Book For
Question Posted: