Clicking the Transfer to DP Solver button on the model worksheet constructs the DP Solver worksheet, which holds the information from the lists derived by the enumeration process. The upper left corner of the solver worksheet is shown below. The worksheet stores the final values in column H. This example illustrates the use of the final value column with finite iterations.

We have chosen to represent the transition probabilities in a matrix. The beginning of the transition probability matrix is shown in the three columns on the right of the figure below. There are 20 columns in this matrix, one for each state, and 110 rows, one for each decision.

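To make the matrix dimensions concrete, the sketch below rebuilds the decision list and the transition matrix in Python. It is an illustration only, not the add-in's code; we assume the wealth runs from 1 to 20 and that a feasible bet may neither risk ruin (wealth falling below 1) nor push the wealth above 20.

    import numpy as np

    W_MAX = 20                  # wealth is bounded between 1 and 20
    P_WIN, P_LOSE = 0.6, 0.4    # probabilities of winning and losing a play

    # One decision per feasible (wealth, bet) pair.
    decisions = [(w, b)
                 for w in range(1, W_MAX + 1)
                 for b in range(0, min(w - 1, W_MAX - w) + 1)]

    # One row per decision, one column per state.
    T = np.zeros((len(decisions), W_MAX))
    for row, (w, b) in enumerate(decisions):
        T[row, w + b - 1] += P_WIN     # win: wealth rises by the bet
        T[row, w - b - 1] += P_LOSE    # lose: wealth falls by the bet

    print(T.shape)    # (110, 20): 110 decisions, 20 states

Under these assumptions the enumeration yields exactly the 110 decision rows and 20 state columns of the worksheet matrix.
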
Solution

This is a finite horizon problem since the gambler plays a fixed number of times. We choose to evaluate 15 plays. With a finite horizon, only the Value method of DP analysis is appropriate; the Policy method is only useful for infinite horizon problems. We choose to maximize the total value.

A finite horizon problem usually occasions the need to show results from the individual steps of the solution; otherwise only the final values remain on the DP worksheet at the end of the iteration steps. When the Show All Iterations option is chosen on the iteration form, the dialog at the left selects the information to show on the States worksheet. Here we choose to show the action name and the state values. The display is automatically cleared when the Initialize buttons are clicked on the iteration dialog.

After 15 iterations the process stops and the results of the final iteration are displayed on the solver worksheet. Recall that DP problems are solved backward in time, so the results show the optimal solution 15 plays before the end of the game. The optimal action is to bet $1 for all wealth values between 2 and 19, except when the wealth is 11, where we bet $2. At wealths of 1 and 20, no more bets occur.

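The backward recursion behind these results is easy to sketch outside the add-in. The Python fragment below is a minimal version of the Value method under the same assumptions as before (wealth bounded between 1 and 20, integer bets that avoid ruin and the upper bound, final value ln(wealth)); it illustrates the computation rather than reproducing the add-in.

    import math

    P_WIN, W_MAX, N_PLAYS = 0.6, 20, 15

    # Final values: the worth of ending the game with wealth w.
    value = {w: math.log(w) for w in range(1, W_MAX + 1)}

    # Step backward in time, one iteration per remaining play.
    for step in range(1, N_PLAYS + 1):
        new_value, policy = {}, {}
        for w in range(1, W_MAX + 1):
            best_b, best_v = 0, -math.inf
            for b in range(0, min(w - 1, W_MAX - w) + 1):
                v = P_WIN * value[w + b] + (1 - P_WIN) * value[w - b]
                if v > best_v:
                    best_b, best_v = b, v
            new_value[w], policy[w] = best_v, best_b
        value = new_value

    print(policy)    # optimal bet for each wealth, 15 plays from the end
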
Ross solves the analytic version of this problem for a continuous state and action space. The single state variable, the gambler's wealth, is nonnegative and unbounded from above. The optimum solution is to place no bets if the probability of winning is 0.5 or less. When the probability of winning is more than 0.5, the optimum bet is

    Bet = Wealth*(p - q)

where p is the probability of winning and q is the probability of losing. This solution is valid for every stage. For this example, the value of p is 0.6 and the value of q is 0.4, so the optimal bet is 20% of the wealth.

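The rule is a one-line function in code. The sketch below is our own illustration; the function name and the default probability are not part of the add-in.

    def analytic_bet(wealth, p=0.6):
        # Ross's rule: no bet for an unfavorable game; otherwise
        # bet the fraction (p - q) of the current wealth.
        q = 1.0 - p
        return wealth * (p - q) if p > 0.5 else 0.0

    print(analytic_bet(10))    # approximately 2.0, i.e. 20% of the wealth
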
The optimal bets for the range of wealth considered in this example are listed below. It is important to note that the analytic solution does not limit the total wealth to a finite value, so the size of the bet increases in proportion to the wealth. In contrast, the finite policy obtained with the add-in restricts wealth to a maximum of 20 and bets in integer values. We review the decisions as the steps go backward from the final reward. Step 1 is one step before the gambler receives his payment. The restricted bets in step 1 follow roughly the optimum for the first 15 states, but rather than increasing for the higher states, the bets decrease. The restricted problem does not allow bets that might result in a wealth higher than 20. As the number of steps to the end increases, the effect of the boundary becomes more and more obvious, even at the lower wealth values.

The effect of the boundaries becomes more pronounced as we step backward in time. The final strategy is to bet only $1 in states 2 through 19, except in state 11 where we bet $2.

This effect is common when approximating a continuous, unbounded state and action space by the discretized, bounded model necessary for computation. The analytic solution is powerful but limited. It provides a simple, optimal policy good for all time, but it only works for the Log gambler: if the value of the final wealth is anything other than ln(wealth), the solution is not valid.

The Effect of a Smaller State and Action Interval

The DP Model add-in has a feature that allows non-integral step sizes. A second version of the gambler model is constructed with the Include Step Size box checked.

The model is constructed with a new row, row 21, that specifies a step size. For the state and action elements we use a step size of 0.25 and modify the bounds to reflect the new step size. The step size for the event remains 1 because it is an indicator variable for win and lose.

The state and action lists are four times longer than the corresponding lists with a step size of 1. The first few values are shown below.

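As a rough check on the list lengths, the snippet below generates the state grid with a step of 0.25, assuming the wealth bounds stay at 1 and 20.

    STEP = 0.25
    states = [1 + i * STEP for i in range(int(round((20 - 1) / STEP)) + 1)]
    print(len(states))    # 77 values versus 20 with a step size of 1
    print(states[:5])     # [1.0, 1.25, 1.5, 1.75, 2.0]

Seventy-seven states in place of twenty is roughly the factor of four noted above.
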
The DP solution was obtained as before. We show the first step back from the final payout and the 15th step. In the optimal continuous solution the bet is 20% of the wealth. This holds fairly well for the first step, but not so well for the 15th step. The effect of the boundary on the finite, discrete model becomes more and more apparent as the time horizon increases.

Summary

The Log gambler problem is interesting because it is a finite horizon problem where the only evaluation is the wealth at the final state. It is also interesting because there is a simple analytical solution. For computation, it is necessary to discretize the state and action spaces and impose finite ranges. As seen in this example, the limiting boundaries do affect the approximate solution. To reduce the effects, use smaller step sizes and larger upper bounds for the computational model. Of course that increases the size of the model in terms of states and decisions.