|
|
Dynamic
Programming
Examples |
|
-
Baseball Model |
|
This example comes
from a book Dynamic Programming and Markov
Processes by Ronald A. Howard, (MIT Press, Cambridge,
Massachusetts, 1960). It is interesting because it describes
a decision process familiar to baseball fans. It embodies
difficult transition relations that reflect the rules of baseball.
The process describes a single inning of the game. The states
describe the number of outs and the occupation of the
bases. The process starts with no outs and no-one on base.
Decisions are instructions from a coach to the current batter
and base runners. The events are the result of the play. The
events are probabilistic. The inning passes through a series
of states and finally finishes with a state with three outs.
The model is constructed directly by the DP Models add-in and
solved with the DP Solver add-in. |
Creating the Model Form |
|
|
The model is constructed by choosing Add
Model from the DP Models add-in menu. The baseball problem
is a stochastic dynamic program. |
|
|
The next dialog accepts data describing
the form of the model. Once the model is constructed, the Name,
and the numbers of State
Variables, Action Variables and Event Variables
are fixed. The numbers of State Blocks, Decision
Blocks and Transition Blocks may be changed
during the modeling process.
On clicking OK, the entire worksheet is prepared. You should
download and open the baseball workbook to observe the
various parts discussed below.
The upper-left corner of the worksheet is shown below. The
parameters describing the form are in columns A and B. The
white fields in rows 11 through 14 can be changed. The yellow
cells should not be changed. The Change button in
column D allows changes in the numbers of blocks of the various
types.
The other buttons cause the states and other model elements
to be listed on a worksheet labeled BB, Lists. The Transfer
to DP Solver creates a solver model and translates the
data to the solver. |
|
|
States |
|
|
The state definition is the basis for all
models. The enumeration form on the left shows the state
definition for the baseball problem. There are four state
variables. The first indicates the number
of outs.The other three are the numbers of players on third,
second and first base respectively. The lower and upper
bounds in rows 18 and 19 determine the range of the state
variables. The last three variables are binary, indicating
that the base is either occupied or not. Cell E17 holds
a logical expression that restricts the states to be considered.
For the baseball problem, no logical conditions are specified
in this model, so cell E17 holds the default value TRUE.
The state space consists of all integer values within the
ranges. In this case there are 32 states in the state space.
Row 20 computes the name of the state variable and row
21 computes the component of the objective value that depends
on the state.
The objective of this problem is to maximize the number
of runs scored in the inning. The runs always occur because
of transitions, so the state does not affect the objective
directly. Row 21 has all zero entries. |
The add-in enumerates all states by advancing the index in
E14 through all integer values from 0 to 31. We will call this
the state count index. Formulas in row 14 use the
MOD function to compute the state values associated with the
index. The current state is in the range F14:I14. During the
modeling process it is often useful to manually change the
state. This is accomplished by entering a nonnegative integer
in cell E14 or clicking on the spinner control in column K.
Clicking the top arrow advances the state index by 1 and clicking
the bottom arrow reduces the state count index by 1. |
|
In addition to the state
vector enumeration structure a collection of state subsets can
be defined. The example uses two state subsets. The current state
is repeated in row 33 in columns
F through I for easy reference. Summary information is
shown at the right. Some columns of the worksheet are hidden
to simplify viewing. |
|
|
State subset 1, rows
36 through 41, identifies all the states with three outs. The
lower and upper bounds in F39 and F40 are
both equal to 3. With three outs, the
inning terminates. The Exit
Model cell
in X39 indicates a terminal state. For this subset, the model
has the value TRUE in X39 to indicate that the process terminates
when one of the states with three outs is reached. Cell X37 indicates
whether the current state is in this subset, and since the current
state has 0 outs, the value FALSE is returned. |
|
State subset 2, rows
42 through 47, identifies
all states with 2 or fewer outs. The current state (0, 0, 0,
0) satisfies this condition so the result is TRUE in cell X43.
The state summary at the top of the figure indicates that the
current state fulfills one of the state subset conditions (X33).
The number in X34 shows the subset index that contains the current
state. When a state falls into more than one subset, the nearest
subset to the top is reported. |
|
|
Clicking the List States button
at the top of the page, creates the list to the left.
States listed must satisfy the enumeration bounds and be
in at least one of the state subsets. All combinations
of outs and base runners are listed. The Final state
at the bottom of the list is added to accept terminating
transitions.
When determining the transitions for a model, a transition
is accepted only when it leads to a feasible state. The
list at the left shows all feasible states.
This list and all the other lists described on this page
on the worksheet named BB Lists. |
|
Actions |
|
|
The action for the baseball problem is the
advice given to the batter or base runner. A batter may instructed
to hit or bunt. The base runner can be
asked to steal second, third or home base.
The actions are enumerated with the form at the left. The
name in N20 is provided by a lookup formula referring to
the data table, described below.
An important restriction of this program is that the actions
and events for the model must be defined independent of
the state. Although all states do not allow all actions,
the action list must include all actions possible under
any state, |
|
|
|
Clicking the List Elements button
at the top of the page creates the Action List.
The table shows an index that varies from 1 to the number
of states, the count value that leads to each action, the
name, the action variables (only Bat for this problem), and
the number of runs contributed by the action. No runs are
caused directly by the action, so this column is 0. |
|
Events |
|
|
The baseball problem allows 14 events. The
name and probability entries on the enumeration form are
obtained by reference to the data table.
Again the events must be enumerable without reference
to the states and actions. The objective contribution and
the event probabilities may depend on the state and action.
In this case probability depends on the action chosen. |
|
|
|
The complete event list is shown to the
right. 1B, 2B, 3B and HR indicate possible hits that a
batter can make to advance the runners and possibly
score runs. BB is a base on balls. SO, FO and GO describe
kinds of outs: strike out, fly out and ground out. SAC
is a sacrifice that advances the runner, but the batter
is out. FC means fielder's choice. A fielder's choice causes
the lead runner to be out, but the batter takes first base.
ST-Suc means a successful steal, ST-Fail means that the
steal fails and the runner is out. ST-Stay means that the
runner neither advances nor is out and remains on the base.
Complex transitions occur from one state to another based
approximately on the rules of baseball. These are described
in the Transitions section of the worksheet. |
|
Data Table |
|
|
The table to the left is constructed to
hold names and probability data. This is the same data
used by Howard in his description of the baseball problem.
The model has equations that point to this data, so it
is easy to generate models for alternative data. Notice
that the probabilities depend both on the action and event. |
|
Decisions |
|
A decision is a combination
of a state and an action. The Decision Blocks of the
worksheet define the set of decisions for the dynamic programming
model. The
current decision shown at the top of the display below. The current
decision combines the state (0, 0, 0, 0) and the action Hit.
At the right of the display we see the name of the decision S_0_0_0_0/Hit in
X50. Cell X51 indicates that this decision is feasible (TRUE),
and X52 indicates that subset 5 of the decision subsets is feasible.
Notice block 5 starts in row 78 and represents all states with
0 to 2 outs and the decision Hit. Each block defines
a subset of the possible decisions. For example, block 1 starting
in row 54 holds the states that allow the action Steal 2.
These states all have 2 or fewer outs, second base is empty,
and first base is occupied. In like manner, I have represented
all feasible decision subsets.
When generating the model for dynamic programming, the add-in
will enumerate all combinations of the state and action lists.
If a decision is feasible (at least one of the blocks is satisfied)
it is added to the decision list. If a state has no feasible
decisions, the NA action is assigned to the decision and the
combination penalized with a large cost.
For
this example all state and action costs are zero, so we see
zero values in the Runs cell of column X. |
|
|
In general, several actions
may be feasible for a state and the collection of all feasible
decisions forms the decision list. Part of the decision list
for the example is shown below. There are 78 feasible decisions,
so only the first few and last few decisions are shown. For each
decision there is a state index and an action index. Notice that
there is only decision for state (0, 0, 0, 0) and that is the Hit action.
Three decisions are available in state (0, 0, 0, 1), no outs
and a man on first base, and the associated actions are Hit,
Bunt and Steal 2. The decision blocks the resultant
decision list depend on assumptions regarding appropriate actions
for certain states. For example, when no one is on base, the
Hit action is the only possibility, |
|
|
|
We see at the end of
the list, the states with three outs. The Null action with 0
cost is appended for states that go directly the final state. |
Transitions |
|
Transitions show the
movement from one state to the next, and is governed
by the state, action and event.
Some of the transition blocks are shown below. There are
26 blocks in my model for the baseball problem. Each block
describes a subset of the combinations of states, actions and
events. It is useful to divide the transitions into subsets
because it is impossible to write a single transition function
that conveys the complex logic of the baseball inning. Transitions
describe the function that leads the system from one state
to another. Each transition has an objective term and a probability.
The current state, action, and event combination is
given at the top of the display. The state is (0, 0, 0, 0),
the bases are empty and there are no outs. The action is to Hit.
The event is 1B, a one base hit. The name is in X92. X93 indicates
that some transition is feasible, and X94 shows the index of
the transition subset.
Block 1 describes a one base hit. The ranges on the state
variables in rows 99 and 100 show that this transition can
occur in any state with two or fewer outs. The logic and bounds
in columns M, N and O indicate that the action can be either
hit or bunt. Columns R, S and T show that only the 1B event
is modeled by this block.
The transition equation is given in the range F112: I112,
labeled Change State. The next state is in the range
labeled Next State. It is the sum of the current
state vector and the change state vector. The change state
vector for this case is (0, 0, 0, 1) and the new state is (0,
0, 0, 1). A base hit with no-one on base and no outs results
in a runner at first and no outs. The equations in the Change
State range are fairly complicated. They must represent
the change in state regardless of the pattern of runners on
the bases. For the !B event, all runners advance and the batter
goes to first base. When third base is occupied, a run scores.
The Run cell
in K102 has a formula that evaluates to 1 when third base is
occupied. This value is transferred to the summary column with
a formula. The probability, 0.15, comes directly from the event
probability data.
The formulas in the change state vector compute the
next state. For feasibility, the next state must appear in
the state list. When it does, the index number of the state
is in cell J104, The index is transferred to X100. |
|
|
to generate the complete list of
transitions, the program enumerates all combinations
of decisions (states and actions) and events. The result is
the Transition List shown in part below. Each transition
has an objective function contribution, shown as the Transition
Runs column
AH. The Transition
Probabilities are in column AJ. Only transitions with
nonzero probabilities are listed. The next state is described
by the index in AJ and by name in AK. There are 402 transitions
for the baseball problem. |
|
|
|
There are 26 transition blocks. Different
models are possible but we have attempted to divide the transitions
to make the transition functions as simple as possible. Still,
in some cases the logic is complex. |
|
Transition |
Explanation |
Logic |
1 |
One-base hit |
Advance the runners by one base and put the batter on first.
Score a run if the current state has a runner on third base. |
2 |
Two-base hit |
Advance the runners by one base and put the batter on second.
Score a run if the current state has a runner on third base.
Score another run if the current state has a runner on second
base. |
3 |
Three base hit |
Any runners on base score. Put the batter on third base. |
4 |
Home run |
All runners on base and the batter scores. |
5 |
Base on balls |
If first base is empty put the runner on first and leave
other runners in position. If first base is not empty advance
runners starting at first until an empty base is encountered.
If the bases are full, score a run. |
6 |
Strike or foul out |
The batter is out. All runners stay fixed. Outs increase
by 1. |
7 |
Fly out, one or less outs |
The batter is out. A runner on third scores, but all other
runners stay fixed. Outs increase by 1. |
8 |
Ground out with one or less outs |
The batter is out. Other runners advance. A runner on third
scores. Outs increase by 1. |
9 |
Double play ball with bases empty. |
Only one out is recorded because there is no second runner
on base. Outs increase by 1. |
10 |
Double Play-No outs/Nearest runner to home on first |
Runner on first
and batter are out, other runners do not advance. Outs increase
by 2. |
11 |
Double Play-No outs/Nearest runner to home on second |
Runner on first second and batter are out,
other runners do not advance. Outs increase by 2. |
12 |
Double Play-No outs/Nearest runner on Third |
Both runner
and batter are out. Outs increase by 2. |
13 |
Double Play with one out and at least on runner |
Two outs are recorded and the inning is over.
Note that the Exit Model cell is set to TRUE for this transition.
The logic regarding at least one on base is set by a logical
expression in the State Logic cell. |
14 |
Sacrifice |
All runners on base advance one base. The
batter is out. A runner on third base scores. Outs increase
by 1. |
15 |
Fielder's Choice with bases empty |
The batter is out. Outs increase by 1. |
16 |
Fielder's Choice with runner on first and other bases empty |
The runner on first is out. Batter takes first.
Outs increase by 1. |
17 |
Fielder's Choice with the furthest runner from the plate
on second |
The runner on second is out. A runner on first
advances if there is one. Batter takes first. Outs increase
by 1. |
18 |
Fielder's Choice with the furthest runner from the plate
on third |
The runner on third is out, the other runners
advance and batter takes first. Outs increase by 1. |
19 |
Attempted Steal of second base succeeds |
The runner on first advances to second base.
The others do not advance. |
20 |
Attempted Steal of second base fails |
The runner on first is out. Other runners
do not advance. Outs increase by 1. |
21 |
Attempted Steal of third base succeeds |
The runner on second advances to third base.
The others do not advance. |
22 |
Attempted Steal of second base fails |
The runner on second is out.
The others do not advance. Outs increase by 1. |
23 |
Attempted Steal of home succeeds |
The runner on third scores.
The others do not advance. |
24 |
Attempted Steal of home fails |
The runner on third is out. The others do
not advance. Outs increase by 1. |
25 |
Any steal attempt leaves bases unchanged |
The current state remains the same |
26 |
Any other decision and results with two outs |
The outs increase by 1 and the inning is complete. |
|
The Complete DP Model |
|
The State, Action, Event, Decision
and Transition lists comprise the complete DP model for this
problem. Clicking the Transfer to DP Solver button creates
the lists. builds a DP Solver form, and transfers the model lists
to the DP Solver form. The DP Solver model for this problem is
on the next page. |
Building and Debugging the Model |
|
The form for a model is easily constructed
with the Add Model from the DP Models add-in menu. The
structure dialog accepts the numbers of state, action and event
variables. Once the model is constructed these values are fixed.
The dialog also specifies the numbers of state, decision and
transition blocks. The numbers of blocks can be changed through
the Change button, however, there must be at least one
block created in the initial model in order to add additional
ones.
The hard part of the baseball model was creating the transition
equations. Rules for advancing runners in the event of a
hit and adjusting runners for a fielder's choice or double play
require fairly complex logical expressions in the Change
State region of the transition definitions. While debugging
the model it is useful to create two panes on the worksheet.
Then the spinners on the element definitions can be easily changed
to verify the equations in the various blocks. Careful review
of the decision and transition lists may reveal errors. |
|
|
|