|
|
Dynamic
Programming Solver |
|
Worksheet Details
|
|
During the discussion of the DP Solver features we have deliberately
hidden worksheet columns to simplify descriptions. These columns
contain a variety of formulas for implementing the solver functions.
The student using the add-ins need not be concerned about the
computational columns except to follow the rule to leave
these columns alone. The DP Solver worksheet has a button labeled Equations. The
purpose of this button is to rewrite the formulas after changes
have been made.
There is a difference between worksheets constructed through
the DP Models add-in and those constructed using the Build
Model command on the DP Solver menu. The worksheets created
by the DP Models add-in colors fewer columns yellow because
some of the computations are accomplished during the model
building. The Build Model command fills some of the
columns with formulas to simplify building small problems.
In some limited cases, described on this page, these yellow
columns can be changed (with care).
The worksheet depends on the problem being solved and on the
option selected for the display of data. The problem being
solved can be a discrete time Markov Chain, a MDP problem or
a deterministic DP model. The example on this page shows the
MDP problem. The data option on this page shows the transition
probabilities using a matrix format. The matrix format allows
the transition costs to be shown as a matrix or represented
by a column of expected transition costs. The latter format
is illustrated here. |
The DP Model |
|
|
We illustrate the DP worksheet with several
figures. The cells of
the model are colored yellow, green or white. Only the
white cells hold data that can be directly controlled by
the user. The yellow cells hold formulas and values that
should not be replaced. The green cells hold values computed
by the VBA code.
The upper left cells on the page show the
problem parameters and intermediate results. The content
of the green fields are filled by the solution algorithms.
The bottom three fields can be changed directly by the
user. The values are reflected on the worksheet. Notice
that this example uses a discount rate of 1% per month. |
|
|
Columns to the right of the parameters
show the state, action, and event definitions. The state definition
is shown below with the solver buttons. Columns D through N
have state information. Each row, starting in row 14 holds
the information for a single state. Column D holds the state
indices, consecutive integers from 1 through the number of
states. Column E holds the state names assigned by the user.
The State Cost column
holds the direct cost of residing in each state. The value
of 2 in F14 is the cost of the new bulb. It is incurred whenever
the system is in the New state.
|
|
|
As indicated by their yellow color,
columns G through M hold quantities computed with Excel formulas.
They should not be disturbed. Column G holds the current decision
index. The index points to a row in the Decision table
of the worksheet that is discussed below. Column H
indicates the bottom of the range of the Decision table
that contains the decisions for the state. Column
I holds the action indices of the current policy. Column J
holds the corresponding action names. Column K holds the value
of the decision taken from the Decision table. The Step
Values in
column L are the immediate costs including state, action,
and expected transition costs. The State
Value in column M is the step cost plus the expected
NPW of costs of the remaining steps in the time
horizon. Column N holds the probabilities computed for the
states. It is green to indicate that the computer fills this
column using an algorithm rather than a formula. The figure
shows the initial values before any optimization. The goal
of the solver algorithm is to find the policy that minimizes
the values in column M.
As the solution algorithm progresses, the computer automatically
adjusts the contents of the state table. No interaction is
required by the user except to select the iteration method
and initial conditions through the Solve dialog.
Columns P through W hold action and event information. The Inspect action
expends the cost of inspection, $0.50. The Replace action
results in the cost savings of $0.30. An additional Null action
is provided and may be assigned as an option for some states.
The cost of the Null action is zero. The NA is
assigned when an action for some state is unavailable. The
cost for the NA is very large. Except for the names,
the event information plays no role for this example. The probabilities
for events are provided in the transition probability table. |
|
Decision List |
|
The rows on the worksheet beginning
in column Y and progressing to the right show the decision
list. A decision is a combination of state and action. The
table shows the data and computed results describing
states and actions. Each row shows one state and
one action. For example, row 14 describes state 1 (New)
and action 1 (Inspect). Column Y holds the indices
by which the rows are addressed. Column Z and AA show the state
and action indices respectively.
Column AB shows the names
of the state/action combinations. Columns
AC through AG hold computed values necessary for the algorithm.
Column AC shows the action costs. Column
AD holds the expected cost of the transition. When an action/reward
matrix is provided, this column is computed. Otherwise, as
in this example, it is entered as data. Column AE holds the
expected next value cost. It is computed as the Next Value in
row 8 multiplied by the row of the transition probability matrix.
Column AF holds the decision value computed as the sum
of the decision reward, the expected transition reward and
the discounted future value. Column AG holds the probability
of the decision. It is the Last Probability for the
state associated with the decision when the action is optimum
for the decision. |
|
|
Observe that each state has two
actions, to inspect or replace. Columns AH through AL hold
transition probabilities. These are important data inputs for
any stochastic MDP. Even-numbered rows from 14 through 22 hold
the transition probabilities associated with the inspection option.
Odd numbered rows from 15 through 23 hold the transition probabilities
associated with the replace option.
This option always brings the system to the New state.
The last column (AM in this case) holds the sum of the transition
probabilities for each row. The sum of transition probabilities
must be 1 for each row and all probabilities must be nonnegative.
The green range in row 8 holds the next-state value computed
by the iterative solution process. The yellow range in row
9 holds formulas that compute the state probabilities.
The two cells at the bottom of column AM compute the minimum
and maximum values of the probability sums. When these numbers
are different than 1, the probabilities are in error and a
solution algorithm will not continue. The Minimum
Probability cell
holds the minimum value of the all the numbers entered in the
probability range. If this number is negative, a negative probability
has been entered. This is not allowed.
|
|
|
|