Return to Index
Operations Research Models and Methods
 
Computation Section
Subunit Dynamic Programming Data
 - Walk: Deterministic Dynamic Programming

For the deterministic dynamic programming model we drop the random from the title, because the moves are not random in this case. The walker must follow a path to a designated terminal node. The goal is to find the path that is least expensive. The problem might be more accurately called the grid-walk problem because the walker is restricted to the grid lines.

This model has no probabilities, but only costs. A single terminating state is provided by the data with a cost of -100. The goal of the optimization is to minimize the total solution cost. This is the shortest path tree problem, terminating at a single state. For the example, the final state is (3, 3). We have zeroed the balk costs, so only the move costs are relevant. Since the cost is negative at the terminal node (actually a reward), the solution will seek the set of shortest paths from every node to (3, 3). To illustrate a matrix presentation of the move costs, we have used the Build Data Table option. The numbers shown in the table were randomly generated. The model uses the tabular data rather than the move costs in column G.

 

States and Action

 

To create the DP Model, click on the Build Model button. Rather than the event element of the Markov Chain, the deterministic DP model has an action element. The decision moves the walker from one state to another. The state shown below is (2, 1) and the current action is to move south. The action element has a cost, but not a probability. The cost in cell M21 is a reference to the data table that uses the state index in E14 and the move index in K14 to select the appropriate cost.

 
 

The model has a new component, the State Subset. We use the first subset to define the terminating state. The first subset is true only for the state (3, 3). This is designated as a terminal state, and the cost of encountering the state is -100. We use this negative cost to specify an award for reaching the terminal state. It is also defined as an Exit state. When this block is encountered during the enumeration, no transition is required because the transition is automatically set to the final state. It is necessary to include the second subset so that the other states will be included in the analysis.

 

 

Enumeration

 

The state list is similar to the Markov Chain model, but a final state (state 16) is included. Whenever one or more states are indicated as exit states, the final state will be added to the list.

The action list has four alternatives. The cost column is irrelevant to the model, since action costs are determined by a matrix. We will see the link costs as decision costs for this model.

 

Decisions

 

For the DDP problem, transitions are shown in the Decision List. A decision is described by a state and an action. The transition blocks are similar to other models of this class with the blocks of two types. The first four blocks describe balk or wrap transitions. A balk occurs when a move is taken in a direction restricted by a wall. With a balk the state does not change. This is illustrated by the first decision in the in the Decision List where the move north is restricted. The second four transition blocks describe moves that change the state. The second on the list is an example. The first 28 decisions are shown below.

The last few decisions on the list illustrate an exit state. Row 61 shows the transition to the final state with the associated transition cost. The final state is added as state 17.

When the wrap option is in chosen a movement toward the boundary is not limited. An action moving toward the boundary results in the walker moving to a state adjacent to the opposite boundary. If the wrap option had been selected for this example, the action of moving north from state (0,0) results in a transition to state (3,0).

 

Call Solver

 

The DDP model uses the DP Solver to find solutions. The solution is below with the optimum recovered for state 1.

Column L gives the optimum solution for each state as pictured below. The collection of actions is the optimum policy. The solution is the shortest path tree rooted at (3,3).

 

Summary

 

This page has demonstrated the DDP model for the grid walk problem. The DP Data add-in constructs a table holding the data. By clicking the Build Model button on the data worksheet, the DP Models add-in constructs the model worksheet and fills it with the constants and formulas that implement the model. By clicking the Transfer to DP Solver button, the DP Solver add-in creates the solution worksheet and transfers the information from the list worksheet. All three add-ins must be installed for all the steps to work.

The solution algorithm implemented by the Value Iteration method does not require that the solution determine a tree. Every state has one of the four movements assigned, but there is nothing that restricts some subset of the states to be connected by a cycle of links. This will happen if the sum of the costs around some cycle is negative. In this case the solution does not converge. The values of the states on the cycle are unbounded from below. If there are no cycles with negative total cost, the state values will converge and the optimum policy will describe a tree.

 
  
Return to Top

tree roots

Operations Research Models and Methods
Internet
by Paul A. Jensen
Copyright 2004 - All rights reserved