Operations Research Models and Methods

Computation Section
Subunit: Dynamic Programming Data
Resources: Deterministic Dynamic Programming

The DDP model of the resource problem has only states and actions; no events are required because there is no uncertainty in this model. We use the two-resource model as an illustration. The data for this model is shown below.

[Figure: Two-constraint data]

 

States and Actions

 

This model uses the Stage feature. We need two state variables to represent the two resources, and the stage indicates the item number. We define stages because each item has different parameters. When the stage feature is used, the first state variable is devoted to the stage number. In the example, the stage variable starts at 1 and goes to 5; the first four values represent the four items of the problem and the fifth is used for terminal states. The second state variable indicates the amount of the first resource used, and the third state variable indicates the amount of the second resource used. The initial state, S(1,0,0), indicates that the process begins at item 1 with none of either resource used; the action taken there chooses the number of units of item 1 to include. Maximum values of the states are passed from the data worksheet; when the data is changed, the upper bounds on the state variables change with it.
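As an aside for readers who think in code rather than worksheet cells, the state space can be sketched in a few lines of Python. The capacities of 30 and 25 units are assumptions inferred from the state-variable ranges reported with the state list below; the actual bounds come from the data worksheet.

    from itertools import product

    STAGES = range(1, 6)   # stages 1..4 are the items; stage 5 marks terminal states
    R1 = range(0, 31)      # units of resource 1 used (assumed capacity 30)
    R2 = range(0, 26)      # units of resource 2 used (assumed capacity 25)

    # Each state is a tuple (stage, r1, r2), mirroring the worksheet's S(s, r1, r2).
    states = list(product(STAGES, R1, R2))
    initial_state = (1, 0, 0)   # start at item 1 with no resources used

    print(len(states))     # 4030 enumerated states; the final state makes 4031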

 
[Figure: State and action definitions]
 

The action is to choose the number of units of the item specified by the current state. The number ranges from 0 to the maximum number of items given in the third row of the data form (Max. Number of Each Item). The maximum values for individual items are limited in the Decision blocks discussed later on this page.

The action is evaluated in cell N21. The number in this cell is drawn from the revenue data table using the Excel INDEX function.
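In spreadsheet terms the lookup is essentially INDEX(revenue_table, item, action + 1). A Python analogue, using a hypothetical revenue table in place of the worksheet data, might look like this:

    # Hypothetical revenue table: revenue_table[item - 1][units] is the revenue
    # from selecting `units` copies of the item.  Real values live on the data
    # worksheet; these numbers are placeholders.
    revenue_table = [
        [0, 5, 9],    # item 1: revenue for choosing 0, 1, or 2 units
        [0, 6, 11],   # item 2
        [0, 4, 7],    # item 3
        [0, 3, 5],    # item 4
    ]

    def action_revenue(state, action):
        # Mimic the INDEX lookup: the row is the state's item number,
        # the column is the chosen quantity.
        stage, _, _ = state
        return revenue_table[stage - 1][action]

    print(action_revenue((2, 0, 0), 2))   # revenue for taking 2 units of item 2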

 

State Subsets

 

State subsets are placed immediately below the state and action definitions. The first subset describes the exit states. Note that the lower and upper bounds in cells F39 and F40 are both set to the maximum of the first state variable. States with item number 5 are terminal states, and transitions to the final state are created through the TRUE value in R39. The revenue in R38 is computed as the value of the resources left over; for convenience, the leftover resources are computed in row 41. Of course, the initial state S(1,0,0) does not satisfy the condition of this state subset, so the entry in R37 evaluates to FALSE.

The second state subset includes all states, terminal states among them, but the first subset takes precedence over the second when both apply.

[Figure: State subset definitions]
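A sketch of the two subset tests in the same Python vein. The salvage values for the leftover resources are hypothetical; the worksheet computes the actual leftover value in row 41.

    CAP = (30, 25)          # assumed resource capacities
    SALVAGE = (1.0, 0.5)    # hypothetical per-unit value of leftover resources

    def is_exit(state):
        # First subset: states with item number 5 are terminal.
        stage, _, _ = state
        return stage == 5

    def exit_revenue(state):
        # Value of the resources left over when the process terminates.
        _, r1, r2 = state
        return SALVAGE[0] * (CAP[0] - r1) + SALVAGE[1] * (CAP[1] - r2)

    # The second subset accepts every state; for terminal states the first
    # subset takes precedence, so S(1,0,0) falls through to the second.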
The enumeration process finds all combinations of the three state variables. The number of combinations equals the product of the ranges of the three variables, 5*31*26 = 4030. One additional state, the final state, brings the total to 4031.
 

[Figure: State list (top)]

[Figure: State list (bottom)]

The action list has three alternatives. The cost column is irrelevant to this model, since action costs depend on the combination of state and action.
 

[Figure: Action list]

 

Decisions

Decisions describe combinations of states and actions. The bounds on the decision block for the first state variable eliminate terminal states (state variable 1 equal to 5); these are already handled as exit states in the state subset blocks, so there is no need to repeat them here. The decision maximum in M58 is passed from the item limit on the data worksheet; individual items may have smaller limits than the overall maximum. The revenue cell holds the value computed from the revenue table on the data worksheet, so it reflects both the item state variable and the action.
[Figure: Decision blocks]
There are 6895 feasible decisions for this example.
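A decision pairs a non-terminal state with an action, and it is feasible only when the chosen quantity respects both the item's own limit and the two resource capacities. The sketch below uses hypothetical per-unit requirements and limits; the pair (4, 4) for item 2 is chosen to match the trace shown later on this page, where taking two units of item 2 consumes 8 units of each resource.

    CAP = (30, 25)                  # assumed resource capacities
    MAX_UNITS = [2, 2, 2, 2]        # hypothetical per-item limits (cell M58 and the data sheet)
    REQ = [(3, 2), (4, 4), (5, 3), (2, 6)]   # hypothetical per-unit resource use

    def is_feasible(state, action):
        stage, r1, r2 = state
        if stage == 5:                        # terminal states take no decisions
            return False
        if action > MAX_UNITS[stage - 1]:     # item-specific limit
            return False
        need1, need2 = REQ[stage - 1]
        return (r1 + action * need1 <= CAP[0] and
                r2 + action * need2 <= CAP[1])

    def transition(state, action):
        # Next state: advance the stage and add the resources consumed.
        stage, r1, r2 = state
        need1, need2 = REQ[stage - 1]
        return (stage + 1, r1 + action * need1, r2 + action * need2)

    print(is_feasible((2, 0, 0), 2))   # True
    print(transition((2, 0, 0), 2))    # (3, 8, 8), as in the recovered optimum below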
 

[Figure: Decision list (top)]

[Figure: Decision list (bottom)]

 

Call Solver

 

Clicking the Transfer to DP Solver button at the top of the model worksheet calls the DP Solver add-in, which constructs the solver model and transfers the list information to the form. Click the Solve button to perform the DP algorithm; only value iteration is available for the DDP model. The upper left corner of the worksheet is shown below after the optimal solution has been found. Seven value iterations are required.

[Figure: Solver worksheet after solution]
Click the Recover Optimum button and choose the initial state; the default value is state 1. The optimal states are shown in the next figure, in a list below the state definition on the worksheet. The system starts in state S(1,0,0). The optimum action is to choose none of item 1, and the state moves to S(2,0,0). The optimum action in that state is to choose two of item 2, which moves the system to state S(3,8,8). The 0 action at S(3,8,8) leads to S(4,8,8), and the 0 action at S(4,8,8) leads to S(5,8,8). This is an exit state, so the system moves to the final state and the optimum sequence is complete.
[Figure: Optimal solution on the solver worksheet]
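Putting the earlier pieces together, here is a self-contained sketch of what the solver does: a backward recursion over the stages (the value-iteration step for this acyclic model) followed by the forward pass that Recover Optimum performs. All numeric data are hypothetical placeholders for the worksheet entries, so the printed sequence will not match the figure above.

    from functools import lru_cache

    CAP = (30, 25)                                       # resource capacities
    MAX_UNITS = [2, 2, 2, 2]                             # per-item limits
    REQ = [(3, 2), (4, 4), (5, 3), (2, 6)]               # per-unit resource use
    REV = [[0, 5, 9], [0, 6, 11], [0, 4, 7], [0, 3, 5]]  # revenue by item and units
    SALVAGE = (1.0, 0.5)                                 # value of leftover resources

    def feasible_actions(state):
        stage, r1, r2 = state
        need1, need2 = REQ[stage - 1]
        for a in range(MAX_UNITS[stage - 1] + 1):
            if r1 + a * need1 <= CAP[0] and r2 + a * need2 <= CAP[1]:
                yield a

    def step(state, a):
        stage, r1, r2 = state
        need1, need2 = REQ[stage - 1]
        return (stage + 1, r1 + a * need1, r2 + a * need2)

    @lru_cache(maxsize=None)
    def value(state):
        # Maximum revenue obtainable from `state` onward.
        stage, r1, r2 = state
        if stage == 5:                                   # exit state: collect salvage value
            return SALVAGE[0] * (CAP[0] - r1) + SALVAGE[1] * (CAP[1] - r2)
        return max(REV[stage - 1][a] + value(step(state, a))
                   for a in feasible_actions(state))

    # Forward pass: recover the optimal action sequence from S(1,0,0).
    state = (1, 0, 0)
    while state[0] < 5:
        best = max(feasible_actions(state),
                   key=lambda a: REV[state[0] - 1][a] + value(step(state, a)))
        print(f"state {state}: take {best} of item {state[0]}")
        state = step(state, best)
    print(f"exit state {state}, objective value {value((1, 0, 0)):.1f}")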
   

 

Summary

 

The problem of selecting alternative activities that use limited resources is a classic problem of operations research. The dynamic programming solution approach suffers from the curse of dimensionality: the number of states and decisions grows rapidly with the size of the problem. DP handles problems with separable nonlinear objective and resource-requirement functions with little variation from the models described here. Problems with uncertainty in resource usage can also be addressed by adding events to the model; the resulting model is an MDP model.

Deterministic resource allocation problems are more readily handled by mathematical programming, where problems with many activities and resources can be easily addressed. Separable nonlinear terms are more difficult to formulate in a math programming model but well within the capacities of modern computational systems. It is more difficult to add uncertainty to a math programming model, however, so DP may well have an edge for that kind of problem.

 
  