Investment Problem: DP Approach
To begin, we ask the manager to decide on the amount
to invest sequentially. That is, we first ask her to
consider opportunity 1, then opportunity 2, and so on.
This is difficult to do in a way that maximizes the
total return. The manager notes that the more she invests
in opportunity 1, the greater the annual return. Consequently,
she might feel that a greedy approach is called for,
one that invests at the highest possible level, 4 units
in this case. It should be apparent, though, that such
an approach may yield very poor results. Committing
a large portion of the budget to early opportunities
precludes potentially more attractive returns later
on. Instead, we ask the manager to solve the problem
backwards, starting with opportunity 8, then 7 and so
on until she makes a decision for opportunity 1. With
a little organization, we find that this procedure is
possible.
First we ask, how many units should we invest in opportunity
8? The manager responds that she cannot make that decision
unless she knows how much of the budget has already
been spent on the first 7 opportunities. Given that amount, the
decision is obvious. For example, if the previous opportunities
have spent the entire budget, the investment
in opportunity 8 must be zero. If one unit of budget remains, she
will invest 1 in opportunity 8 if its return exceeds
0.5, the per-unit value of leftover funds. In general, if
x units are already
spent, she can invest up to 10 − x in opportunity 8. Of course, when x is 5 or less, 4 units can be invested and there will
still be money left over.
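To make the budget bookkeeping concrete, here is a minimal sketch of the decision for opportunity 8 when x units are already spent. It assumes the per-unit return of 0.5 for leftover funds stated above and the opportunity 8 returns (0, 1.0, 1.7, 2.3, 2.8 for investments of 0 through 4) that appear in the decision-objective column of Table 2 below; the function name is illustrative, not from the original.

```python
# Feasible decisions for opportunity 8 when x budget units are already spent.
BUDGET = 10
LEFTOVER_RATE = 0.5                              # annual return per unit of unspent budget
r8 = {0: 0.0, 1: 1.0, 2: 1.7, 3: 2.3, 4: 2.8}    # return for investing d units in opportunity 8

def best_for_opportunity_8(x):
    """Return (best decision, best total return) given x units already spent."""
    best_d, best_value = 0, float("-inf")
    for d in r8:                                 # enumerate the discrete alternatives
        if d > BUDGET - x:                       # cannot spend more than what remains
            continue
        total = r8[d] + LEFTOVER_RATE * (BUDGET - x - d)
        if total > best_value:
            best_d, best_value = d, total
    return best_d, best_value

# Example: with 7 units already spent, invest 3 for a total return of 2.3.
print(best_for_opportunity_8(7))
```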
Figure 1. States for s1 = 9
To formalize the dynamic programming
approach, we define states and decisions. A state is described by the opportunity index, s1,
and the amount already spent, s2.
The state variables are contained in the vector s = (s1, s2).
We use s1 = 9 to mean that there are no more opportunities. For
this problem, we call any state that has s1
equal to 9 a final state
because there are no more decisions in our sequential
process. The set of final states is F. For the investment problem

F = {(9, s2) : 0 <= s2 <= 10}.

A final state has a value defined by
the final value function,
f(s).
For this problem, the final value is the annual return
of the funds not spent,

f(s) = 0.5(10 − s2) for s in F.
Graphically, we represent a state as
a node as in Fig. 1 where only the final states are
shown. The final state values are in brackets next to
the nodes.
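As a small illustration of the state definition, the sketch below builds the final states (9, s2) and their values f(s) = 0.5(10 − s2), matching the bracketed values in Fig. 1. The tuple representation of a state is our own convention for the example.

```python
BUDGET = 10
LEFTOVER_RATE = 0.5

# A state is a pair (opportunity index, amount already spent).
final_states = [(9, spent) for spent in range(BUDGET + 1)]

# Final value function: annual return of the funds not spent.
def final_value(state):
    _, spent = state
    return LEFTOVER_RATE * (BUDGET - spent)

f = {s: final_value(s) for s in final_states}
print(f[(9, 6)])   # 2.0, the value shown next to node (9, 6)
```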
Table 2. Optimal decision process for opportunity 8

State    Decision  Decision objective  Next state  Next state value  Total return     Optimal decision  Optimal return
s        d         r(s, d)             s'          f(s')             r(s, d) + f(s')  d*(s)             f(s)
(8, 10)  0         0                   (9, 10)     0                 0                0                 0
(8, 9)   0         0                   (9, 9)      0.5               0.5
         1         1.0                 (9, 10)     0                 1.0              1                 1.0
(8, 8)   0         0                   (9, 8)      1.0               1.0
         1         1.0                 (9, 9)      0.5               1.5
         2         1.7                 (9, 10)     0                 1.7              2                 1.7
(8, 7)   0         0                   (9, 7)      1.5               1.5
         1         1.0                 (9, 8)      1.0               2.0
         2         1.7                 (9, 9)      0.5               2.2
         3         2.3                 (9, 10)     0                 2.3              3                 2.3
(8, 6)   0         0                   (9, 6)      2.0               2.0
         1         1.0                 (9, 7)      1.5               2.5
         2         1.7                 (9, 8)      1.0               2.7
         3         2.3                 (9, 9)      0.5               2.8
         4         2.8                 (9, 10)     0                 2.8              3 or 4            2.8
The computations for a particular state are grouped together
in the table. We see that a separate
optimization is carried out for each state s. The column labeled d shows all feasible
values of the decision variable; a value not in this
list would use more than the budget available. The decision
objective is the annual return for selecting the amount d. This value comes from Table 1. The column labeled
s' is the next state reached by making decision d while in state s. The next state is given
by the transition function
T(s, d). For this problem the transition function is

s' = T(s, d) = (s1 + 1, s2 + d).

The value of the next state, f(s'), has already been computed and is shown in the next
column. The total return is the sum of the decision
return and the next state value and is the quantity
to be maximized.
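The transition function is simple enough to transcribe directly. The sketch below assumes the tuple convention (s1, s2) for states used in this section.

```python
def transition(state, d):
    """T(s, d): move to the next opportunity with d more units of budget spent."""
    opportunity, spent = state
    return (opportunity + 1, spent + d)

print(transition((8, 9), 1))   # (9, 10), as in the (8, 9) rows of Table 2
```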
For each value of s,
we compare the total returns for the different values
of d and choose the one that gives the
maximum. This is a simple one-dimensional problem solved
by enumerating the discrete alternatives. The optimal
decision is d*(s), where the argument s indicates that the decision is a function of s. Finally, the optimal return is the value f(s).
The computations are performed by solving
the following backward recursive equation:

f(s) = max{ r(s, d) + f(T(s, d)) : d feasible in state s }, for s not in F.

It is a recursive equation because the
function f(s) appears on both sides of the equality
sign. We can solve it only if we proceed in a backward
direction through the states.
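A minimal sketch of the backward recursion follows. It assumes a dictionary returns[k][d] holding the annual return of investing d units in opportunity k; only the opportunity 8 entries are reproduced here (from Table 2), so the demonstration solves just that stage, starting from the final state values defined earlier. A full solution would fill in the remaining opportunities from Table 1 and loop from opportunity 8 down to 1.

```python
BUDGET = 10
LEFTOVER_RATE = 0.5

# Return table: only opportunity 8 is reproduced here (from Table 2).
returns = {8: {0: 0.0, 1: 1.0, 2: 1.7, 3: 2.3, 4: 2.8}}

def transition(state, d):
    """T(s, d): next opportunity, with d more units of budget spent."""
    k, spent = state
    return (k + 1, spent + d)

# Final state values: f(9, spent) = 0.5 * (10 - spent).
f = {(9, spent): LEFTOVER_RATE * (BUDGET - spent) for spent in range(BUDGET + 1)}
policy = {}

# Backward recursion: f(s) = max over feasible d of r(s, d) + f(T(s, d)).
for k in sorted(returns, reverse=True):          # only opportunity 8 here
    for spent in range(BUDGET + 1):
        state = (k, spent)
        candidates = [(r + f[transition(state, d)], d)
                      for d, r in returns[k].items()
                      if d <= BUDGET - spent]
        f[state], policy[state] = max(candidates)

print(policy[(8, 6)], f[(8, 6)])   # prints 4 2.8
```

Note that the tuple comparison in max() happens to break the tie at state (8, 6) toward d = 4; either decision is optimal, and the figure below breaks the same tie toward 3.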
Figure 2. States for s1 = 8 with optimal decisions
The optimal decisions for opportunity
8 are shown in Fig. 2 as the red lines between the
states. When a tie occurs, it can be broken arbitrarily.
State (8, 6), for example, admits two optimal decisions,
3 and 4; we have chosen 3 for the illustration. The
values for the states are shown adjacent to the nodes.