Computation - Operations Research Models and Methods

Computation

Dynamic Programming

DP Examples

Baseball Model

Baseball Solution

Excel Page
Download

	Dynamic Programming Examples
	- Baseball Solution

The solution of a DP model is found with the DP Solver add-in (dp_solver.xla). This program can be run independently, but the easiest way to use it is to construct a model with the DP Models add-in and call the DP Solver from the model page. Click the Transfer to DP Solver at the top of the model page.

Three kinds of DP Solver models are available. The first, Transition List, presents the data much like the lists constructed by the DP Models add-in. All transition information is described by lists. The number of transitions determines the number of rows in the Solver model. Excel allows very many worksheet rows.

The second option, Probability/Cost Matrix, shows the transition probabilities and costs as matrices. The dimensions of the two matrices are (number of decisions)x(number of states). For small problems this is the most visual representation. For larger problems, the problem size is limited by the number Excel columns. With older versions of Excel the number of columns restricts the use of this representation to about 100 states. Excel 2007 allows many more worksheet columns than Excel 2003, thus increasing the problem size. Viewing the transition matrix is not very useful for large problems because the nonzero transitions are usually a very small proportion of the total number of matrix entries.

The third format, Probability/No Cost Matrix, shows the transition probabilities as a matrix, but computes the expected transition cost and shows that result in a single column. Models of this form are limited to about 200 states.

For this example we use the Transition List option. The same DP model can provide the data for several Solver worksheets by entering different names in the name field.

The DP Solver model has much the same appearance as the lists created by the DP Models add-in. The primary differences are columns created for computation of solutions. Data is simply copied from the lists rather than linked through formulas. If the model changes the DP Solver form must be rebuilt. The user need not interact with DP Solver form. All equations are automatically inserted and solutions are obtained by clicking Solve button at the top of the worksheet.

Parameters

The DP Solver model is placed on a new worksheet called BB_DP. Parameters are placed at the upper left corner of the worksheets. Yellow cells hold parameters that should not be changed. The green cells hold information about the current DP solution technique employed. Important to this discussion are the two cells labeled Maximum Value Difference and Sum Probability Difference. The values in these cells should converge to zero as the algorithm progresses.

States

Columns D through J hold data for the states of the model. The yellow cells in the figure below hold equations provided by the add-in. Columns that participate in the solution process are the State Value (Column Q) and the Next Value (column R). The goal of this optimization is to find the action for each state that maximizes the State Value. In this case the state value is the number of runs produced during an inning of play. An iterative process finds the optimum action for each state. They are represented in the table below by the Action Index column and the Action Name column. The iterative dynamic program terminates when the State Value and Next Value are close enough. This is indicated by the Maximum Value Difference that is computed as the maximum absolute value difference between the State Value and Next Value. The iterative process also determines the state probabilities for the optimum policy. These are computed in columns S and T. They aren't very interesting for this example since the process always terminates in the final state.

The values shown in columns Q through T are the initial values. We will see that these values converge to the optimum values after the application of dynamic programming.

Actions and Events

The actions and event definitions are transported to the DP Solver model as shown on the left. The columns labeled Runs and Probabilities are irrelevant for this example. The number of runs and the probabilities of transition are not a function of either the action or event alone. Run production depends in a complex way on the state, action and event. The probability of a transition depends on the action and event. These are all defined in the transition blocks discussed below.

Decisions

The first six columns of the decision list are copied from the model. Columns AM through AP compute values necessary for optimization. Only 34 of the decisions are shown. There are 78 decisions for this problem, each characterized by a state and an action.

Transitions

There are 402 transitions described for this problem. A few are shown on the transition list below. Most of the data is copied directly from the model lists. The final two columns are important for computing the results of the DP.

Optimization

We solve the DP by clicking on the Solve button at the top of the page. There are several solution options. We choose to solve the DP with the Policy method. This finds the optimum policy in one step. The results are shown below.

The optimum action is found for each state. It happens for this data that the optimum solution is the Hit action, no matter what the state. Although this seems like an uninteresting policy, it is one that couldn't be determined without this kind of analysis.

An interesting result is the State Value for the initial state (0,0,0,0), no outs and no one on base. The value is 0.8122. That is the expected run production in the inning when the optimum policy is followed.

Alternative Data

To illustrate the flexibility of the DP Models add-in with different data, we change the data table to represent a player with less power but more speed. The new data is shown at the left. The new solution is shown below.

The solution describes a policy that involves more base stealing. The expected runs in the inning with this policy is reduced to 0.2916 runs per inning. The faster team has a different strategy but with less scoring.

Modifications

The model involved a single inning with one kind of player. The rules of baseball were followed, but not all the events were accurately modeled. The biggest assumption is that all batters and runners have equal skills. It would be interesting to describe a batting order for the inning so that each player had a different action/event probability matrix. Assuming a nine place batting order, the model would be at least nine times the size of this model, but it would fit the Excel worksheet. No attempt has been made to model or solve the larger problem.