Operations Research Models and Methods
 
Computation Section
Random Variables

Continuous Distributions

  A continuous random variable is one that is measured on a continuous scale. Examples are measurements of time, distance and other phenomena that can be determined with arbitrary accuracy. This section reviews the general concepts of probability density functions and presents a variety of named distributions often used to model continuous random variables. The figure below shows an example called the triangular distribution.

 

The probability density function (pdf) is a function, f(x), which defines the probability density for each value of a continuous random variable. Integrating the probability density function between any two values gives the probability that the random variable falls in the range of integration.

The definitions assume that the random variable takes values within the range a to b, but the lower limit may be negative infinity and the upper limit may be positive infinity. Values within the range may have zero probability density, but the density can never be negative.

When we integrate the density function we use y as the variable of integration.
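In symbols, the statements above can be written as follows. The label X for the random variable and the limits x_1 and x_2 are notation assumed here for illustration.

```latex
\[
f(x) \ge 0 \quad (a \le x \le b), \qquad
\int_a^b f(y)\,dy = 1, \qquad
P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(y)\,dy .
\]
```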

 

The cumulative distribution function (CDF) describes the probability that the random variable is less than or equal to x. For a continuous random variable a single point has zero probability, so this is the same as the probability that the variable is less than x. The probability that the random variable falls within a specific range is found by subtracting the cumulative distribution evaluated at the lower limit of the range from the cumulative distribution evaluated at the upper limit.
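Written out, with F(x) denoting the CDF and X the random variable (both symbols are assumed here for illustration), the relationship is:

```latex
\[
F(x) = P(X \le x) = \int_a^x f(y)\,dy, \qquad
P(x_1 \le X \le x_2) = F(x_2) - F(x_1) .
\]
```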

 

 


The example for this introductory section is the triangular distribution illustrated at the top of the page. The functional form for this density has a single parameter c that determines the location of the highest point, or mode, of the function. The random variable ranges from 0 to 1. Outside this range the density function has the value 0, so no values of the random variable can be observed there.

Integrating the density function for the triangular distribution results in the CDF also shown in the figure. The example illustrates the characteristics of every CDF. It is zero for values below the lower limit of the range. Within the range the function increases to the value of 1. It remains at 1 for all values greater than the upper limit of the range. The CDF never decreases, and it remains constant only where the pdf is zero.
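As a minimal sketch, assuming the standard piecewise-linear form of a triangular density on the range 0 to 1 with mode c, the pdf and CDF might be coded as below. The function names and the value c = 0.25 are illustrative assumptions, not the add-in's own functions or the example's actual parameter.

```python
def tri_pdf(x, c):
    """Triangular density on [0, 1] with mode c, assuming 0 < c < 1."""
    if x < 0.0 or x > 1.0:
        return 0.0                      # no density outside the range
    if x <= c:
        return 2.0 * x / c              # rising edge up to the mode
    return 2.0 * (1.0 - x) / (1.0 - c)  # falling edge after the mode


def tri_cdf(x, c):
    """Integral of tri_pdf from 0 to x."""
    if x <= 0.0:
        return 0.0                      # zero below the lower limit
    if x >= 1.0:
        return 1.0                      # one above the upper limit
    if x <= c:
        return x * x / c
    return 1.0 - (1.0 - x) ** 2 / (1.0 - c)


c = 0.25  # assumed mode, chosen only for illustration
# Probability of falling between 0.25 and 0.5: difference of two CDF values.
p = tri_cdf(0.5, c) - tri_cdf(0.25, c)
```

The last line applies the subtraction rule for the CDF: the probability of a range is the CDF at the upper limit minus the CDF at the lower limit.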

 
The Random Variables add-in defines distributions using named ranges on the worksheet. For the example, the range B2:B5 has the name RV13. The Mean and Variance in B6 and B7 are computed with user-defined functions provided by the add-in.

 

Moments

 

Several quantities can be computed from the pdf that describe simple characteristics of the distribution. These are called moments. The most common are the mean, the first moment about the origin, and the variance, the second moment about the mean. The mean is a measure of the centrality of the distribution and the variance is a measure of the spread of the distribution about the mean.

The skewness is computed from the third moment about the mean. This quantity can be positive or negative. We normalize the measure by squaring the third moment and dividing it by the third power of the variance. To recover the sign of the third moment, we multiply this ratio by the sign of the third moment. The skewness indicates whether the distribution has a long tail to the right of the mean (positive) or to the left (negative). The skewness is 0 for a symmetric distribution.

The kurtosis is a measure of the thickness of the tails of the distribution. The use of this measure is not obvious in most cases, but it is included for completeness. The formula for this measure subtracts 3 from the ratio of the fourth moment about the mean to the square of the variance. For the Normal distribution this ratio equals 3, so the subtraction gives a value relative to the Normal distribution, for which the measure is 0. It can be positive (greater than the Normal) or negative (less than the Normal).
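The verbal definitions above can be written out as follows. The symbols (mu for the mean, sigma squared for the variance, mu_k for the k-th moment about the mean, beta_1 and beta_2 for skewness and kurtosis) are notation assumed here for illustration.

```latex
\[
\mu = \int_a^b x\,f(x)\,dx, \qquad
\sigma^2 = \int_a^b (x-\mu)^2 f(x)\,dx, \qquad
\mu_k = \int_a^b (x-\mu)^k f(x)\,dx
\]
\[
\beta_1 = \operatorname{sign}(\mu_3)\,\frac{\mu_3^{\,2}}{(\sigma^2)^3}, \qquad
\beta_2 = \frac{\mu_4}{(\sigma^2)^2} - 3
\]
```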

 


The general moments for the triangular distribution are derived by integration. We have functions for the mean and variance. The formulas for skewness and kurtosis are more complicated and we have not derived them. With c less than 0.5, as for the example, the skewness is positive.
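For reference, the standard closed forms for the mean and variance of a triangular distribution on the range 0 to 1 with mode c can be sketched as below. These are textbook results. The function names are hypothetical and are not the add-in's user-defined functions.

```python
def triangular_mean(c):
    """Mean of the triangular distribution on [0, 1] with mode c."""
    return (1.0 + c) / 3.0


def triangular_variance(c):
    """Variance of the triangular distribution on [0, 1] with mode c."""
    return (1.0 - c + c * c) / 18.0


c = 0.25  # assumed mode, chosen only for illustration
mean = triangular_mean(c)
variance = triangular_variance(c)
```

With c = 0.5 the distribution is symmetric, and the formulas give a mean of 0.5 and a variance of 1/24.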
 

The first and second moments for the example are computed with user-defined functions provided by the add-in. In the absence of formulas for skewness and kurtosis, the add-in cannot provide them.

When simple formulas for moments of discrete distributions are not available, the moments can be found by summing the terms in the general definition. The moments of continuous functions can be found by numerical integration, but the Random Variables add-in does not have this capability. The Functions add-in computes the four moments for arbitrary density functions using numerical integration.
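The idea of finding moments by numerical integration can be sketched as follows. This is only an illustration using composite Simpson's rule. It is not the method used by the Functions add-in, which the text does not describe, and all names are hypothetical. Skewness and kurtosis follow the definitions given earlier on this page.

```python
def simpson(g, lo, hi, n=2000):
    """Composite Simpson's rule for the integral of g over [lo, hi]."""
    if n % 2:
        n += 1                          # Simpson's rule needs an even count
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0


def moments(pdf, lo, hi):
    """Return (mean, variance, skewness, kurtosis) of a density on [lo, hi]."""
    mean = simpson(lambda x: x * pdf(x), lo, hi)

    def central(k):
        # k-th moment about the mean
        return simpson(lambda x: (x - mean) ** k * pdf(x), lo, hi)

    var, m3, m4 = central(2), central(3), central(4)
    sign = (m3 > 0) - (m3 < 0)          # recover the sign of the third moment
    skewness = sign * m3 ** 2 / var ** 3
    kurtosis = m4 / var ** 2 - 3.0      # relative to the Normal distribution
    return mean, var, skewness, kurtosis


def triangular_pdf(x, c=0.25):
    """Triangular density on [0, 1]; the mode c = 0.25 is an assumed value."""
    if x < 0.0 or x > 1.0:
        return 0.0
    return 2.0 * x / c if x <= c else 2.0 * (1.0 - x) / (1.0 - c)


mean, var, skewness, kurtosis = moments(triangular_pdf, 0.0, 1.0)
```

For a density with a kink, such as the triangular, accuracy is best when the kink falls on a grid point, as it does here.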

 

Named Distributions

 

Models involving random occurrences require the specification of a probability distribution for each random variable. To aid in the selection, a number of named distributions have been identified. We consider several in this section that are particularly useful for modeling phenomena that arise in operations research studies.

Logical considerations may suggest appropriate choices for a distribution. Obviously, a time variable cannot be negative, and upper and lower bounds due to physical limitations may sometimes be identified. All of the distributions described below are based on logical assumptions. If one abstracts the system under study to obey the same assumptions, the appropriate distribution is apparent. For example, suppose a queuing analyst determines that the customers of a telephone support line call the system independently of one another. This is exactly the assumption that leads to the exponential distribution for the time between arrivals. In another case, a variable is determined to be the sum of independent random variables with identical exponential distributions. This is the assumption that leads to the Gamma distribution. If the number of variables in the sum is moderately large, the Normal distribution may be appropriate.
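The link between a sum of exponential variables and the Gamma distribution can be checked with a small simulation. The sketch below uses Python's standard random module. The values k = 4 and rate = 2.0, the seed, and the sample size are arbitrary assumptions chosen for illustration.

```python
import random
import statistics


def sum_of_exponentials(k, rate, rng):
    """Sum of k independent exponential variables with a common rate."""
    return sum(rng.expovariate(rate) for _ in range(k))


rng = random.Random(12345)              # fixed seed for repeatability
k, rate = 4, 2.0                        # assumed illustrative values
samples = [sum_of_exponentials(k, rate, rng) for _ in range(20000)]

sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)

# A Gamma distribution with shape k and rate `rate` has
# mean k / rate and variance k / rate**2.
theory_mean = k / rate
theory_var = k / rate ** 2
```

The sample mean and variance should fall close to the theoretical Gamma values, with the gap shrinking as the number of samples grows.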

In some cases, a particular distribution may be chosen because it best fits the form of observed data. Statistical methods are available to estimate the parameters that give the best fit and to test how well the fitted distribution matches the data.

Very often, it is not necessary to determine the exact distribution for a study. Solutions may not be sensitive to distribution form as long as the mean and variance are approximately correct. The important requirement is to represent explicitly the variability inherent in the situation.

In every case, the named distribution is specified by the mathematical statement of the probability density function. Each has one or more parameters that determine the shape and location of the distribution. Cumulative distributions may be expressed as mathematical functions; or, for cases when the integral has no closed form, extensive tables are available for evaluation of the CDF. For some distributions, formulas for the moments have been derived.

It is convenient to use named distributions because it is easier to statistically estimate a few parameters than an entire nonparametric distribution. Also, named distributions can be sampled more easily in simulation studies, and probabilities can be computed from the formulas available for them. We consider a number of named distributions on the following pages.
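One common way to simulate a distribution whose CDF has a closed form is the inverse-transform method: draw a uniform random number and pass it through the inverse of the CDF. The sketch below applies it to the triangular distribution on the range 0 to 1 with mode c. It is an illustration under that assumption, not the add-in's simulation procedure, and the names are hypothetical.

```python
import math
import random


def tri_inverse_cdf(u, c):
    """Inverse CDF of the triangular distribution on [0, 1] with mode c."""
    if u <= c:                          # the CDF equals c at the mode
        return math.sqrt(c * u)
    return 1.0 - math.sqrt((1.0 - c) * (1.0 - u))


def sample_triangular(c, rng):
    """Draw one value by transforming a uniform random number."""
    return tri_inverse_cdf(rng.random(), c)


rng = random.Random(2024)               # fixed seed for repeatability
c = 0.25                                # assumed mode for illustration
draws = [sample_triangular(c, rng) for _ in range(20000)]
average = sum(draws) / len(draws)       # should be near (1 + c) / 3
```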

 


  
by Paul A. Jensen
Copyright 2004 - All rights reserved
