When understood properly, Monte Carlo simulation is an invaluable tool for understanding uncertainty, and ultimately the risks that your project faces. In this article we will discuss the basics of Monte Carlo simulation.
Risk is sometimes used as a catch-all term that isn't well defined. Monte Carlo simulation can help answer risk questions such as the following:
- What is the probability of the project being late?
- What are possible returns of the portfolio?
- What is the probability that the project has a negative net present value?
- What is the probability that the design does not meet requirements?
Monte Carlo simulation isn't only a risk analysis tool. It can also be used for estimating integrals that are difficult to solve analytically, as well as optimizing a solution that contains random variable inputs.
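As a small illustration of the integral-estimation use mentioned above, here is a minimal sketch in Python. The function name mc_integrate, the integrand, and the sample count are assumptions chosen for illustration. The integrand exp(-x^2) has no elementary antiderivative, so the exact value is taken from the error function only to check the estimate.

```python
import math
import random

def mc_integrate(f, a, b, n=100_000, seed=0):
    """Estimate the integral of f over [a, b] by averaging f at uniform random points."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

# Integrate exp(-x^2) over [0, 1]; the exact value is sqrt(pi)/2 * erf(1).
estimate = mc_integrate(lambda x: math.exp(-x * x), 0.0, 1.0)
exact = math.sqrt(math.pi) / 2 * math.erf(1.0)
print(f"estimate={estimate:.4f}  exact={exact:.4f}")
```

The estimate improves slowly as the number of samples grows, which is why this approach is mostly worthwhile when analytical or deterministic numerical methods are impractical.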
To see some applications of Monte Carlo simulation, be sure to check out the applications page.
Definitions
Before we get into the basics of Monte Carlo simulation, we will start with some definitions.
- Model. A mathematical representation of a system. Inputs to the model result in an output value. When using Monte Carlo simulation, a model may be developed in a spreadsheet or coded into a computer program.
- Input. The values that are used to calculate an output value. Inputs are discussed in the next section.
- Output. The value we are calculating with the model. Examples include net present value, mechanical tolerances, project critical path completion time, etc.
Model Inputs
A model depicts a system's response (output) to a set of input values. Inputs can be one of the following types:
- Fixed. The value is constant at all times, such as tax rate.
- Controlled. We have control over the values that the input can assume. This is also known as a decision variable.
- Variable. We don't know the exact value of the input, but we assume that the possible values follow some probability distribution. This is also known as a random variable.
Assigning Distributions to Random Variables
So if we know that an input is a random variable, how do we go about assigning a probability distribution? There are several ways of doing this, and depending on the situation, only one of them may be available.
A distribution fitting tool could be used to find the most appropriate distribution and its parameters if there is historical data on the random variable's behavior.
Another method is to follow industry-standard practice in assuming a distribution, and then use some method to estimate its parameters. For example, the lifetime of mechanical components is often modeled using the Weibull distribution. The parameters can be approximated from database values for components.
If no data or standards are available, consulting with experts may be the best method. Distributions such as the triangular and PERT require "three-point" estimates of the minimum, most likely, and maximum values for a random variable. The uniform distribution uses a minimum and maximum estimate and assumes all values in between have an equal chance of occurring. The trapezoidal distribution uses a four-point estimate of minimum, maximum, and two modes, so that the probability density is greatest between the two modes. The uniform, triangular, and trapezoidal distributions don't normally occur in real life, but they are used to approximate uncertainty while allowing for the use of expert estimates.
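To make the expert-estimate distributions concrete, the following sketch draws values from uniform, triangular, and PERT distributions using only Python's standard library. The three-point estimate (10, 20, 40) and the helper names are hypothetical. The PERT sampler assumes the common formulation as a rescaled beta distribution with a shape weight of 4; other formulations exist.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def sample_uniform(low, high):
    """Every value between low and high is equally likely."""
    return rng.uniform(low, high)

def sample_triangular(low, mode, high):
    """Three-point estimate; note random.triangular takes (low, high, mode)."""
    return rng.triangular(low, high, mode)

def sample_pert(low, mode, high, lam=4.0):
    """PERT as a rescaled beta distribution (assumes low < high); lam=4 is the usual weight."""
    alpha = 1.0 + lam * (mode - low) / (high - low)
    beta = 1.0 + lam * (high - mode) / (high - low)
    return low + (high - low) * rng.betavariate(alpha, beta)

# Hypothetical expert estimate: minimum 10, most likely 20, maximum 40.
tri = [sample_triangular(10, 20, 40) for _ in range(20_000)]
pert = [sample_pert(10, 20, 40) for _ in range(20_000)]
print(f"triangular mean ~ {sum(tri) / len(tri):.2f}")
print(f"PERT mean       ~ {sum(pert) / len(pert):.2f}")
```

With the same three estimates, the PERT distribution places more weight near the most likely value than the triangular distribution does, so its mean sits closer to the mode.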
Correlation Between Random Variables
Often, random variables are assumed to be independent of each other. That is, the value of one random variable is not correlated with the value of another random variable. If random variables are correlated, this should be accounted for in the simulation. Risk can be underestimated or overestimated if independence is assumed when correlation is present.
We will revisit correlation later on.
An Example Model to Show the Basics of Monte Carlo Simulation
To demonstrate the basics of Monte Carlo simulation, we will use an example. The following spreadsheet models project net present value. The green shaded cells are random variables that are modeled with the triangular distribution. The yellow cell is the model output.
What Happens During a Simulation?
Let's assume we have created a model, set values for fixed and controlled inputs, and defined the probability distributions for random variables.
Because there are random variables present, each time the model is calculated, we get a different output. Why is this? Well, in Monte Carlo simulation, each random variable is "sampled" each time the model is calculated.
Random Variable Sampling
What do we mean by sampling? This is basically drawing a random number, applying it to a random variable's probability distribution, and getting a value for the random variable. This is repeated for each random variable. The sampled values are plugged into the model and the output is calculated.
This procedure is shown in the chart below. We draw a random number between 0 and 1. We treat that number as a cumulative probability, plug it into the cumulative probability distribution, and solve for the corresponding value of the random variable.
One calculation (iteration) of the model by itself is not very important. It's just one possible outcome. In the real world this is no better than using single-point estimates for each input. Where simulation excels is in repeating this procedure for thousands of iterations.
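The sampling step described above can be sketched for a triangular distribution, whose cumulative distribution can be solved in closed form. The function name and the three-point estimate (10, 20, 40) are hypothetical, and the sketch assumes the minimum is strictly less than the maximum.

```python
import math
import random

def triangular_inverse_cdf(u, low, mode, high):
    """Solve F(x) = u for x, where F is the triangular cumulative distribution."""
    cut = (mode - low) / (high - low)  # cumulative probability at the mode
    if u < cut:
        # Rising side: F(x) = (x - low)^2 / ((high - low) * (mode - low))
        return low + math.sqrt(u * (high - low) * (mode - low))
    # Falling side: F(x) = 1 - (high - x)^2 / ((high - low) * (high - mode))
    return high - math.sqrt((1.0 - u) * (high - low) * (high - mode))

rng = random.Random(1)
u = rng.random()  # random number in [0, 1)
x = triangular_inverse_cdf(u, low=10.0, mode=20.0, high=40.0)
print(f"random number {u:.4f} maps to sampled value {x:.2f}")
```

For distributions without a closed-form inverse, software typically falls back on numerical inversion or other sampling techniques, but the idea of mapping a uniform random number to a value of the random variable is the same.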
By repeatedly sampling random variables, we eventually get a spectrum of model outputs. This range of outcomes illustrates the potential risk that is present.
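A minimal sketch of the repeated sampling loop is given below. The spreadsheet model from this article is not reproduced here, so the NPV model, the discount rate, and the triangular estimates are all hypothetical stand-ins chosen only to show the mechanics.

```python
import random

def npv(rate, initial_cost, cash_flows):
    """Net present value: discounted cash flows minus the up-front cost."""
    return -initial_cost + sum(cf / (1.0 + rate) ** t
                               for t, cf in enumerate(cash_flows, start=1))

def simulate(iterations=10_000, seed=7):
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outputs = []
    for _ in range(iterations):
        # Sample each random variable (hypothetical three-point estimates;
        # random.triangular takes low, high, mode in that order).
        initial_cost = rng.triangular(90_000, 130_000, 100_000)
        annual_cash = rng.triangular(20_000, 45_000, 30_000)
        # Plug the sampled values into the model and record the output.
        outputs.append(npv(rate=0.08, initial_cost=initial_cost,
                           cash_flows=[annual_cash] * 5))
    return outputs

outputs = simulate()
print(f"{len(outputs)} outcomes, min={min(outputs):,.0f}, max={max(outputs):,.0f}")
```

Each pass through the loop is one iteration of the model; the list of outputs is the spectrum of outcomes that the rest of the analysis works from.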
Correlated Random Variables
Earlier we talked about correlated random variables. To account for correlation, software generally draws the values of each random variable for all iterations up front. The drawn values are then reordered so that the way they are paired approximates the specified correlation structure.
As an example, let's say there are two random variables with a positive correlation. When one variable has a high value, the other variable tends to have a high value as well and vice versa.
If our simulation will do 20,000 iterations, the software will draw 20,000 values for each random variable. Each time the model is calculated, a pair of values, one for each random variable, is plugged into the model. Before doing this, the 20,000 pairs of values are rearranged to match the correlation structure. If one variable is high, a high value of the second is paired with it. If the first variable is low, a low value of the second is paired with it, and so on.
By simulating in this manner, we mimic real world correlation in the model output.
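One way the rearrangement could work is sketched below. This is an assumption made for illustration, in the spirit of rank-reordering methods such as Iman-Conover, and is not necessarily what any particular software package does. Independent samples are drawn first, then each is reordered to follow the rank order of a set of correlated normal scores. The function names, the distributions, and the target of 0.8 are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def reorder_to_match(values, scores):
    """Rearrange values so their rank order matches the rank order of scores."""
    sorted_values = sorted(values)
    order = sorted(range(len(scores)), key=scores.__getitem__)  # smallest score first
    result = [0.0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = sorted_values[rank]
    return result

def induce_correlation(xs, ys, rho, rng):
    """Reorder xs and ys to follow normal scores that have correlation rho."""
    z1 = [rng.gauss(0.0, 1.0) for _ in xs]
    z2 = [rho * a + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for a in z1]
    return reorder_to_match(xs, z1), reorder_to_match(ys, z2)

rng = random.Random(11)
n = 20_000
xs = [rng.triangular(10, 40, 20) for _ in range(n)]     # low, high, mode
ys = [rng.triangular(100, 300, 150) for _ in range(n)]
xs_c, ys_c = induce_correlation(xs, ys, rho=0.8, rng=rng)
print(f"before: {pearson(xs, ys):+.3f}  after: {pearson(xs_c, ys_c):+.3f}")
```

Because the values are only rearranged, each variable keeps exactly the distribution it was sampled from; only the pairing between the variables changes. The correlation achieved is close to, but not exactly, the target.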
Summary of the Simulation Process
In general, the simulation process uses the following algorithm.
For i = 1 to iterations
    Sample each random variable
Next i
If correlated then
    Rearrange random variables
End if
For i = 1 to iterations
    Plug random variables into model and calculate output
Next i
Simulation Results
The results of a simulation are usually presented in the form of a histogram showing output value and frequency of occurrence. Summary statistics of the output values are also calculated. Below is the output from simulating the model shown earlier.
What to Do with the Results?
So we've simulated the model. What can we do with the information? In our example, the model output is project net present value (NPV). We can find out the probability that NPV is greater than 0 by counting the percentage of outcomes greater than 0. As shown below, the project has a 91.36% probability of NPV greater than 0.
In addition, we have the range of outcomes from the minimum and maximum values and other statistics from which to draw conclusions about the results.
The shape of the histogram can also be useful. In our example, the histogram is fairly symmetrical. If the histogram were negatively skewed, the mean may look OK, but the long negative tail may indicate the potential for a large loss.
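The kind of analysis described above can be sketched as follows. The outputs here are stand-in values drawn from a normal distribution purely for illustration; they are not the results of the example model, so the printed probability will not match the figure quoted earlier. The function name summarize and the dictionary keys are hypothetical.

```python
import random
import statistics

def summarize(outputs, threshold=0.0):
    """Summary statistics for a list of simulated model outputs."""
    n = len(outputs)
    mean = statistics.fmean(outputs)
    sd = statistics.pstdev(outputs)
    # Skewness: negative values indicate a longer tail on the low side.
    skew = sum(((x - mean) / sd) ** 3 for x in outputs) / n
    return {
        "p_above": sum(1 for x in outputs if x > threshold) / n,
        "min": min(outputs),
        "max": max(outputs),
        "mean": mean,
        "stdev": sd,
        "skewness": skew,
    }

# Stand-in outputs (hypothetical), not the article's simulation results.
rng = random.Random(3)
outputs = [rng.gauss(20_000, 15_000) for _ in range(10_000)]
stats = summarize(outputs)
print(f"P(NPV > 0) = {stats['p_above']:.2%}")
print(f"range = {stats['min']:,.0f} to {stats['max']:,.0f}, skewness = {stats['skewness']:+.2f}")
```

The probability estimate is simply the share of iterations above the threshold, so its precision depends on the number of iterations run.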
So That's the Basics of Monte Carlo Simulation
We've covered the basics of Monte Carlo simulation. Now it's time to start using it in your applications. When thinking about any models you create, are there any assumptions about the input values that could be better modeled as random variables? If so, Monte Carlo simulation may be a good tool to use, assuming you can specify the distribution for each random variable.