Input Correlation Methods in Simulation Master

This article describes the input correlation methods available in Simulation Master. A previous article explained why it is important to account for correlation among input random variables when performing Monte Carlo simulation.

There are two input correlation methods available in Simulation Master:

  • Rank order correlation
  • Bivariate copulas

Rank Order Correlation

Rank order correlation uses the Spearman rank correlation between variables to generate values for each simulation trial that mimic this correlation. A very simple example shows how this is done. Consider a model of profit with demand and cost as input random variables. A correlation matrix occupies the cell range E1:G3, and the Spearman correlation coefficient is entered in that matrix. Initially, we will use a value of 0.95.

Rank Order Correlation Model
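To make the idea concrete outside the spreadsheet, here is a minimal Python sketch of one common way to impose a target rank correlation: draw each input independently, then reorder the draws so their ranks follow a pair of correlated normal scores. This article does not describe Simulation Master's internal algorithm, so treat this as an illustration of the general technique only. The demand and cost distributions, the function names, and the seed are all assumptions made for the example.

```python
import math
import random


def ranks(values):
    """Return the 0-based rank of each value within its list."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def rank_correlate(x, y, target_rho, rng):
    """Reorder both samples so their ranks follow correlated normal scores."""
    # For bivariate normal scores, this Pearson r yields the target Spearman rho.
    r = 2 * math.sin(math.pi * target_rho / 6)
    z1 = [rng.gauss(0, 1) for _ in x]
    noise_scale = math.sqrt(max(0.0, 1 - r * r))
    z2 = [r * a + noise_scale * rng.gauss(0, 1) for a in z1]
    x_sorted, y_sorted = sorted(x), sorted(y)
    # Each output value takes the rank of its matching normal score.
    x_new = [x_sorted[k] for k in ranks(z1)]
    y_new = [y_sorted[k] for k in ranks(z2)]
    return x_new, y_new


rng = random.Random(42)
n = 25000  # same number of trials as the example in this article

# Hypothetical input distributions, chosen only for illustration.
demand = [rng.triangular(800, 1200, 1000) for _ in range(n)]
cost = [rng.uniform(40, 60) for _ in range(n)]

demand_c, cost_c = rank_correlate(demand, cost, 0.95, rng)
```

The reordering changes only which demand value is paired with which cost value. Each input keeps exactly the same set of sampled values, so its own distribution is unaffected.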

After simulating the model for 25,000 trials, we can plot the values of cost vs. demand for each trial.

Rho = .95

In the previous simulation, Spearman's rho was 0.95. If we change rho to 0.5 and simulate again, we get the plot below. Note that there is more dispersion than in the 0.95 plot, as we would expect since the correlation is lower.

Rho = .5

With rank order correlation, dispersion is greatest when each variable is near the midpoint of its range. The scatter is pinched slightly at the extremes.

Bivariate Copulas

A bivariate copula models the correlation structure between two random variables. The copula is a function that "couples" the two variables: it describes how their individual distributions are joined into a joint distribution. Copulas are harder to understand intuitively than rank order correlation. Their advantage is that we can choose different copulas to change the general shape of the correlation structure, especially in the tails.

Simulation Master has four copula types:

  • Clayton
  • Frank
  • Gumbel
  • Farlie-Gumbel-Morgenstern (FGM)

A plot of each copula type is shown below. Note how different the correlation structure is for each copula. Each of these copulas has an alpha parameter that governs the degree of correlation. Copulas also have a directional component that depends on the correlation, which means the plots can be rotated. Copula direction is beyond the scope of this article, but you can learn more about copula direction here.

 

[Figure panels: Clayton Copula, Frank Copula, Gumbel Copula, FGM Copula]
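For readers who want the math behind the plots, the standard textbook forms of these four copula families are listed here, with u and v the two uniform inputs on (0, 1). This article does not state Simulation Master's exact parameterization, so the alpha ranges shown are the commonly used ones and should be treated as an assumption.

```latex
\begin{align*}
\text{Clayton:}\quad & C_\alpha(u,v) = \left(u^{-\alpha} + v^{-\alpha} - 1\right)^{-1/\alpha}, && \alpha > 0 \\
\text{Frank:}\quad & C_\alpha(u,v) = -\frac{1}{\alpha}\ln\!\left(1 + \frac{\left(e^{-\alpha u}-1\right)\left(e^{-\alpha v}-1\right)}{e^{-\alpha}-1}\right), && \alpha \neq 0 \\
\text{Gumbel:}\quad & C_\alpha(u,v) = \exp\!\left(-\left[(-\ln u)^{\alpha} + (-\ln v)^{\alpha}\right]^{1/\alpha}\right), && \alpha \geq 1 \\
\text{FGM:}\quad & C_\alpha(u,v) = uv\left[1 + \alpha(1-u)(1-v)\right], && -1 \leq \alpha \leq 1
\end{align*}
```

In each family a larger alpha means stronger positive dependence. The FGM form can only express fairly weak correlation because alpha is bounded.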

Let's revisit the example model from earlier. This time we will use a Clayton copula to model the correlation between demand and cost. The copula is defined by the four cells in E2:F3. Cells F2 and F3 hold the random number inputs to the copula, and cells E2 and E3 hold its so-called "outputs". These outputs are fed to the demand and cost random variables as their random number inputs for sampling. The copula outputs are correlated according to the copula's correlation structure, and therefore demand and cost are correlated in the same manner.

Copula Model
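The same flow can be sketched in Python: two independent uniform random numbers go in, two correlated uniforms come out, and each output drives the inverse CDF of one input variable. The sketch uses the standard conditional sampling formula for a Clayton copula. The alpha value, the demand and cost distributions, and all function names are assumptions for illustration, and this is not necessarily how Simulation Master computes the copula internally.

```python
import math
import random


def clayton_pair(w1, w2, alpha):
    """Turn two independent uniforms in (0, 1] into a Clayton-coupled pair."""
    u = w1
    # Invert the conditional distribution of v given u for the Clayton copula.
    v = (u ** -alpha * (w2 ** (-alpha / (1 + alpha)) - 1) + 1) ** (-1 / alpha)
    return u, v


def triangular_inv(p, low, mode, high):
    """Inverse CDF of a triangular distribution."""
    if p < (mode - low) / (high - low):
        return low + math.sqrt(p * (high - low) * (mode - low))
    return high - math.sqrt((1 - p) * (high - low) * (high - mode))


def uniform_inv(p, low, high):
    """Inverse CDF of a uniform distribution."""
    return low + p * (high - low)


rng = random.Random(7)
alpha = 2.0  # hypothetical degree of correlation

pairs, demand, cost = [], [], []
for _ in range(20000):
    w1 = 1 - rng.random()  # plays the role of F2; shifted to (0, 1] to avoid 0
    w2 = 1 - rng.random()  # plays the role of F3
    u, v = clayton_pair(w1, w2, alpha)  # play the roles of E2 and E3
    pairs.append((u, v))
    # Hypothetical marginal distributions for the two model inputs.
    demand.append(triangular_inv(u, 800, 1000, 1200))
    cost.append(uniform_inv(v, 40, 60))
```

Because the correlation is created on the uniform scale, the same copula outputs can feed any pair of input distributions without changing the correlation structure.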

A plot of cost vs. demand is shown below.  Note that the correlation varies depending on the values of each variable.  When both are at their low values, they are highly correlated.  When both are at the high end, they are less correlated.

Variables Using Clayton Copula
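This asymmetry can be quantified with tail dependence coefficients, which measure how likely one variable is to be extreme given that the other is. For the standard form of the Clayton copula with alpha greater than zero (an assumption about the parameterization, as noted earlier), the well-known result is:

```latex
\lambda_L = \lim_{q \to 0^+} P\left(V \le q \mid U \le q\right)
          = \lim_{q \to 0^+} \frac{C_\alpha(q,q)}{q}
          = 2^{-1/\alpha},
\qquad
\lambda_U = \lim_{q \to 1^-} P\left(V > q \mid U > q\right) = 0
```

As an example, alpha = 2 gives a lower tail dependence of about 0.71, while the upper tail dependence is zero for any alpha. That matches the tight cluster at the low end of the plot and the looser scatter at the high end.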

Summary of Input Correlation Methods

We have now covered the input correlation methods in Simulation Master. Which one is best? The answer, of course, is that it depends.

Rank order correlation is simple to understand: all we have to do is determine the Spearman rank correlation among the input variables. This requires data, but the bigger concern is whether the correlation changes. To use this method, we have to assume the correlation is constant under all conditions.
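If you have paired historical observations, the Spearman coefficient can be estimated with a few lines of code. The sketch below ranks each series (averaging the ranks of tied values) and takes the ordinary correlation of the ranks. The data values are made up for illustration, and the function names are not part of Simulation Master.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (spread_x * spread_y)


# Hypothetical historical observations, paired by period.
demand_history = [980, 1010, 1050, 1100, 1200]
cost_history = [46, 45, 52, 50, 58]

rho = spearman_rho(demand_history, cost_history)
```

Running the same calculation on different subsets of the history, such as high-demand and low-demand periods, is one simple way to see whether the constant-correlation assumption looks reasonable.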

Copulas are harder to understand but allow more flexibility for modeling correlation that changes. Selecting the correct copula and its alpha parameter without supporting data can be difficult. If historical data is available, the Simulation Master premium edition has a copula fitting tool that can fit a copula to the data. Consider whether you have enough representative data and whether potential outliers are accounted for in the data.
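To give a feel for what fitting involves, here is a minimal sketch of one simple approach for the Clayton family: estimate Kendall's tau from the data and convert it to alpha using the known relationship tau = alpha / (alpha + 2). This is not a description of the Simulation Master fitting tool, whose method is not covered in this article. The data and function names are hypothetical.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                score += 1  # pair moves in the same direction
            elif product < 0:
                score -= 1  # pair moves in opposite directions
    return score / (n * (n - 1) / 2)


def clayton_alpha_from_tau(tau):
    """Invert tau = alpha / (alpha + 2) for the standard Clayton copula."""
    if not 0 < tau < 1:
        raise ValueError("this conversion needs 0 < tau < 1")
    return 2 * tau / (1 - tau)


# Hypothetical historical observations, paired by period.
demand_history = [980, 1010, 1050, 1100, 1200, 1250]
cost_history = [46, 45, 52, 50, 58, 61]

tau = kendall_tau(demand_history, cost_history)
alpha = clayton_alpha_from_tau(tau)
```

With only six observations the estimate is very rough, which illustrates the point about needing enough representative data. A single outlier can change several concordant pairs to discordant and move alpha noticeably.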