Correlation Matrix Definiteness

When simulating a model, some of the random input variables may be correlated.  One way to introduce this correlation into the simulation is rank order correlation, in which a correlation matrix defines the correlation structure.  The definiteness of that matrix is important.

A correlation matrix must be either positive definite (PD) or positive semi-definite (PSD).  In this article we'll discuss how to check definiteness and try to understand conceptually why it's important.

Determining if a Matrix is PD or PSD

There are several ways to determine if a matrix is PD or PSD.  We will focus on one of the easier ways to do it, especially if you are using a spreadsheet.

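One way to check is to calculate the n upper left determinants of an n x n matrix, that is, the determinants of its upper left 1 x 1, 2 x 2, ..., n x n submatrices.  If all upper left determinants are > 0, then the matrix is PD.  If any of them is < 0, the matrix is neither PD nor PSD.  If they are all ≥ 0 and at least one equals 0, the matrix is not PD and may be PSD, but this test alone does not prove it: strictly, a matrix is PSD only if the determinant of every submatrix formed by keeping the same subset of rows and columns is ≥ 0, not just the upper left ones.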

Below is a 3 x 3 correlation matrix for random variables located in cells A1, A2, and A3.

The upper left determinants are shown below.

Since the three upper left determinants are all > 0, the matrix is PD.

If you're using a spreadsheet, Excel has a function for calculating determinants, MDETERM.  Therefore, this method can be used to check correlation matrix definiteness.
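Outside a spreadsheet, the same check can be sketched in a few lines of Python.  This is only an illustration: the function names (determinant, upper_left_determinants, classify) and the sample matrix are assumptions made up for this example, not part of Excel or Simulation Master.  Because upper left determinants that are ≥ 0 with some equal to 0 do not by themselves prove a matrix is PSD, the sketch reports that case as "possibly PSD" rather than "PSD".

```python
def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def upper_left_determinants(matrix):
    """Determinants of the upper left 1x1, 2x2, ..., nxn submatrices."""
    n = len(matrix)
    return [determinant([row[:k] for row in matrix[:k]]) for k in range(1, n + 1)]


def classify(matrix, tol=1e-12):
    """Classify a symmetric matrix from its upper left determinants."""
    dets = upper_left_determinants(matrix)
    if all(d > tol for d in dets):
        return "PD"
    if any(d < -tol for d in dets):
        return "not PD or PSD"
    # All determinants are >= 0 and at least one is (numerically) zero.
    return "not PD; possibly PSD"


# Hypothetical 3 x 3 correlation matrix, chosen only for illustration.
sample = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.2],
    [0.3, 0.2, 1.0],
]
print(upper_left_determinants(sample))
print(classify(sample))
```

The tolerance tol is an arbitrary choice that keeps tiny floating-point errors from being read as negative determinants; adjust it to the scale of your data.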

Simulation Master also has a tool for checking a correlation matrix and identifying variables with correlation problems.  Check out this page for a tutorial on using the tool.

Special Case of Two Random Variables

Let's consider the trivial case of two random variables, which results in a 2 x 2 correlation matrix.  There is only one correlation between the two variables, and it is limited to values from -1 to 1 inclusive.  If we define the correlation between the variables as a, we get the matrix with rows (1, a) and (a, 1).

The two upper left determinants are 1 and 1 - a².

Since |a| ≤ 1, both upper left determinants of a 2 x 2 correlation matrix are always ≥ 0, so the matrix is always either PD or PSD.  When the correlation is -1 or 1, the matrix is PSD.  Otherwise, the matrix is PD.
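As a quick illustration, the two-variable case can be tabulated in Python.  The function name two_by_two_determinants and the sample values of a are assumptions chosen for this sketch, and the code assumes |a| ≤ 1.

```python
def two_by_two_determinants(a):
    """Upper left determinants of the matrix with rows (1, a) and (a, 1)."""
    return 1.0, 1.0 - a * a


# Assumes every correlation satisfies |a| <= 1.
for a in (-1.0, -0.5, 0.0, 0.5, 1.0):
    first, second = two_by_two_determinants(a)
    # The second determinant is zero only when a is -1 or 1.
    label = "PD" if second > 0 else "PSD"
    print(f"a = {a:+.1f}: determinants = ({first}, {second}) -> {label}")
```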

Practical Meaning of PD and PSD Requirement

In practical terms, requiring a matrix to be PD or PSD is akin to requiring the correlations to be consistent with one another.  However, being PD or PSD does not guarantee that the correlations are correct!

Consider the case of three random variables, X, Y, and Z.

The correlations between the variables are:

X and Y: 0.85

X and Z: 0.9

Y and Z: 0.1

This correlation matrix is neither PD nor PSD.  In words, since X and Y are highly positively correlated and X and Z are highly positively correlated, the correlation between Y and Z must be higher than 0.1.
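The claim can be checked with the upper left determinants.  The sketch below uses the closed-form determinant of a 3 x 3 correlation matrix with ones on the diagonal; the function name upper_left_determinants_3x3 is an assumption made for this example.

```python
def upper_left_determinants_3x3(r_xy, r_xz, r_yz):
    """Upper left determinants of the 3 x 3 correlation matrix for X, Y, Z."""
    d1 = 1.0
    d2 = 1.0 - r_xy ** 2
    # Closed-form determinant of a 3 x 3 matrix with ones on the diagonal.
    d3 = 1.0 + 2.0 * r_xy * r_xz * r_yz - r_xy ** 2 - r_xz ** 2 - r_yz ** 2
    return d1, d2, d3


d1, d2, d3 = upper_left_determinants_3x3(0.85, 0.9, 0.1)
print(d1, d2, d3)  # approximately 1, 0.2775, -0.3895

if d3 < 0:
    print("A negative upper left determinant: the matrix is not PD or PSD.")
```

The first two determinants are positive, so the inconsistency only shows up once all three variables are considered together.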

Another way to think of it: when X is high, Z is likely to be high as well.  Since the XY correlation is .85, Y is also likely to be high.  If Y and Z are both likely to be high at the same time, their correlation should be greater than .1.

The discussion above assumes that the correlation between Y and Z is the problem.  Maybe it's not, and one of the other correlations is wrong.
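To see that any of the three correlations could be the culprit, one can compute the range each correlation is allowed to take given the other two.  For three variables, requiring the determinant of the correlation matrix to be ≥ 0 means the third correlation must lie within r1 * r2 ± sqrt((1 - r1²)(1 - r2²)), where r1 and r2 are the other two correlations.  The function name feasible_range is an assumption made for this sketch.

```python
import math


def feasible_range(r1, r2):
    """Range of the third correlation that keeps a 3 x 3 correlation matrix PSD."""
    center = r1 * r2
    half_width = math.sqrt((1.0 - r1 ** 2) * (1.0 - r2 ** 2))
    return center - half_width, center + half_width


r_xy, r_xz, r_yz = 0.85, 0.9, 0.1

# Each correlation is compared with the range implied by the other two.
checks = {
    "YZ": (r_yz, feasible_range(r_xy, r_xz)),
    "XZ": (r_xz, feasible_range(r_xy, r_yz)),
    "XY": (r_xy, feasible_range(r_xz, r_yz)),
}
for name, (value, (low, high)) in checks.items():
    status = "inside" if low <= value <= high else "outside"
    print(f"{name} = {value}: allowed range [{low:.3f}, {high:.3f}] -> {status}")
```

With these inputs, YZ would need to be between roughly 0.54 and 0.99 if the other two values are trusted, but XZ and XY each fall outside the range implied by the remaining pair as well, so the calculation alone cannot say which correlation is wrong.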

Conclusion

Having a correlation matrix that is PD or PSD doesn't mean that your model is accurate.  That is an entirely different discussion.  But a matrix that is not PD or PSD is a red flag that something is inconsistent.  Check the data or assumptions that the correlations are based on.


Excel is a registered trademark of Microsoft Corporation.  Used with permission from Microsoft.