Modeling Tip: Sum Random Variables

An easy mistake to make when modeling identically distributed random variables is to use a single variable to represent several identically distributed values. We could easily assume that several random variables with the same probability distribution can be modeled by sampling a single variable and multiplying it by the total number of variables. This is a serious error, and we'll show how it can produce grossly inaccurate simulation results.

If we have five independent, identically distributed random variables, summing the five variables is not the same as multiplying one of them by 5. As we'll see, both mathematically and through simulation, the two approaches can give very different variances.

Mathematics of Random Variables

Before we run simulations, let's look at the difference using random variable algebra. For independent random variables, the variance of a sum is:

Var[X + Y] = Var[X] + Var[Y]

The variance of a random variable multiplied by a constant k is:

Var[kX] = (k^2)Var[X]
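Both rules can be checked empirically. The following is a minimal Python sketch, assuming X and Y are drawn independently from a normal distribution with standard deviation 2 (variance 4); that distribution, the function name `check_variance_rules`, and the fixed seed are illustrative choices only. The sum rule holds only approximately in a finite sample, while the scaling rule holds essentially exactly.

```python
import random
import statistics

def check_variance_rules(n_samples: int = 100_000, k: float = 5.0, seed: int = 1):
    """Estimate Var[X], Var[Y], Var[X + Y], and Var[kX] from random samples.

    X and Y are sampled independently from a normal distribution with
    standard deviation 2 (variance 4); this choice is only for illustration.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 2.0) for _ in range(n_samples)]
    ys = [rng.gauss(0.0, 2.0) for _ in range(n_samples)]
    var_x = statistics.pvariance(xs)
    var_y = statistics.pvariance(ys)
    var_sum = statistics.pvariance([x + y for x, y in zip(xs, ys)])  # Var[X + Y]
    var_scaled = statistics.pvariance([k * x for x in xs])           # Var[kX]
    return var_x, var_y, var_sum, var_scaled

if __name__ == "__main__":
    k = 5.0
    var_x, var_y, var_sum, var_scaled = check_variance_rules(k=k)
    print(f"Var[X] + Var[Y] = {var_x + var_y:.3f}   Var[X + Y] = {var_sum:.3f}")
    print(f"(k^2)Var[X]     = {k**2 * var_x:.3f}   Var[kX]    = {var_scaled:.3f}")
```

If X and Y were correlated, the sum rule would need a covariance term; independence is what makes Var[X + Y] = Var[X] + Var[Y] hold.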


Consider an example where we sum five independent random variables, each with a variance of 4. The variance of the sum is:

Var[sum] = 4 + 4 + 4 + 4 + 4 = 20

Now, instead of summing the five random variables, we'll multiply a single random variable by 5:

Var[5X] = (5^2)(4) = 100

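Note that the variance of the sum is only one fifth of the variance we get by multiplying a single random variable by 5! In general, for n independent variables that each have variance Var[X], the sum has variance n Var[X], while nX has variance (n^2)Var[X], so the single-variable shortcut overstates the variance by a factor of n. This is where we can get into trouble: each random variable should be treated as its own entity, not multiplied.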
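As a quick check of this arithmetic, here is a minimal Python sketch of the two variance calculations for n variables. The function names `variance_of_sum` and `variance_of_scaled` are hypothetical helpers introduced only for illustration, and the sum formula assumes the n variables are independent.

```python
def variance_of_sum(n: int, var_x: float) -> float:
    """Var[X1 + ... + Xn] for n independent variables that each have variance var_x."""
    return n * var_x

def variance_of_scaled(n: int, var_x: float) -> float:
    """Var[nX] for a single variable X with variance var_x, using Var[kX] = (k^2)Var[X]."""
    return n ** 2 * var_x

if __name__ == "__main__":
    print(variance_of_sum(5, 4))     # 20
    print(variance_of_scaled(5, 4))  # 100
```

Dividing the second result by the first always gives n, which is why the error grows with the number of variables being lumped together.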

A Simulation Example

Let's see how this drastic difference in variance affects a simulation. Consider an assembly in which five identical parts are stacked on top of each other. We want to simulate the stack height, assuming that each part's thickness is an independent, uniformly distributed random variable.

We could model this in two ways:

  1. Model each part's thickness as a random variable and sum the thicknesses.
  2. Model one random variable for thickness and multiply by five to get the stack height.

A model with five random variables is shown below. Each part has a nominal thickness of 2, a minimum thickness of 1.99, and a maximum thickness of 2.01.

If we simulate the stack height for 50,000 iterations, we get the following histogram.
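For readers working outside a spreadsheet, here is a minimal Python sketch of the five-variable model, assuming each part's thickness is drawn independently from a uniform distribution on [1.99, 2.01]. The function name `simulate_five_part_stack` and the fixed seed are illustrative choices, not part of the original Excel/Simulation Master model.

```python
import random
import statistics

LOW, HIGH = 1.99, 2.01   # part thickness limits from the example
N_PARTS = 5              # parts in the stack
N_ITER = 50_000          # simulation iterations, as in the example

def simulate_five_part_stack(n_iter: int = N_ITER, seed: int = 42) -> list[float]:
    """Return n_iter stack heights, sampling each part's thickness separately."""
    rng = random.Random(seed)
    return [
        sum(rng.uniform(LOW, HIGH) for _ in range(N_PARTS))  # one draw per part
        for _ in range(n_iter)
    ]

if __name__ == "__main__":
    heights = simulate_five_part_stack()
    print(f"mean height:     {statistics.fmean(heights):.4f}")
    print(f"height variance: {statistics.variance(heights):.6f}")
```

The key design point is the inner `sum(...)`: every part gets its own independent draw in every iteration.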

In the model that follows, we have one random variable for part thickness in cell B2.  To get stack height, we multiply the random variable by five.

If we simulate this model for 50,000 iterations, we get a very different histogram: the stack heights are spread evenly between 9.95 and 10.05 instead of clustering around 10.
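A matching Python sketch of the one-variable model is below, again assuming a uniform distribution on [1.99, 2.01]. The function name `simulate_scaled_single_part` and the seed are illustrative; the point is that a single draw per iteration is multiplied by five, which implicitly gives every part in the stack the same thickness.

```python
import random
import statistics

LOW, HIGH = 1.99, 2.01   # part thickness limits from the example
N_PARTS = 5              # multiplier applied to the single thickness draw
N_ITER = 50_000          # simulation iterations, as in the example

def simulate_scaled_single_part(n_iter: int = N_ITER, seed: int = 42) -> list[float]:
    """Return n_iter stack heights from ONE thickness draw multiplied by N_PARTS.

    This reproduces the flawed model: all five parts in an iteration
    are forced to have the same thickness.
    """
    rng = random.Random(seed)
    return [N_PARTS * rng.uniform(LOW, HIGH) for _ in range(n_iter)]

if __name__ == "__main__":
    heights = simulate_scaled_single_part()
    print(f"mean height:     {statistics.fmean(heights):.4f}")
    print(f"height variance: {statistics.variance(heights):.6f}")
```

The mean stack height matches the five-variable model; only the spread changes.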

We can further compare the simulations with a box-and-whisker plot, which also shows that the two models produce very different spreads.
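The numbers behind such a plot can be computed directly. This minimal Python sketch generates both models and reports the five-number summary (minimum, quartiles, maximum) plus the interquartile range (IQR). It assumes whiskers drawn at the minimum and maximum, which is only one common box-plot convention (another draws them at 1.5 IQR); the function names are hypothetical.

```python
import random
import statistics

LOW, HIGH, N_PARTS, N_ITER = 1.99, 2.01, 5, 50_000

def five_number_summary(data: list[float]) -> tuple[float, float, float, float, float]:
    """Min, Q1, median, Q3, max: the values a min/max-whisker box plot is drawn from."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)

def compare_models(seed: int = 7) -> dict[str, tuple[float, ...]]:
    """Simulate both stack models and summarize each one."""
    rng = random.Random(seed)
    summed = [sum(rng.uniform(LOW, HIGH) for _ in range(N_PARTS)) for _ in range(N_ITER)]
    scaled = [N_PARTS * rng.uniform(LOW, HIGH) for _ in range(N_ITER)]
    return {
        "five variables": five_number_summary(summed),
        "one variable": five_number_summary(scaled),
    }

if __name__ == "__main__":
    for name, (lo, q1, med, q3, hi) in compare_models().items():
        print(f"{name:15s} min={lo:.4f} Q1={q1:.4f} median={med:.4f} "
              f"Q3={q3:.4f} max={hi:.4f} IQR={q3 - q1:.4f}")
```

Both models center on a median near 10, but the box for the one-variable model is much wider.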

Variance of Each Simulation

The variance of each simulation was as follows:

  • Five variables: 0.00017
  • One variable: 0.00083

The variance of a random variable uniformly distributed between a and b is: (1/12)(b-a)^2

The variance of each part's thickness is: (1/12)(2.01-1.99)^2 ≈ 0.0000333

For the five-variable model, the variance is the sum of the five part variances: 5(0.0000333) ≈ 0.00017.

For the one-variable model, the variance is (5^2)(0.0000333) ≈ 0.00083.
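To double-check these hand calculations, here is a minimal Python sketch that evaluates the same formulas with exact fractions, so the only rounding happens when printing. The helper name `uniform_variance` is hypothetical, and the five-variable result assumes the part thicknesses are independent.

```python
from fractions import Fraction

def uniform_variance(a: Fraction, b: Fraction) -> Fraction:
    """Variance of a uniform distribution on [a, b]: (1/12)(b - a)^2."""
    return (b - a) ** 2 / 12

part_var = uniform_variance(Fraction("1.99"), Fraction("2.01"))
five_var_model = 5 * part_var       # sum of five independent thicknesses
one_var_model = 5 ** 2 * part_var   # one thickness multiplied by five

if __name__ == "__main__":
    print(f"part thickness: {float(part_var):.7f}")        # 0.0000333
    print(f"five variables: {float(five_var_model):.5f}")  # 0.00017
    print(f"one variable:   {float(one_var_model):.5f}")   # 0.00083
```

Exactly, the part variance is 1/30000, the five-variable model gives 1/6000, and the one-variable model gives 1/1200, a ratio of 5.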

The simulations agree with random variable algebra.

Conclusion

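It may be tempting to lump identically distributed random variables together and simply multiply one of them by the number of variables. But as we've shown here, this produces drastically different results: for independent variables, it inflates the variance by a factor equal to the number of variables (five times in the stack example).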


Simulations were performed using Simulation Master.



Excel is a registered trademark of Microsoft Corporation. Used with permission from Microsoft.