This tutorial shows how to sample mixture distributions using Simulation Master. A mixture distribution is built from a collection of random variables, each assigned a weight equal to the probability that it contributes a given sample to the mixture.
For example, a combined probability density function composed of two distributions can be written as:
f(x) = w1*f1(x) + w2*f2(x)
where w1 and w2 are weights that must sum to 1,
and f1 and f2 are the probability density functions of the component distributions.
In this tutorial, we will create a mixture distribution from three normal distributions. The distributions have means of 5, 12, and 20, and each has a standard deviation of 2. Their weights are 0.4, 0.3, and 0.3, respectively.
To set up a mixture distribution in the model, we will use the RVUSERDIST function to record the sampled values in the simulation data sheet. The mixture distribution model is shown below.
The same spreadsheet is shown below with the formulas displayed. Cells B6 through D6 contain the sampling functions for the three normal distributions. Cell B8 contains the selector variable, which is generated from a uniform distribution on (0, 1). The selector variable determines which of the three distributions is sampled in a given iteration.
The mixture random variable is calculated in cell B9. A nested IF statement is used to select the distribution. Note that the formula in cell B9 is "wrapped" with the RVUSERDIST function, which records the value of B9 in the simulation data sheet.
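The selector logic described above can be sketched outside the spreadsheet. The following Python is a minimal illustration, not the Simulation Master implementation: the function names are hypothetical, and it assumes the nested IF compares the selector against the cumulative weights (0.4, then 0.4 + 0.3), which the text does not state explicitly.

```python
import random

# Parameters taken from the tutorial text.
MEANS = (5.0, 12.0, 20.0)
STD_DEV = 2.0
WEIGHTS = (0.4, 0.3, 0.3)


def select_component(u, weights=WEIGHTS):
    """Map a uniform(0, 1) draw u to a component index.

    Assumption: the selector is compared against cumulative weights,
    which mirrors what a nested IF would do in the worksheet.
    """
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if u < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point round-off


def sample_mixture(rng):
    """Draw one value from the three-component normal mixture."""
    # Sample every component each iteration, like cells B6 through D6.
    draws = [rng.gauss(mean, STD_DEV) for mean in MEANS]
    # Uniform selector variable, like cell B8.
    u = rng.random()
    # Pick one of the draws, like the nested IF in cell B9.
    return draws[select_component(u)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    samples = [sample_mixture(rng) for _ in range(20000)]
    print(sum(samples) / len(samples))
```

With these weights, the long-run average of the samples should approach 0.4*5 + 0.3*12 + 0.3*20 = 11.6.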
Note also that cells B6 through D6 and cell B8 have "1*" in front of the RV functions. When the simulation is run, the sheet is scanned and any formula that starts with "RV" is recorded in the simulation data sheet. Multiplying by 1 at the beginning of the formula keeps these cells from being recorded. If you want to record the values in these cells, remove the "1*".
We can verify the shape of the mixture distribution by running a simulation with cell B9 as the simulation output. This produces a histogram of the sampled values. The results of the simulation are shown below. We see three peaks that correspond to the means of the constituent distributions. The peak at 5 is the highest because distribution 1 (mean 5) has the largest weight.
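The relative peak heights can also be checked analytically by evaluating the mixture density f(x) = w1*f1(x) + w2*f2(x) + w3*f3(x) at each component mean. The sketch below uses Python's standard-library statistics.NormalDist; the function name mixture_pdf is a hypothetical helper, not part of Simulation Master.

```python
from statistics import NormalDist

# Parameters taken from the tutorial text.
MEANS = (5.0, 12.0, 20.0)
STD_DEV = 2.0
WEIGHTS = (0.4, 0.3, 0.3)

# One normal distribution per mixture component.
COMPONENTS = [NormalDist(mu=mean, sigma=STD_DEV) for mean in MEANS]


def mixture_pdf(x):
    """Weighted sum of the component densities at x."""
    return sum(w * dist.pdf(x) for w, dist in zip(WEIGHTS, COMPONENTS))


if __name__ == "__main__":
    for mean in MEANS:
        print(mean, round(mixture_pdf(mean), 4))
```

Because all three components share the same standard deviation and are well separated, the density at each mean is dominated by that component's weight, so the density at 5 (weight 0.4) is larger than at 12 or 20 (weight 0.3 each).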