When fitting a probability distribution to a set of data, we would like to know how well the fitted distribution represents the underlying data. We will dive into the nuts and bolts of assessing distribution fit.
There are several methods for assessing distribution fit. We will divide them into two general categories, formal and informal, and discuss each in turn. The key takeaway is that we should use all available information to determine whether a distribution is a good fit and not rely on a single method.
Formal Methods of Assessing Distribution Fit
Formal methods use some form of calculated value in assessing distribution fit. Some are relative fit measures and others are absolute goodness of fit measures. A relative fit measure only tells us whether one distribution fits better than another; it doesn't say how well either one fits. If we fit two distributions and both are bad, the relative measure only says that one is better than the other, and the better one may still be a poor fit.
Relative Fit Measures
Information criteria can be used to rank multiple distributions in a relative way. They do not indicate that a fitted distribution is a good fit; they only provide a way to rank several candidate distributions. Simulation Master uses three information criteria to rank distribution fits.
- Bayesian Information Criterion (BIC), aka Schwarz Information Criterion
- Akaike Information Criterion (AIC)
- Hannan-Quinn Information Criterion (HQIC)
Simulation Master uses maximum likelihood estimation to fit distribution parameters, so information criteria are a natural choice: the maximized log-likelihood is the main term in each criterion. Information criteria also penalize distributions with more parameters to discourage overfitting. For all three criteria, a lower value indicates a better relative fit.
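As a rough illustration of how the log-likelihood and the parameter penalty combine, the sketch below uses the standard textbook definitions of the three criteria. The source does not state the exact formulas Simulation Master uses, so treat these as an assumption; the function name and the log-likelihood values are hypothetical.

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, BIC, HQIC) using the standard textbook definitions."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    hqic = 2 * n_params * math.log(math.log(n_obs)) - 2 * log_likelihood
    return aic, bic, hqic

# Hypothetical maximized log-likelihoods for two candidate fits
# to the same 200 data points (illustrative values only).
candidates = {
    "two-parameter fit": information_criteria(-412.3, 2, 200),
    "three-parameter fit": information_criteria(-411.9, 3, 200),
}

# Rank by AIC (index 0); lower is better.
best = min(candidates, key=lambda name: candidates[name][0])
print(best, candidates[best])
```

In this made-up case the third parameter raises the log-likelihood only slightly, so the penalty term outweighs the gain and the simpler fit ranks first.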
Goodness of Fit
There are many goodness of fit measures, and no one measure is without limitations. This means no goodness of fit statistic can be used with absolute certainty. For continuous distributions, Simulation Master calculates three goodness of fit statistics: Kolmogorov-Smirnov, Anderson-Darling, and Chi-Squared. For discrete distributions, Simulation Master calculates Chi-Squared.
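To make two of these statistics concrete, here is a minimal sketch of the usual Kolmogorov-Smirnov and Anderson-Darling formulas applied to a fitted normal distribution. It uses only the Python standard library; the sample data, seed, and function names are assumptions for illustration, and Simulation Master's own implementation may differ in detail.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

def ks_statistic(data, cdf):
    """Largest vertical gap between the empirical CDF and the fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )

def ad_statistic(data, cdf):
    """Anderson-Darling A^2, which weights the tails more than K-S does."""
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        # xs[n - i] is the i-th largest value (x_{n+1-i} in 1-based terms).
        total += (2 * i - 1) * (math.log(cdf(x)) + math.log(1.0 - cdf(xs[n - i])))
    return -n - total / n

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]  # assumed sample

# Maximum likelihood fit of a normal: sample mean and population std dev.
fitted = NormalDist(fmean(data), pstdev(data))
ks = ks_statistic(data, fitted.cdf)
ad = ad_statistic(data, fitted.cdf)
print(f"K-S = {ks:.4f}, A-D = {ad:.4f}")
```

Smaller values mean the fitted CDF tracks the data more closely. The statistics are on different scales, so they are compared across distributions, not against each other.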
Informal Methods of Assessing Distribution Fit
Informal methods are more subjective than formal methods and often rely on qualitative judgment, such as visually inspecting a chart.
Charts
Histogram with Overlay
This chart displays a histogram of the data with a curve of the fitted distribution. The histogram bin heights are scaled so that their areas sum to 1. Thus the histogram approximates a probability density function. The curve overlay is the probability density function of the fitted distribution.
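The scaling described above can be sketched as follows: each bin height is the bin count divided by the sample size times the bin width. Equal-width bins, the bin count of 15, and the sample itself are assumptions made for this example.

```python
import random
from statistics import NormalDist, fmean, pstdev

def density_histogram(data, n_bins):
    """Return bin edges and heights scaled so the bar areas sum to 1."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # The maximum value would index one past the end, so clamp it.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    heights = [c / (len(data) * width) for c in counts]
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, heights

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(500)]  # assumed sample
edges, heights = density_histogram(data, n_bins=15)

# Overlay: the fitted density evaluated at each bin center.
fitted = NormalDist(fmean(data), pstdev(data))
centers = [(a + b) / 2 for a, b in zip(edges, edges[1:])]
overlay = [fitted.pdf(c) for c in centers]

total_area = sum(h * (b - a) for h, a, b in zip(heights, edges, edges[1:]))
print(f"total histogram area = {total_area:.6f}")
```

Because both the bars and the curve are on a density scale, they can be drawn on the same vertical axis and compared directly.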
Cumulative Distribution
This chart plots the empirical cumulative distribution of the data along with the cumulative distribution of the fitted distribution. We can directly compare any deviations between the two.
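A small sketch of the two curves on this chart, assuming a normal fit and a made-up sample: the empirical cumulative distribution steps up by 1/n at each sorted data point, and the fitted cumulative distribution is evaluated at the same points.

```python
import random
from statistics import NormalDist, fmean, pstdev

def ecdf_points(data):
    """Return (x, F_n(x)) pairs, where F_n(x) is the fraction of data <= x."""
    xs = sorted(data)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]  # assumed sample
fitted = NormalDist(fmean(data), pstdev(data))

# Each row: data value, empirical probability, fitted probability.
curve = [(x, p, fitted.cdf(x)) for x, p in ecdf_points(data)]
largest_gap = max(abs(p - f) for _, p, f in curve)
print(f"largest gap between the two curves: {largest_gap:.4f}")
```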
Q-Q Plot
The Q-Q plot displays quantiles of the distribution versus quantiles of the data. A good fit appears as nearly a 45 degree line.
P-P Plot
The P-P plot displays probability of the distribution versus probability of the data (empirical distribution). A good fit appears as nearly a 45 degree line.
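The coordinates behind the Q-Q and P-P plots can be sketched as below. The plotting position (i - 0.5)/n is one common convention and is an assumption here; the source does not say which convention Simulation Master uses. The sample and function names are also illustrative.

```python
import random
from statistics import NormalDist, fmean, pstdev

def qq_points(data, dist):
    """(distribution quantile, data quantile) pairs."""
    xs = sorted(data)
    n = len(xs)
    return [(dist.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

def pp_points(data, dist):
    """(distribution probability, empirical probability) pairs."""
    xs = sorted(data)
    n = len(xs)
    return [(dist.cdf(x), (i - 0.5) / n) for i, x in enumerate(xs, start=1)]

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]  # assumed sample
fitted = NormalDist(fmean(data), pstdev(data))

qq = qq_points(data, fitted)
pp = pp_points(data, fitted)

# On a good fit both sets of points stay close to the line y = x.
pp_max_dev = max(abs(a - b) for a, b in pp)
print(f"largest P-P deviation from the 45-degree line: {pp_max_dev:.4f}")
```

The Q-Q plot works in the units of the data, so it tends to emphasize disagreement in the tails, while the P-P plot works on the 0 to 1 probability scale and tends to emphasize the center.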
Comparing Data Statistics to Distribution Statistics
This method simply compares statistics of the data to the theoretical statistics of the fitted distribution. A good fit should show agreement between the statistics.
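A minimal sketch of such a comparison for a normal fit is shown below. The choice of statistics is an assumption. Note that for a normal distribution fitted by maximum likelihood the mean and standard deviation match the data by construction, so statistics that were not used in the fit (median, skewness, tail percentiles) are more informative.

```python
import random
from statistics import NormalDist, fmean, median, pstdev, quantiles

def sample_skewness(values):
    """Population skewness: mean of the cubed standardized values."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

random.seed(0)
data = [random.gauss(50.0, 5.0) for _ in range(500)]  # assumed sample
fitted = NormalDist(fmean(data), pstdev(data))

cuts = quantiles(data, n=20)  # 19 cut points: 5%, 10%, ..., 95%
comparison = {
    "median": (median(data), fitted.median),
    "skewness": (sample_skewness(data), 0.0),  # a normal has zero skewness
    "5th percentile": (cuts[0], fitted.inv_cdf(0.05)),
    "95th percentile": (cuts[-1], fitted.inv_cdf(0.95)),
}
for name, (from_data, from_dist) in comparison.items():
    print(f"{name:>16}: data={from_data:8.3f}  distribution={from_dist:8.3f}")
```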
Examples of Good and Poor Fits
To illustrate a good fit versus a poor fit, we'll generate data sampled from a normal distribution. Then we fit both a normal distribution and an extreme value minimum distribution to the data for comparison. Since we know the data is normally distributed, we expect the normal distribution to be a good fit.
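The same experiment can be approximated outside Simulation Master with the sketch below. It makes several assumptions: the sample (mean 10, standard deviation 2, 1000 points, fixed seed) is invented, "extreme value minimum" is taken to be the Gumbel minimum distribution with density (1/b) exp(z - exp(z)) where z = (x - m)/b, and the scale is found by simple bisection on the likelihood equation. None of this is taken from Simulation Master itself.

```python
import math
import random
from statistics import fmean, pstdev

def normal_loglik(data, mu, sigma):
    n = len(data)
    sq = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sq / (2 * sigma ** 2)

def fit_gumbel_min(data, iters=200):
    """Maximum likelihood (m, b) for the Gumbel minimum distribution."""
    x_max, x_min, x_bar = max(data), min(data), fmean(data)

    def weights(b):
        # Shift by x_max so the exponentials cannot overflow.
        return [math.exp((x - x_max) / b) for x in data]

    def g(b):
        # Likelihood equation for the scale: b = weighted mean - plain mean.
        w = weights(b)
        return b + x_bar - sum(x * wi for x, wi in zip(data, w)) / sum(w)

    lo, hi = 1e-3 * (x_max - x_min), 2.0 * (x_max - x_min)  # g(lo) < 0 < g(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    m = x_max + b * math.log(fmean(weights(b)))
    return m, b

def gumbel_min_loglik(data, m, b):
    z = [(x - m) / b for x in data]
    return -len(data) * math.log(b) + sum(z) - sum(math.exp(zi) for zi in z)

random.seed(1)
data = [random.gauss(10.0, 2.0) for _ in range(1000)]  # assumed sample

mu, sigma = fmean(data), pstdev(data)  # normal MLE
m, b = fit_gumbel_min(data)

# Both candidates have two parameters, so AIC = 2*2 - 2*loglik.
aic_normal = 4 - 2 * normal_loglik(data, mu, sigma)
aic_gumbel = 4 - 2 * gumbel_min_loglik(data, m, b)
print(f"AIC normal = {aic_normal:.1f}, AIC extreme value minimum = {aic_gumbel:.1f}")
```

With the same number of parameters in both candidates, the ranking here comes down to which distribution achieves the higher log-likelihood.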
The fit results for each distribution are shown below.
First of all, the normal distribution is ranked as the better fit of the two distributions. All of its information criteria are lower than those of the extreme value minimum. As a second check, look at the goodness of fit statistics. All GOF statistics for the normal are lower than those for the extreme value minimum, which indicates a better fit. The most obvious indicator is the histogram with overlay. The normal distribution fits over the histogram nicely, while the extreme value minimum doesn't fit the tails or the center of the histogram. Finally, if we compare the data's statistics to those of each distribution, the normal is much closer to the data.
While the example was intentionally chosen to show clear differences between the two distributions, real-life data sometimes doesn't cooperate. This is especially true for small data sets. There can be situations where the histogram with overlay doesn't give a clear indication of a good fit. One goodness of fit statistic may favor one distribution while another statistic favors a different distribution. Likewise, comparing statistics may not give a clear-cut answer. Again, the totality of information must be used to decide if a distribution is a good fit.
Let's look at other tools we can use to assess fit. The cumulative chart, Q-Q plot, and P-P plot are generated in the fit report. We'll look at each.
Cumulative Charts
The normal cumulative chart shows good agreement between the normal distribution and the data. Their cumulative curves are almost on top of each other. The extreme value minimum chart shows differences between the distribution and the data.
Q-Q Plots
The normal Q-Q plot follows a straight 45-degree line almost exactly, indicating that the quantiles of the distribution closely match the quantiles of the data. The extreme value minimum Q-Q plot shows serious deviations between the quantiles and indicates a poor fit.
P-P Plots
The normal P-P plot follows a straight 45-degree line, indicating a good fit. The extreme value minimum P-P plot deviates from the theoretical 45-degree line, indicating a poorer fit than the normal.
Goodness of Fit p-values
Calculating p-values for distributions with estimated parameters requires adjustments specific to each distribution, and such adjustments are not available for every distribution. To avoid this limitation, Simulation Master estimates p-values by Monte Carlo simulation, which makes a p-value available for all distributions. Because the simulation is extremely time consuming, the best practice is to find the best fitting distribution first, and then refit that distribution while simulating the desired p-values. Since we've already determined that the normal distribution is the best fit, we can generate p-values for each goodness of fit statistic.
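One common way to simulate a p-value when parameters are estimated is a parametric bootstrap: draw samples from the fitted distribution, refit each sample, recompute the statistic, and report the fraction of simulated statistics at least as large as the observed one. The source does not describe Simulation Master's exact procedure, so the sketch below is an assumption, shown for the Kolmogorov-Smirnov statistic and a normal fit with a made-up sample.

```python
import random
from statistics import NormalDist, fmean, pstdev

def ks_statistic(data, cdf):
    xs = sorted(data)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )

def fit_and_score(data):
    """Fit a normal by maximum likelihood and return (fit, K-S statistic)."""
    fitted = NormalDist(fmean(data), pstdev(data))
    return fitted, ks_statistic(data, fitted.cdf)

def simulated_p_value(data, n_sims=500, seed=0):
    rng = random.Random(seed)
    fitted, observed = fit_and_score(data)
    n = len(data)
    exceed = 0
    for _ in range(n_sims):
        sample = [rng.gauss(fitted.mean, fitted.stdev) for _ in range(n)]
        # Refit every simulated sample so the simulated statistics reflect
        # the fact that the parameters were estimated from the data.
        _, simulated = fit_and_score(sample)
        if simulated >= observed:
            exceed += 1
    return exceed / n_sims

random.seed(42)
data = [random.gauss(0.0, 1.0) for _ in range(100)]  # assumed sample
p_value = simulated_p_value(data, n_sims=500)
print(f"simulated K-S p-value: {p_value:.3f}")
```

The cost is easy to see: every simulation repeats the full fit, which is why it makes sense to simulate p-values only for the distribution already chosen as the best candidate.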
We'll again fit the normal distribution to the data, but this time we will simulate p-values for each goodness of fit statistic. We'll use 500 simulations for each statistic. The fit results are shown below.
As we see above, the p-values for the goodness of fit statistics are:
- Anderson-Darling: 0.068
- Kolmogorov-Smirnov: 0.248
- Chi-Squared: 0.55
Let's say we had chosen a significance level of 0.05 beforehand (0.01 and 0.05 are common values). We fail to reject the null hypothesis that the data follows the normal distribution when a p-value is greater than 0.05. For K-S and Chi-Squared, the p-values are much greater than 0.05, as we would expect. Notice that the A-D p-value is only slightly larger than the significance level even though the data was generated from the normal distribution. This illustrates that p-values should not be used as the only means of assessing distribution fit.
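The decision rule can be written out in a few lines using the p-values reported above. The variable names are illustrative, and the margin column is added here only to show how close each statistic came to the threshold.

```python
alpha = 0.05  # significance level chosen before looking at the results

p_values = {
    "Anderson-Darling": 0.068,
    "Kolmogorov-Smirnov": 0.248,
    "Chi-Squared": 0.55,
}

decisions = {
    name: ("reject" if p < alpha else "fail to reject")
    for name, p in p_values.items()
}

for name, p in p_values.items():
    print(f"{name:>20}: p={p:.3f}  margin={p - alpha:+.3f}  {decisions[name]}")
```

All three statistics fail to reject the normal distribution, but the Anderson-Darling margin is small enough that a different sample from the same normal distribution could plausibly have fallen on the other side of the threshold.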
Summary
We've looked at formal and informal methods of assessing distribution fit, and they are listed below.
Formal Methods
- Information criteria.
- Goodness of fit statistics.
Informal Methods
- Statistical comparison of data to fitted distribution.
- Histogram with probability density function overlay of fitted distribution.
- Cumulative distribution chart.
- Q-Q plot.
- P-P plot.