March 2015

# Binomial Probilities

By Roger A. Kaiser Jr. Mary Pat Campbell’s January 2014 CompAct article, “ Variations on Approximation – An Exploration in Calculation,”, brought to mind encounters that I’ve had in calculating binomial probabilities. In this article we’ll look at:

• How Excel calculates binomial probabilities,
• How to calculate binomial probabilities to high precision using the multiplication method, and
• How accurate the various methods are.

How Excel Calculates Binomial Probabilities

It was a dark and stormy night. I was attempting to code a binomial lattice in SunGard’s™ iWorks Prophet Professional™ (Prophet) system. The specification was a prototype model coded in Microsoft Excel. Unfortunately, Prophet did not have a function to calculate binomial probabilities. No problem, I’ll just roll my own. My first attempt multiplied the number of combinations by exponentiated probabilities. This worked well, except that the results did not exactly match the prototype. The results were close, but my internal client wouldn’t be satisfied unless the results matched exactly.

As you can guess from Campbell’s article, with my first attempt, the exponentiated probabilities underflowed (and the combinations overflowed).

I needed to replicate the Excel BINOMDIST (also known as Binom.Dist) function. How exactly does Excel BINOMDIST work?

Fortunately Microsoft documented how BINOMDIST works in a Microsoft Support article titled, “Excel statistical functions: BINOMDIST,” available at http://support.microsoft.com/kb/827459.

Here is my interpretation of what the article says: For a large number of trials, BINOMDIST calculates binomial probabilities by considering each probability as a rational number with the same, as yet unknown, denominator. It starts at a mode, which is the number of trials that has the highest probability. The numerator of the probability of the mode is arbitrarily assigned a value of 1.0. Then, the recursive version of the binomial formula is applied in each direction. That is, working from the mode downward toward zero successes and also working from the model upward toward all successes. Once the numerators of the probabilities are below some small threshold, the algorithm stops calculating numerators.

Using the fact that probabilities must add up to one, the algorithm then calculates the denominator as the sum of the numerators of the probabilities. To get a decimal probability, they divide each numerator, or at least the ones needed, by that denominator.

Brilliant!

As Mr. Miyagi said in The Karate Kid, “Best block, no be there.” By this technique we avoid calculating the troublesome combination coefficients, we reduce the number of multiplications of base probabilities (p’s and q’s, if you will), and we get fairly accurate results. We’ll see how accurate later.

One unusual thing about this algorithm is that it calculates all of the binomial probabilities for a given trial size. This could have a negative performance impact. However, someone interested in one binomial probability is likely interested in another with the same basic parameters. For a binomial lattice this was the case. Caching the array of values tends to reduce the amortized cost of a call to BINOMDIST.

With a slight twist, I applied the same technique using a Prophet “extended formula.” A Prophet extended formula is similar to a Visual Basic for Application (VBA) function.

The twist was that instead of setting the numerator of the probability of the mode to one, I set it to a huge value. By doing so, the numerators have a larger dynamic range, and we might be able to calculate a few more extreme probabilities before encountering underflow. (Note to Microsoft: please send me a T-shirt if you like this idea.)

The code adds the numerators from smallest to largest to attempt to minimize errors.

You can view the Prophet Binomial extended formula here.

With this calculation, I was able to calculate binomial probabilities that matched, or exceeded, the quality of those produced by Excel. The dark and stormy night ended.

The Multiplication Method

Another way to calculate binomial probabilities is to simply multiply together all the factors. If you choose an order that keeps the partial-product near one, then you are less likely to encounter an underflow or an overflow.
Catherine Loader, in appendix B of her July 9, 2000 paper, “ Fast and Accurate Computation of Binomial Probabilities,” presents a multiplication algorithm in the C programming language.

I have translated this into Common Lisp (lisp). We’ll briefly discuss how it works. Don’t panic if it looks foreign:

1. (defun dbinom-mult (x n p)

2. (if (> (+ x x) n)

3.   (dbinom-mult (- n x) n (- 1 p))

4.   (handler-case

5.     (do ((j0 0) (j1 0) (j2 0) (f 1))

6.        ((and (>= j0 x) (>= j1 x) (>= j2 (- n x))) f)

7.       (if (and ( < j0="" x)="" /> < f="" />

8.            (setf j0 (1+ j0)

9.               f (* f (/ (+ (- n x) j0) j0)))

10.           (if ( < j1="" />

11.             (setf j1 (1+ j1)

12.                  f (* f p))

13.             (setf j2 (1+ j2)

14.                  f (* f (- 1 p))))))

15.    (floating-point-underflow () 'Underflow)

16.    (floating-point-overflow () 'Overflow)))) [JLW1]

A file with this function and some utility functions is here.

In lisp, parenthesized expressions are of the format (operator operand-1 operand-2 … operand-n). So parentheses are significant in lisp. In addition, lisp programmers tend to use hyphens to separate parts of names rather than underscores. Put another way, there is little to lisp syntax other than parentheses!

In line 1, this code defines a function that takes three parameters: x, the number of successes; n, the number of trials; and p, the probability of a single trial success. In line 2, if 2x is greater than n, we call ourselves with alternate parameters to reduce the number of multiplications.

Unlike the “C” code, I added an event handler to catch overflow and underflow events. Excel does not catch underflow events, so I wanted to demonstrate that on an IEEE-754 compliant system, that it is possible to catch them. IEEE-754 is the floating-point specification that most computers have adopted.

Line 5 initializes loop variables j0, j1 and j2 to zero. It also initializes our partial-product to one. Line 6 is the loop termination condition. We are done when all of these are true: j0 is greater than or equal to x, j1 is greater than or equal to x, and j2 is greater than or equal to n-x.

In Line 7, if j0 is less than x and f is less than one, then we set two things. In line 8 we set j0 to j0 + 1 and in line 9 we set f to f * (n – x + j0) / j0.

In line 10, if j1 is less than x, we increment j1 in line 11 and multiply f by p in line 12.

Otherwise, in line 13, we increment j2 and multiply f by (1-p) in line 14.

If an underflow occurs, line 15 catches it and returns the symbol Underflow. There is nothing special about that symbol, we could return anything, but I wanted to show that we could catch it and return something other than zero.

Similarly, line 16 catches overflow, but I don’t think this will ever happen [JLW2] .

I used “clisp,” freely available at http://clisp.org, to run this program.

One interesting thing about this lisp program is that it works for a variety of numeric formats. For example, we can call it like this:

(dbinom-mult 2 5 .125)

and lisp replies:

0.10467529

We can call it like this (the “d” is like the “e” in computer exponential notation, but stands for double-precision).

(dbinom-mult 2 5 0.125d0)

and lisp replies with a double precision answer:

0.10467529296875d0

Bah. Humbug. That’s not very interesting.

How about this? We choose how many bits of precision we want for a long-floating point format. Here, we choose 1024 bits:

(setf (ext:long-float-digits) 1024)

and lisp boringly replies:

1024

The base-2 exponent for clisp long-floats is held in a 32-bit word and so is “limited” to two raised to about the 2 billionth power.

Using the binomial parameters from Campbell’s paper, ask for the probability of 1000 deaths out of 2000 with a q of 0.00146:

(dbinom-mult 1000 2000 0.00146L0)

and lisp replies:

1.07074147070176044569673055560919985074589451123135122572960
30875279864361118283003498336795411419402627560550727975264
2378724765407695510702249772421805456584633596983252984973836
2027942105698947643712932142954273531128364169877683285209993
097848269750287457068578437635103798036931642388233710558478
1153561L-2236

But, that’s still an approximation. Do you want the exact value? If so, lisp can do rational arithmetic. Pose:

(dbinom-mult 2 5 1/8)

and lisp gives the exact answer:

1715/16384

We can ask it for the probability of 1000 deaths out of 2000 with a death probability of 146/100000:

(dbinom-mult 1000 2000 146/100000)

The reply is a large number of digits. You can find them here.

The same lisp code can catch exceptions. Using single precision floating point numbers, if we ask lisp:

(dbinom-mult 47 2000 0.00146S0)

lisp replies:

UNDERFLOW

The same program calculated each of these. The type of results depended on the type of input.

How Accurate are the Various Methods?

Now that we have a way to calculate values to almost any precision, we can compare the accuracy of various methods of calculating binomial probabilities.

The following discussion refers to this  Excel workbook.

This workbook continues the example used in Campbell’s article. That is, computing the probability of k deaths among 2000 lives where the individual probability of death in the year is 0.00146.

Sheet “CombXProb” computes the probabilities by multiplying the number of combinations by the exponentiated probabilities. As Campbell’s article showed and as happened to me in my first Prophet attempt, this method encounters underflow. After k=108 the probabilities show as zero. This is not because Excel cannot represent smaller probabilities. Rather it is because of the method of calculation.

Sheet “Binom.Dist” shows results from the Excel function. This calculates non-zero probabilities up to k=210. Is this the gold standard for a binomial probability function?

Sheet “StdRecursion” calculates the probability of zero deaths and then applies a recursive formula to calculate the next probability. This calculates non-zero probabilities up to k=210.

Sheet “Stirling” is based on Campbell’s article. It calculates the log probability based on a Stirling approximation. Columns “D” and “E” convert the log probability to a separate significand and exponent. By this technique we can show numbers larger than those representable in IEEE 754 double precision.

Sheet “ModeOutRecursive” shows my implementation based on the Microsoft support article. The mode occurs at k=2 and I set that to a huge number divided by the number of trials, 2000 in this case. You could choose a larger number if you sharpened the bounds, but dividing by 2000 is an easy way to know that our numerators will not total to more than the largest double precision number.

Aside: In forming the denominator, I first sorted the numerators and then applied Kahan’s summation algorithm to the sorted numbers. There was a slight difference between summing the unsorted numbers and summing from smallest to largest. Kahan’s summation algorithm applied to the sorted numbers gave the same results as just plain summing the numbers.

Sheet “ExtendedModeOutRecursive” shows how to track the exponent of the probability separately so that we can increase the range. For ease of reading, I used base 10. Since IEEE 754 is base 2, keeping the additional exponent as a power of two would allow the calculation to be performed without round-off from the division. Indeed, IEEE 754 calls for an overflow exception handler that can help with this calculation. Here, we just manually factored out the exponent. This allows values to be calculated up to the number of trials (2000).

Sheet “Lisp1024bitPrecision” shows results from the lisp code with the multiplication method. The excess precision is reduced to the amount that fits in a double precision number.

Finally, the saddle point method from Catherine Loader’s paper is shown. This is an interesting method that uses some algebraic manipulations to be able to calculate binomial probabilities for extremely large trials. How large? Her opening example is calculating the probability of one million successes out of two million. The method attempts to eliminate catastrophic cancellation and also uses a Stirling-De Moivre series. I coded it in lisp here.

Sheet “MethodProbabilities” is a summary of calculated probabilities up to the limits of a double. This occurs at k = 210. Since it is hard in Excel to compare larger numbers (we could use logs or factor), our comparison focuses on the smaller number of successes.

Finally, sheet “Accuracy” compares some of the methods with the results from the high-precision multiplication method. Following Loader, we will use –Log(Abs(method_n / multiplication_method) – 1). This essentially counts the number of matching places. If all the places match, we assign a value of 15.65 matching places. This is based on the number of decimal places that it would take to represent 53 bits of precision assuming that the first bit is always a one because Excel does not use a portion of IEEE 754 floating-point numbers known as “denormals.” Assuming the first bit is always a one effectively leaves 52 bits. Two raised to the 52nd is about the same as 10 raised to the 15.65.

So, how did the various methods fare? The top row shows the sums of number of digits matching the multiplication method. Note that we limited the comparison to 210 successes so that we could use standard Excel formulas for the comparisons. The multiplication method surprisingly matches itself perfectly. Next comes the mode-outward recursive formula. This is essentially the Excel method, except that we don’t stop generating probabilities at a small threshold, we start with a large numerator, rather than start at 1.0, and we took some pains in totaling the numerators. I think these might have made the results just a little bit better than the Excel function, which ranks next.

The standard recursion method (starting from k = 0) came in next. Any inaccuracies in the probability of zero successes propagate to the other probabilities.

The method of combinations times exponentiated probabilities comes next. It did not fare well because it underflowed at k = 109.

Bringing up the rear are the saddle point method and the Stirling approximation method. The Stirling method could be improved with more terms. As is, adjusting for using Excel for the first 15 “Stirling” probabilities, the saddle point method yields about twice as many matching digits as does the Stirling.

Here is a summary of the comparison:

 Method Sum of Matching Places % of Total Possible Matching Multiplication 3302.90 100% Mode-Outward Recursive 3173.49 96% BINOMDIST (Excel) 2916.77 88% Standard Recursion 2801.87 85% Combination x ExpProbs 1447.79 44% Saddle Point 1130.81 34% Stirling Approximation 818.98 25%

Conclusion

In conclusion, in terms of accuracy alone, multiplying the factors gives the best results if you have the time. You can extend the precision by scaling the result to keep it from underflowing. The outward-recursive method documented in the Microsoft Support note is very impressive. It is accurate and easy to implement. The Stirling and saddle point methods have niche uses, but the lack of precision may rule them out for many applications.

References

Campbell, Mary Pat, “ Variations on Approximation – An Exploration in Calculation”, January 2014 CompAct.

Kahan, William (January 1965), "Further remarks on reducing truncation errors",  Communications of the ACM 8 (1): 40,  doi: 10.1145/363707.363723
Loader, Catherine, “ Fast and Accurate Computation of Binomial Probabilities”, July 9, 2000.

Microsoft, “Excel statistical functions: BINOMDIST”, http://support.microsoft.com/kb/827459.