Divide and Conquer

Comparing the results of my Fibonacci calculator in Haskell to those in Erlang revealed a few things.  First of all, there was a bug in my Haskell implementation!  The corrected version is:

import Data.List (genericIndex)
import Data.Bits
import System.Environment (getArgs)

-- simple Haskell demo to compute an arbitrarily large Fibonacci number.
--
-- this program relies primarily on the principle that F(n+k) = F(n-1)*F(k)+F(n)*F(k+1)
-- from which we find that F(2n) = F(n-1)*F(n) + F(n)*F(n+1) = F(n-1)*F(n) + F(n)*(F(n) + F(n-1))
-- Using the second equation, it is a simple matter to sequentially construct F(2^n) for any value n.
-- With this tool in hand, F(N) may be computed for any N by simply breaking N down into its constituent
-- 2^n components (e.g. N = 14 = 8 + 4 + 2 = 2^3 + 2^2 + 2^1) and to then use these results to guide
-- construction of the value we want by summing each of the F(2^n) using the F(n+k) identities described above

-- given two Fibonacci pairs representing F(n-1),F(n) and F(k-1),F(k), compute F(n+k-1), F(n+k)
fibSum :: (Integer, Integer) -> (Integer, Integer) -> (Integer, Integer)
fibSum (fnminus1, fn) (fkminus1, fk) =
   (fnminus1*fkminus1 + fn*fk, fnminus1*fk+fn*(fkminus1+fk))

-- given an integer (representing the ordinal of the Fibonacci pair to be retrieved working value and a Fibonacci pair accumulator value,
-- compute return the sum of all Fibonacci values corresponding to the bits in N.  For example, given a value of ten for N in F(N) -
-- bit values: False (2^0 = 1), True (2^1 = 2), False (2^2 = 4), True (2^3 = 8) - compute the sum of F(2 + 8)
fibGenerator :: Integer -> (Integer, Integer) -> (Integer, Integer) -> (Integer, Integer)
fibGenerator (n) tuple accumulator | (n == 0) = accumulator
                                   | ((n .&. 1) == 1) = fibGenerator (shiftR n 1) (fibSum tuple tuple) (fibSum tuple accumulator)
                                   | ((n .&. 1) == 0) = fibGenerator (shiftR n 1) (fibSum tuple tuple) accumulator

-- given a positive integer value N, compute and return Fib(N)
fib :: Integer -> Integer
fib (n) | n  0  = snd ( fibGenerator (n - 1) (0, 1) (0, 1))

-- main :: reads our argument and computes then displays corresponding Fibonacci number
main = do
   args

Second of all, there was a significant difference in the performance of the two implementations.  Since one was an almost direct port of the other, it was unlikely that the cause was in the code so it must be something inherent in the languages.

Haskell uses something called “non-strict” evaluation, meaning essentially that results of expressions are evaluated when they are needed and expressions whose results are never needed are never evaluated.  In contrast, Erlang uses strict evaluation, meaning that each expression – whether its result is used or not – is computed in its entirety and the value stored.  Of course non-strict evaluation can make certain types of operations more efficient but this is unlikely to make a big difference in the actual performance of the two implementations because the computation of Fibonacci numbers is largely a cpu-bound operation with few, if any, steps that can be skipped.

How big a difference is there between the two?  Well, on my test virtual machine, the Erlang implementation computes Fib_{1000000} in about 3.5 seconds (and another six seconds to display it) while the Haskell implementation computes and displays Fib_{1000000} almost instantaneously.  Taking it one order of magnitude further: the Erlang implementation computes Fib_{10000000} in eight minutes while the Haskell implementation computes the same value and writes it to /dev/null in only two seconds.

Obviously, the difference between the two has to do with optimizations in the Haskell Integer library that are not present in the Erlang Integer library.  There is also the possibility that the fact that Haskell is compiled and Erlang runs in its own virtual machine but I doubt this makes much of a difference.  It’s fair to point out that Erlang has a principally business computing background while Haskell appears to me to have lived a long life in academic circles before entering the commercial world.  It wouldn’t be the least bit surprising to find that the Haskell Integer library has received a little more attention.

But something interesting begins to emerge as larger and larger values are tried.  Once it gets started, Erlang displays results at a very steady (and very swift) pace – pretty much what you would expect.  Haskell, on the other hand, begins to display results in bursts of digits with larger numbers.  For example, Fib_{100000000} begins displaying in about 10 seconds and seems to pause every three seconds or so, sometimes for a barely noticeable fraction of a second and sometimes for ten or twenty seconds.

While it’s possible that this has to do with memory management in Haskell, I don’t believe it is.  After all, the resulting number has only about 20 million digits while the virtual machine has 2 GB allocated.  There isn’t really that much memory to manage.

Looking closely at the cpu profile makes the memory management interpretation seem even less likely.  A single processor core ascends to 100% as soon as the program is launched.  After about ten seconds, this core drops to about 20% while a few other cores also rise to 30%.  This roughly coincides with the time that the program begins to display its results.  After a relatively quiet period of five seconds, the original core rises once again to 100% (as all other cores idle down) and remains there for about ten seconds – during most of which time the display halts.  Finally, this core drops once again to about 20% and the output resumes.  For the rest of the run time, the load is largely distributed across all four cores allocated to my test virtual machine.

Clearly, the program is still hard at work as it displays the result but the question is: what is it doing?  As I suggested in my last post, I suspect that it is still computing the result.  All the evidence so far would seem to support this idea – but there is a catch:

To display the resulting Fibonacci number, the program must first convert it to a string in base 10.  The way this is typically done is through successive integer division: divide the number by 10 which yields a remainder between zero and nine representing the least significant digit and a quotient representing all remaining digits.  This approach can be successively applied to generate the digits one at a time starting with the least significant and ending with the most significant.

But if Haskell used this algorithm, it’s non-strict evaluation would be neutralized and it would have to perform every single calculation before it could display even the first digit.  We would expect to see the exact behavior that we see from Erlang – a long delay and then a rapid stream of digits whose speed was paced only by the write speed of the video card.  To get to the bottom of this puzzle, we have to look at Haskell’s algorithm to convert an Integer into a base 10 string.  The function that does this is appropriately named integerToString and begins on line 479 of Show.lhs.  This algorithm takes a completely different approach from the one described above and delivers two improvements: it is more efficient and it works better with Haskell’s non-strict evaluation.

The algorithm Haskell uses employs a “divide and conquer” strategy to reduce the original Integer into a List of Integer values by first repeatedly squaring 10^{18} (for the 64-bit version) until it finds the largest value that is less than the number to be converted.  It then divides the number by this result to yield a List of two Integer values: the quotient and the remainder.  This process is repeated until it has a List of Integers, each of which is less than 10^{18} – effectively representing the original number in base 10^{18}.  These resulting Integer values are successively converted to base 10 strings starting with the first and proceeding through to the last.

So, how is this faster?  Well, first it helps to think of an Integer value as a list of 64-bit integers – essentially digits in base 2^{64}.  Each time such a number is divided, the algorithm must divide every digit.  The traditional algorithm divides the entire number over and over again by 10, which reduces it by one order of magnitude each time.  The highest order digit need only be divided about 18 or 19 times but the next to the highest digit is divided between 36 and 38 times (really, a little more than this, but let’s not split hairs).  As we expand the series, we find that the sum of operations is roughly 19 \times n(n+1) where n is the number of 64-big “digits”.  In computer science terms, I believe this equates to O(n^2) time.

The Haskell algorithm involves first the repeated squaring to find the greatest power of 10^{18} that is less than the number to be reduced.  The number of operations for this step grows relative to the number of 64-bit digits at a rate of roughly \frac{ln(n)}{ln(2)} .  The next step is the repeated division to reduce the number into a list of base 10^{18} digits.  The number of operations for this phase grows at a rate of \frac{n}{2}.  The final step is to divide each of the resulting “digits” into base 10 digits – the number of operations for this step grows at a rate of roughly 19n .

So, based on the above, the total operations for presenting a number with the Haskell algorithm is \frac{ln(n)}{ln(2)} + \frac{n}{2} + 19n .  This reduces to simply O(n) – showing that the Haskell algorithm runs in linear time.  While its complexity may mean that it runs slower for smaller numbers (though I don’t think this is the case) it will rapidly outpace the performance of the typical algorithm, which runs in quadratic time, for any reasonably large Integer value.

To demonstrate this point for myself, I ported the Haskell algorithm to Erlang and gave it a try with my Fibonacci generator.  Naturally, it had no impact on the rate at which the numbers were generated and it was hampered by the performance of the Integer library in Erlang but, nevertheless, it proved to be twice as fast at displaying the result of Fib_{1000000}, coming in at only 3.6 seconds compared to 6.3 seconds for Erlang’s native conversion.

So this addresses how the Haskell algorithm is faster but what is it about this algorithm that makes it a better fit for non-strict evaluation?  Well, as I pointed out earlier, the traditional base conversion algorithm would completely neutralize Haskell’s non-strict evaluation so it would be hard for it to be a worse fit.  But let’s look at what steps Haskell must perform in order to find the first digit.

First, Haskell must perform all of the steps required to select a power of 10^{18} by which to reduce the number to be displayed.  This requires approximately \frac{ln(n)}{ln(2)} operations (where n is the number of 64-bit integers in our number) – this step is irreducible.  But we have sort of a shortcut here because we really only need to know the magnitude of the number we’re going to display at this point, not its precise value.  This means that most of the time we need only resolve the most significant 64-bit digit (and to know the order of that digit) of it in order for this comparison to work.

Next, Haskell must break down the number by successively reducing it into a quotient and remainder.  But here, we have a another shortcut – we are only concerned with the quotient, which means we can just park the remainder for now (in fact, I’m relatively certain that we don’t even need to calculate the low-order half of our Fibonacci number at this point).  So, instead of completing the full series of \frac{n}{2} operations to display the number, we only need to perform \frac{ln(n)}{ln(2)} operations.

Finally, we only need to convert the top-most base 10^{18} digit into base 10 using the traditional base conversion algorithm, requiring only about 19 operations and then return the top-most digit.

Taken together, presentation of the first digit requires only 2\frac{ln(n)}{ln(2)}+19n operations.  Not only is this less than the original 19n^2+n it is less even than the \frac{ln(n)}{ln(2)} + \frac{n}{2} + 19n operations in the improved algorithm.  So, even if Haskell used the same Integer library as Erlang, it would be able to begin displaying the first digits of the number almost immediately.  In fact, thanks to non-strict evaluation, Haskell probably wouldn’t even finish computing more than the first several 64-bit digits of the number before displaying the first base 10 digit.  So the odd pauses and cpu profile during the display of Haskell’s results very likely do point to the fact that Haskell is still computing the result as it displays it.

Why the obsession with Haskell, anyway?

Before I get into the subject of this post, let’s address a little business.  You may notice that my blog looks different today.  First of all, the url is different – halstclair.com instead of halstclair.com/TechnicalDetails.  The second, and, in my opinion, more important difference is that it is more responsive.  Let’s just say that my previous host wasn’t getting the job done so I’ve done some major rearranging behind the scenes and what you’re seeing is the result.

Now, on to the subject at hand…

A few years ago, I thought I’d look around at functional languages to see what they had to offer.  I’d seen a few articles pointing out that we were rapidly nearing the end of our ability to scale computing performance efficiently by increasing processor clock rates. Instead, we would be challenged over the coming years to make efficient use of increasing numbers of processor cores and that the secret to making this happen would be the kind of parallelization that was innate in functional languages. So it seemed like a good time to get acquainted with these languages.

At the time, Erlang and Haskell were fairly well established and languages like F# and Clojure (as well as others) were emerging.  I settled on Haskell primarily because a few references suggested that it was among the oldest and the most purely functional language among the lot (I don’t really want to debate this particular point because I don’t have enough experience with the others to form my own opinion but this was the impression I got).

Soon after starting to work with Haskell I thought I’d try something a little beyond the typical “hello world” so I started experimenting with a simple program to calculate Fibonacci numbers – give it an index, say 10, and it would produce the corresponding Fibonacci number, say 55.  Of course, the naive algorithm for computing Fibonacci numbers is just \sum_{i=1}^{n} F_{i} where F_{0}=0 and F_{1}=1 so it was easy enough to implement. Haskell has built-in support for arbitrary precision integer math so my first implementation could calculate essentially any Fibonacci number that I had the patience to wait for. The problem was that this algorithm was expensive so, as I experimented with computing higher and higher ordered Fibonacci numbers, it quickly came to be like watching paint dry.

That would have been the end of the story but…

Looking at the sequence one day, I noticed that F_{12} was 144. Now that’s a pretty odd coincidence (since 12^2 is also 144) and while it was obvious that there was no such relationship for any other Fibonacci number, it made me curious so I started doodling with squares of Fibonacci numbers. It wasn’t long before I stumbled across the fact that F^2_{6} + F^2_{5} = F_{11}. A little more mucking about revealed that this particular pattern (F^2_{n} + F^2_{n-1} = F_{2n-1}) persisted for every pair Fibonacci numbers that I cared to try. A quick look at Wikipedia (see Fibonacci Number) revealed that this was just a special case of a well-known identity: F_{2n+k} = F_{k}F^2_{n+1}+2F_{k-1}F_{n+1}F_{n}+F_{k-2}F^2_{n}

In relatively short order, I worked out the following useful variation: F_{n+k}=F_{n-1}F_{k}+F_{n}*F_{k+1}. This formula provided a way to rapidly calculate the series F_{n} for n \in (1,2,4,8,16,32...) and to compute the ‘sum’ of any combination of these values to generate any arbitrary Fibonacci number (i.e. F_{10} = F_{2+8}).

The source for the resulting program was:

import Data.List (genericIndex)
import System.Environment (getArgs)

-- this set of functions can be used to calculate an arbitrarily large Fibonacci number.
-- it relies ultimately on the principle that F(n+k) = F(n-1)*F(k)+F(n)*F(k+1)
-- from here we find that F(2n) = F(n-1)*F(n) + F(n)*F(n+1) = F(n-1)*F(N) + F(n)*(F(n) + F(n-1))
-- using the second equation, it is a simple matter to construct an infinite list of Fibonacci numbers, see fibPwr2Pairs, F(Xn) where Xn = 2^n
-- Then, to arrive at F(n) where n is an arbitrary positive integer, we decompose n into its constituent bits and find the Fibonacci number pairs
-- corresponding to each of the bit values (e.g. given n=5 we have bit 0 = True, bit 1 = False, bit 2 = True - we therefore find the pairs (0, 1) and
-- (2, 3))  Finally we use the first principle to transform these pairs inot the Fibonacci pair corresponding to the sum of the bit values.

growAPair :: (Integer, Integer) -> (Integer, Integer)   -- given a Fibonacci pair (F(n-1), F(n)) yields the pair for 2*n i.e. (F(2*n-1), F(2*n))
growAPair (fnminus1, fn) = (fnminus1^2 + fn^2, fn * (2*fnminus1 + fn))

-- fibPwr2Pairs :: yields an infinite list of the Fibonacci pairs occuring at powers of two
-- (This works out well for us because Haskell uses lazy evaluation! Members of the list are only calculated as they are needed)
-- i.e. [F(2^0) = (1, 0), F(2^1) = (1, 1), F(2^2) = (2, 3), ... F(2^n) = (F(2^n), F(2^n - 1))]
fibPwr2Pairs :: [(Integer, Integer)]
fibPwr2Pairs = iterate growAPair (0,1)

-- binaryDecompose :: decomposes the supplied integer value into its constituent bits and returns this as a list of Bool values with the
-- least significant bit first in the list.
binaryDecompose :: Integer -> [Bool]
binaryDecompose n = map odd                     -- transforms the list of integers into boolean values indicating whether each original integer was odd
                 (takeWhile (/= 0)              -- limits the list of integers to those whose value is non-zero - otherwise it would have an infinite number of trailing zeros
                (iterate                        -- apply the following function to build an infinite list of n divided by increasing powers of two (starting with 2^0)
                  (\value -> quot value 2)      -- lambda function that accepts a value and returns that value divided by 2 (as the first argument to 'iterate')
                        n))                     -- the number to be decomposed (as the second argument to 'iterate')

-- fibSum :: this function takes an array of fibonacci pairs in the form (F(Xn-1), F(Xn)) and calculates the Fibonacci number for the sum of all Xn
-- i.e. for the array [FibPair F(1), FibPair F(2), FibPair F(8)] the result of fibSum would be FibPair F(1+2+8) = F(11)
-- this relies on the simple fact that F(n+k) = F(n-1)*F(k)+F(n)*F(k+1)
fibSum :: [(Integer, Integer)] -> (Integer, Integer)
fibSum ((a, b) : []) = (a, b)
fibSum ((fibOfAMinus1, fibOfA) : (fibOfBMinus1, fibOfB) : xs) = fibSum ((fibOfAMinus1 * fibOfBMinus1 + fibOfA*fibOfB, fibOfAMinus1 * fibOfB + fibOfA * (fibOfBMinus1 + fibOfB)) : xs)

-- tupleFilter :: this function is a simple filter function used by filteredFibPwr2Pairs to eliminate Fib pairs corresponding to bits whose value is zero
-- This may be unnecessary but I could not get the Lambda syntax to work out.  In any event, it's probably a little clearer broken out like this.
tupleFilter :: (Bool, (Integer, Integer)) -> Bool
tupleFilter (bool, (a, b)) = bool

-- filteredFibPwr2Pairs :: this function accepts a value n and generates the set of Fibonacci pairs needed to calculate F(n)
-- this is where our infinite list of Fibonacci pairs joins our decomposed n to yield the Fibonacci pairs required to generate our result
filteredFibPwr2Pairs :: Integer -> [(Integer, Integer)]
filteredFibPwr2Pairs n = snd (unzip (filter tupleFilter (zip (binaryDecompose n) fibPwr2Pairs)))

-- fib :: calculate an arbitrary Fibonacci number
fib :: Integer -> Integer
fib n = snd (fibSum (filteredFibPwr2Pairs n))

strToInteger :: String -> Integer
strToInteger = read

-- main :: reads our first argument as the Fibonacci number to be calculated.
main = do
   args

I’m not pretending that this was beautiful Haskell code – it was just one of my early efforts. But here is the interesting thing: when I ran this program for large values of n and routed the output to a file, Haskell used all of the RAM I allowed it to have (and multiple processor cores if I remember correctly). The processor load went to 100% as soon as I hit enter and stayed there as it began to produce output. Later, after perhaps 25% of the digits had been generated, the processor load would drop to perhaps 25% to 40% and remain there until all of the output had been produced.

I’m going out on a limb here but the way that I interpreted these results was that the application was displaying the result even as it was still working to compute it.  Now this isn’t a particularly important behavior for most applications, since most applications don’t involve a lot of arbitrary precision math but, if my interpretation were right, it pointed to a deep kind of “laziness” that inherent in Haskell’s non-strict evaluation that I found very interesting.

To investigate this further, I rewrote this application in Erlang recently (and in Java, just for contrast) and compared the performance of these applications.  Surprisingly, I found that my Haskell implementation was dramatically faster, producing the one millionth Fibonacci number in under a second while Erlang took nearly ten seconds (to be fair, I ran my Erlang version in the Erlang equivalent to Haskell’s ghci but the Haskell version produces near-instantaneous results ghci, too).  To be certain that I was looking at real results, I rewrote the Haskell version so that the logic was nearly identical between the final Haskell and Erlang versions and still saw the same result.

Here is the source for the two resulting programs:

Erlang

-module(fib).
-export([fib/1, fibSum/2, fibGenerator/3]).

% fibSum -- this function takes two fibonacci pairs in the form (F(n - 1), F(n), F(k - 1), F(k)) and caculates
% the Fibonacci number for F(n+k).  I.e. for the two pairs (F(1), F(2)), (F(3), F(4))
% the result of fibSum would be FibPair F(2+4) = F(6)
% this relies on application of the formula F(n+k) = F(n-1)*F(k)+F(n)*F(k+1)
fibSum({FibOfAMinus1, FibOfA}, {FibOfBMinus1, FibOfB}) ->
   {FibOfAMinus1*FibOfBMinus1 + FibOfA*FibOfB, FibOfAMinus1*FibOfB + FibOfA*(FibOfBMinus1+FibOfB)}.

% given an array of booleans (representing the bits of N in F(N), LSB first), a Fibonacci pair working value and a Fibonacci pair accumulator value,
% compute return the sum of all Fibonacci values corresponding to the bits in N.  For example, given a value of ten for N in F(N) -
% bit values: False (2^0 = 1), True (2^1 = 2), False (2^2 = 4), True (2^3 = 8) - compute the sum of F(2 + 8)
fibGenerator(0, _, Accumulator) -> Accumulator;
fibGenerator(N, FibPair, Accumulator) when (N band 1 =:= 1) ->
   fibGenerator(N bsr 1, fibSum(FibPair, FibPair), fibSum(FibPair, Accumulator));
fibGenerator(N, FibPair, Accumulator) ->
   fibGenerator(N bsr 1, fibSum(FibPair, FibPair), Accumulator).

% given a positive integer value N, compute the Fibonacci number Fib(N)
fib(N) when (N =:= 0) -> 0;
fib(N) when (N =:= 1) -> 1;
fib(N) when (N > 0) ->
   {_,Result} = fibGenerator(N-1, {0,1}, {0,1}),
   Result;
fib(_) -> ok.

Haskell

import Data.List (genericIndex)
import Data.Bits
import System.Environment (getArgs)

-- simple Haskell demo to compute an arbitrarily large Fibonacci number.
--
-- this program relies primarily on the principle that F(n+k) = F(n-1)*F(k)+F(n)*F(k+1)
-- from which we find that F(2n) = F(n-1)*F(n) + F(n)*F(n+1) = F(n-1)*F(n) + F(n)*(F(n) + F(n-1))
-- Using the second equation, it is a simple matter to sequentially construct F(2^n) for any value n.
-- With this tool in hand, F(N) may be computed for any N by simply breaking N down into its constituent
-- 2^n components (e.g. N = 14 = 8 + 4 + 2 = 2^3 + 2^2 + 2^1) and to then use these results to guide
-- construction of the value we want by summing each of the F(2^n) using the F(n+k) identities described above

-- given two Fibonacci pairs representing F(n-1),F(n) and F(k-1),F(k), compute F(n+k-1), F(n+k)
fibSum :: (Integer, Integer) -> (Integer, Integer) -> (Integer, Integer)
fibSum (fnminus1, fn) (fkminus1, fk) =
   (fnminus1*fkminus1 + fn*fk, fnminus1*fk+fn*(fkminus1+fk))

-- given an integer (representing the ordinal of the Fibonacci pair to be retrieved working value and a Fibonacci pair accumulator value,
-- compute return the sum of all Fibonacci values corresponding to the bits in N.  For example, given a value of ten for N in F(N) -
-- bit values: False (2^0 = 1), True (2^1 = 2), False (2^2 = 4), True (2^3 = 8) - compute the sum of F(2 + 8)
fibGenerator :: Integer -> (Integer, Integer) -> (Integer, Integer) -> (Integer, Integer)
fibGenerator (n) tuple accumulator | (n == 0) = accumulator
                                   | ((n .&. 1) == 1) = fibGenerator (shiftR n 1) (fibSum tuple tuple) (fibSum tuple accumulator)
                                   | ((n .&. 1) == 0) = fibGenerator (shiftR n 1) (fibSum tuple tuple) (fibSum tuple accumulator)

-- given a positive integer value N, compute and return Fib(N)
fib :: Integer -> Integer
fib (n) | n0  = snd ( fibGenerator (n - 1) (0, 1) (0, 1))

-- main :: reads our argument and computes then displays corresponding Fibonacci number
main = do
   args

From a computing perspective, there is something deeply non-intuitive about displaying a result that we are still calculating but after mulling it over I think that this may not be quite so strange as it seems. Irrespective of whether this is what Haskell is actually doing, I’ll try tomorrow to address why I think this might actually be possible and that a language like Haskell might be the perfect environment to make it happen.