Why the obsession with Haskell, anyway?

Before I get into the subject of this post, let’s address a little business.  You may notice that my blog looks different today.  First of all, the url is different – halstclair.com instead of halstclair.com/TechnicalDetails.  The second, and, in my opinion, more important difference is that it is more responsive.  Let’s just say that my previous host wasn’t getting the job done so I’ve done some major rearranging behind the scenes and what you’re seeing is the result.

Now, on to the subject at hand…

A few years ago, I thought I’d look around at functional languages to see what they had to offer.  I’d seen a few articles pointing out that we were rapidly nearing the end of our ability to scale computing performance efficiently by increasing processor clock rates. Instead, we would be challenged over the coming years to make efficient use of increasing numbers of processor cores and that the secret to making this happen would be the kind of parallelization that was innate in functional languages. So it seemed like a good time to get acquainted with these languages.

At the time, Erlang and Haskell were fairly well established and languages like F# and Clojure (as well as others) were emerging.  I settled on Haskell primarily because a few references suggested that it was among the oldest and the most purely functional language among the lot (I don’t really want to debate this particular point because I don’t have enough experience with the others to form my own opinion but this was the impression I got).

Soon after starting to work with Haskell I thought I’d try something a little beyond the typical “hello world” so I started experimenting with a simple program to calculate Fibonacci numbers – give it an index, say 10, and it would produce the corresponding Fibonacci number, say 55.  Of course, the naive algorithm for computing Fibonacci numbers is just \sum_{i=1}^{n} F_{i} where F_{0}=0 and F_{1}=1 so it was easy enough to implement. Haskell has built-in support for arbitrary precision integer math so my first implementation could calculate essentially any Fibonacci number that I had the patience to wait for. The problem was that this algorithm was expensive so, as I experimented with computing higher and higher ordered Fibonacci numbers, it quickly came to be like watching paint dry.

That would have been the end of the story but…

Looking at the sequence one day, I noticed that F_{12} was 144. Now that’s a pretty odd coincidence (since 12^2 is also 144) and while it was obvious that there was no such relationship for any other Fibonacci number, it made me curious so I started doodling with squares of Fibonacci numbers. It wasn’t long before I stumbled across the fact that F^2_{6} + F^2_{5} = F_{11}. A little more mucking about revealed that this particular pattern (F^2_{n} + F^2_{n-1} = F_{2n-1}) persisted for every pair Fibonacci numbers that I cared to try. A quick look at Wikipedia (see Fibonacci Number) revealed that this was just a special case of a well-known identity: F_{2n+k} = F_{k}F^2_{n+1}+2F_{k-1}F_{n+1}F_{n}+F_{k-2}F^2_{n}

In relatively short order, I worked out the following useful variation: F_{n+k}=F_{n-1}F_{k}+F_{n}*F_{k+1}. This formula provided a way to rapidly calculate the series F_{n} for n \in (1,2,4,8,16,32...) and to compute the ‘sum’ of any combination of these values to generate any arbitrary Fibonacci number (i.e. F_{10} = F_{2+8}).

The source for the resulting program was:

import Data.List (genericIndex)
import System.Environment (getArgs)

-- this set of functions can be used to calculate an arbitrarily large Fibonacci number.
-- it relies ultimately on the principle that F(n+k) = F(n-1)*F(k)+F(n)*F(k+1)
-- from here we find that F(2n) = F(n-1)*F(n) + F(n)*F(n+1) = F(n-1)*F(N) + F(n)*(F(n) + F(n-1))
-- using the second equation, it is a simple matter to construct an infinite list of Fibonacci numbers, see fibPwr2Pairs, F(Xn) where Xn = 2^n
-- Then, to arrive at F(n) where n is an arbitrary positive integer, we decompose n into its constituent bits and find the Fibonacci number pairs
-- corresponding to each of the bit values (e.g. given n=5 we have bit 0 = True, bit 1 = False, bit 2 = True - we therefore find the pairs (0, 1) and
-- (2, 3))  Finally we use the first principle to transform these pairs inot the Fibonacci pair corresponding to the sum of the bit values.

growAPair :: (Integer, Integer) -> (Integer, Integer)   -- given a Fibonacci pair (F(n-1), F(n)) yields the pair for 2*n i.e. (F(2*n-1), F(2*n))
growAPair (fnminus1, fn) = (fnminus1^2 + fn^2, fn * (2*fnminus1 + fn))

-- fibPwr2Pairs :: yields an infinite list of the Fibonacci pairs occuring at powers of two
-- (This works out well for us because Haskell uses lazy evaluation! Members of the list are only calculated as they are needed)
-- i.e. [F(2^0) = (1, 0), F(2^1) = (1, 1), F(2^2) = (2, 3), ... F(2^n) = (F(2^n), F(2^n - 1))]
fibPwr2Pairs :: [(Integer, Integer)]
fibPwr2Pairs = iterate growAPair (0,1)

-- binaryDecompose :: decomposes the supplied integer value into its constituent bits and returns this as a list of Bool values with the
-- least significant bit first in the list.
binaryDecompose :: Integer -> [Bool]
binaryDecompose n = map odd                     -- transforms the list of integers into boolean values indicating whether each original integer was odd
                 (takeWhile (/= 0)              -- limits the list of integers to those whose value is non-zero - otherwise it would have an infinite number of trailing zeros
                (iterate                        -- apply the following function to build an infinite list of n divided by increasing powers of two (starting with 2^0)
                  (\value -> quot value 2)      -- lambda function that accepts a value and returns that value divided by 2 (as the first argument to 'iterate')
                        n))                     -- the number to be decomposed (as the second argument to 'iterate')

-- fibSum :: this function takes an array of fibonacci pairs in the form (F(Xn-1), F(Xn)) and calculates the Fibonacci number for the sum of all Xn
-- i.e. for the array [FibPair F(1), FibPair F(2), FibPair F(8)] the result of fibSum would be FibPair F(1+2+8) = F(11)
-- this relies on the simple fact that F(n+k) = F(n-1)*F(k)+F(n)*F(k+1)
fibSum :: [(Integer, Integer)] -> (Integer, Integer)
fibSum ((a, b) : []) = (a, b)
fibSum ((fibOfAMinus1, fibOfA) : (fibOfBMinus1, fibOfB) : xs) = fibSum ((fibOfAMinus1 * fibOfBMinus1 + fibOfA*fibOfB, fibOfAMinus1 * fibOfB + fibOfA * (fibOfBMinus1 + fibOfB)) : xs)

-- tupleFilter :: this function is a simple filter function used by filteredFibPwr2Pairs to eliminate Fib pairs corresponding to bits whose value is zero
-- This may be unnecessary but I could not get the Lambda syntax to work out.  In any event, it's probably a little clearer broken out like this.
tupleFilter :: (Bool, (Integer, Integer)) -> Bool
tupleFilter (bool, (a, b)) = bool

-- filteredFibPwr2Pairs :: this function accepts a value n and generates the set of Fibonacci pairs needed to calculate F(n)
-- this is where our infinite list of Fibonacci pairs joins our decomposed n to yield the Fibonacci pairs required to generate our result
filteredFibPwr2Pairs :: Integer -> [(Integer, Integer)]
filteredFibPwr2Pairs n = snd (unzip (filter tupleFilter (zip (binaryDecompose n) fibPwr2Pairs)))

-- fib :: calculate an arbitrary Fibonacci number
fib :: Integer -> Integer
fib n = snd (fibSum (filteredFibPwr2Pairs n))

strToInteger :: String -> Integer
strToInteger = read

-- main :: reads our first argument as the Fibonacci number to be calculated.
main = do
   args

I’m not pretending that this was beautiful Haskell code – it was just one of my early efforts. But here is the interesting thing: when I ran this program for large values of n and routed the output to a file, Haskell used all of the RAM I allowed it to have (and multiple processor cores if I remember correctly). The processor load went to 100% as soon as I hit enter and stayed there as it began to produce output. Later, after perhaps 25% of the digits had been generated, the processor load would drop to perhaps 25% to 40% and remain there until all of the output had been produced.

I’m going out on a limb here but the way that I interpreted these results was that the application was displaying the result even as it was still working to compute it.  Now this isn’t a particularly important behavior for most applications, since most applications don’t involve a lot of arbitrary precision math but, if my interpretation were right, it pointed to a deep kind of “laziness” that inherent in Haskell’s non-strict evaluation that I found very interesting.

To investigate this further, I rewrote this application in Erlang recently (and in Java, just for contrast) and compared the performance of these applications.  Surprisingly, I found that my Haskell implementation was dramatically faster, producing the one millionth Fibonacci number in under a second while Erlang took nearly ten seconds (to be fair, I ran my Erlang version in the Erlang equivalent to Haskell’s ghci but the Haskell version produces near-instantaneous results ghci, too).  To be certain that I was looking at real results, I rewrote the Haskell version so that the logic was nearly identical between the final Haskell and Erlang versions and still saw the same result.

Here is the source for the two resulting programs:

Erlang

-module(fib).
-export([fib/1, fibSum/2, fibGenerator/3]).

% fibSum -- this function takes two fibonacci pairs in the form (F(n - 1), F(n), F(k - 1), F(k)) and caculates
% the Fibonacci number for F(n+k).  I.e. for the two pairs (F(1), F(2)), (F(3), F(4))
% the result of fibSum would be FibPair F(2+4) = F(6)
% this relies on application of the formula F(n+k) = F(n-1)*F(k)+F(n)*F(k+1)
fibSum({FibOfAMinus1, FibOfA}, {FibOfBMinus1, FibOfB}) ->
   {FibOfAMinus1*FibOfBMinus1 + FibOfA*FibOfB, FibOfAMinus1*FibOfB + FibOfA*(FibOfBMinus1+FibOfB)}.

% given an array of booleans (representing the bits of N in F(N), LSB first), a Fibonacci pair working value and a Fibonacci pair accumulator value,
% compute return the sum of all Fibonacci values corresponding to the bits in N.  For example, given a value of ten for N in F(N) -
% bit values: False (2^0 = 1), True (2^1 = 2), False (2^2 = 4), True (2^3 = 8) - compute the sum of F(2 + 8)
fibGenerator(0, _, Accumulator) -> Accumulator;
fibGenerator(N, FibPair, Accumulator) when (N band 1 =:= 1) ->
   fibGenerator(N bsr 1, fibSum(FibPair, FibPair), fibSum(FibPair, Accumulator));
fibGenerator(N, FibPair, Accumulator) ->
   fibGenerator(N bsr 1, fibSum(FibPair, FibPair), Accumulator).

% given a positive integer value N, compute the Fibonacci number Fib(N)
fib(N) when (N =:= 0) -> 0;
fib(N) when (N =:= 1) -> 1;
fib(N) when (N > 0) ->
   {_,Result} = fibGenerator(N-1, {0,1}, {0,1}),
   Result;
fib(_) -> ok.

Haskell

import Data.List (genericIndex)
import Data.Bits
import System.Environment (getArgs)

-- simple Haskell demo to compute an arbitrarily large Fibonacci number.
--
-- this program relies primarily on the principle that F(n+k) = F(n-1)*F(k)+F(n)*F(k+1)
-- from which we find that F(2n) = F(n-1)*F(n) + F(n)*F(n+1) = F(n-1)*F(n) + F(n)*(F(n) + F(n-1))
-- Using the second equation, it is a simple matter to sequentially construct F(2^n) for any value n.
-- With this tool in hand, F(N) may be computed for any N by simply breaking N down into its constituent
-- 2^n components (e.g. N = 14 = 8 + 4 + 2 = 2^3 + 2^2 + 2^1) and to then use these results to guide
-- construction of the value we want by summing each of the F(2^n) using the F(n+k) identities described above

-- given two Fibonacci pairs representing F(n-1),F(n) and F(k-1),F(k), compute F(n+k-1), F(n+k)
fibSum :: (Integer, Integer) -> (Integer, Integer) -> (Integer, Integer)
fibSum (fnminus1, fn) (fkminus1, fk) =
   (fnminus1*fkminus1 + fn*fk, fnminus1*fk+fn*(fkminus1+fk))

-- given an integer (representing the ordinal of the Fibonacci pair to be retrieved working value and a Fibonacci pair accumulator value,
-- compute return the sum of all Fibonacci values corresponding to the bits in N.  For example, given a value of ten for N in F(N) -
-- bit values: False (2^0 = 1), True (2^1 = 2), False (2^2 = 4), True (2^3 = 8) - compute the sum of F(2 + 8)
fibGenerator :: Integer -> (Integer, Integer) -> (Integer, Integer) -> (Integer, Integer)
fibGenerator (n) tuple accumulator | (n == 0) = accumulator
                                   | ((n .&. 1) == 1) = fibGenerator (shiftR n 1) (fibSum tuple tuple) (fibSum tuple accumulator)
                                   | ((n .&. 1) == 0) = fibGenerator (shiftR n 1) (fibSum tuple tuple) (fibSum tuple accumulator)

-- given a positive integer value N, compute and return Fib(N)
fib :: Integer -> Integer
fib (n) | n0  = snd ( fibGenerator (n - 1) (0, 1) (0, 1))

-- main :: reads our argument and computes then displays corresponding Fibonacci number
main = do
   args

From a computing perspective, there is something deeply non-intuitive about displaying a result that we are still calculating but after mulling it over I think that this may not be quite so strange as it seems. Irrespective of whether this is what Haskell is actually doing, I’ll try tomorrow to address why I think this might actually be possible and that a language like Haskell might be the perfect environment to make it happen.

Leave a comment