When I finished the post about my exploration of the algorithm to map from a distance to Hilbert coordinates and back, one of the things that still bothered me was that I had not yet implemented some of the optimizations that I could see were possible. For example, the unnecessary creation of additional HilbertCurveSegment class instances and the mapping through a set of basis vectors for what amounted to a simple bit rotation.
So, this afternoon I sat down and created an updated version (which I’ve added to the HilbertCurveExample repository) that implements these ideas. I haven’t bothered to benchmark it against the original but I’m sure it is a considerable improvement if only in terms of its memory footprint.
I had an additional motivation to finish this update. Midway through my work to get to the root of this algorithm, I ran a search (in a moment of frustration) that surfaced a Stack Overflow posting whose answer offered what purported to be an actual implementation of the algorithm that I was looking for. Rather than read and implement it (like most normal people might), I simply scanned the code briefly to get a sense of the size and then set aside a link to it. Just seeing that an implementation was available in the event that I truly got stuck gave me enough motivation to press forward.
All the same, I wanted to have a look at this code and the paper associated with it – but not until I’d finished my work. I considered it a challenge to see how close my final solution would be (I still haven’t read it yet but I plan to do so once I finish this post).
The implementation that I completed today discards the HilbertCurveSegment class (which I had implemented in order to help frame the concept that a distance expresses an address related to the vertices of a set of nested N-cubes) and instead computes coordinate vectors, reflections, and basis vector transformations directly. I also discarded the idea of basis vectors since the only key principle that I needed to preserve was the rotation of the basis, which could be implemented more efficiently as a bit rotation. Finally, I attempted to express the concept of a bit vector (and its relationship to an ordered sequence of vertices) more forcefully by giving certain variables a suffix of bitVector or vertexRank.
With all of these changes folded in, the final implementation reduces to a little more than 200 lines of code (with comments).
When I was younger, I got my hands on a copy of The Fractal Geometry of Nature by Benoit Mandelbrot. To my young mind, it revealed a fascinating new world of math with algorithms that promised to shape mountains and trees and, best of all, seemed to suggest a hidden landscape of complexity and seeming irregularity lurking just underneath the simple and familiar mathematics that I knew.
Still etched in my mind, long after reading it, were images and names like the Sierpinski triangle, Peano curves, Hilbert curves, Julia sets, etc. As I grew older, my interests fanned out and the demands of daily life left me less time and energy to spend on these interests. So, while these things certainly still intrigued me, they came to mind less and less.
So I was pleasantly surprised a couple of months ago when my team began discussing the possibility of storing geospatial data in Microsoft’s Cosmos DB and I read through the documentation to find that the implementation encoded geospatial coordinates as a Hilbert curve distance so that this distance could be used to create a searchable index. The idea made perfect sense to me but at the time, I had no idea how to map a distance to a Hilbert curve (or vice versa). The concept was interesting but, once again, life stepped in and my focus was driven elsewhere.
In mid-January, though, as I was working through a test case and wanted to generate a set of randomly-distributed coordinates, I started thinking about how a single random number between zero and one might be used to generate each of the coordinate pairs that I wanted if I treated the number as a Hilbert distance. While I went on to address this test in a different way, I went home thinking about mapping a distance to coordinates.
Looking For Direction
So, early on, I visited the Hilbert curve Wikipedia page, which featured an image of a three-dimensional Hilbert curve, an image labelled “production rules”, and C language source code for an algorithm to map between a distance value and a pair of coordinates on a Hilbert curve (and back again). The algorithm would have been very helpful if all I wanted to do was to implement the mapping – I could just port the code and walk away – but, by now, I was eager to pull the lid off of the Hilbert curve and take a look at what “made it work”.
By now, the three-dimensional Hilbert curve raised the question, “how do I map between a distance and three coordinates, or five, or more?” Further, I wondered what a three-dimensional slice of an N-dimensional Hilbert curve would even look like. Why did the second order (or second degree, as I fell into the habit of calling it) of a two-dimensional Hilbert curve head “right” at the outset rather than “up” like its first order ancestor? Yes, the “production rules” image said that this was the case but my question wasn’t about “what” the curve did – I was interested in “why?”
The most valuable clue that I found on the entire page turned out to be the statement that “Hilbert curves in higher dimensions are an instance of a generalization of Gray codes…” (somehow the statement that “there are several possible generalizations of Hilbert curves to higher dimensions” and the associated footnote link to the paper entitled On Multidimensional Curves With Hilbert Property escaped my notice). It took a bit of exploration on my part before the importance of this statement really sank in.
Happily for me, none of my searches turned up anything more helpful than this regarding the fundamental algorithm and I was left with this set of questions steadily growing roots in the back of my mind. I say “happily” because there is much greater satisfaction (and understanding) to be gained from an answer that is hard won versus one that is simply handed to me as a fait accompli.
Finding My Way
So, without further guidance, I began to pick apart the Hilbert curve on my own. It didn’t take long to apply the production rules and demonstrate that, yes, if you follow the rules given, you end up with a nice two-dimensional Hilbert curve of any arbitrary order you want. I hacked together a method in Java (or maybe I was working from someone else’s code that I had stumbled across) that selected the second order curve to insert based on the last segment that the algorithm had generated and the next segment in the ancestor curve. The algorithm looked essentially like this:
// Chooses the three-step first order pattern to insert, based on the last
// segment generated and the next segment in the ancestor curve.
Orientation2D[] expand2d(Orientation2D previous, Orientation2D next) {
    final Orientation2D[] stepUp    = { Right, Up, Left };
    final Orientation2D[] stepRight = { Up, Right, Down };
    final Orientation2D[] stepDown  = { Left, Down, Right };
    final Orientation2D[] stepLeft  = { Down, Left, Up };

    if ((previous == Up && next == Up)
            || (previous == Right && next == Up)
            || (previous == Up && next == Left))
        return stepUp;
    if ((previous == Down && next == Down)
            || (previous == Down && next == Right)
            || (previous == Left && next == Down))
        return stepDown;
    if ((previous == Right && next == Right)
            || (previous == Up && next == Right)
            || (previous == Right && next == Down))
        return stepRight;
    if ((previous == Left && next == Left)
            || (previous == Down && next == Left)
            || (previous == Left && next == Up))
        return stepLeft;
    throw new RuntimeException("unexpected segment pair: " + previous + ", " + next);
}
It wasn’t pretty but it wasn’t intended to be – all I needed at this point was a generator for an arbitrary curve so that I could look more closely at the results and try to extract a pattern. The interpreter for the resulting output simply incremented X when it encountered a Right instruction, decremented X when it encountered a Left instruction, incremented Y when it encountered an Up instruction, and decremented Y when it encountered a Down instruction.
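A minimal sketch of that interpreter (reusing the Orientation2D values from above; the printing is just for inspection) might look like:

// Walks a list of instructions and reports each coordinate visited,
// starting from the origin.
void interpret(Orientation2D[] instructions) {
    int x = 0;
    int y = 0;
    System.out.println("(" + x + ", " + y + ")");
    for (Orientation2D step : instructions) {
        switch (step) {
            case Right: x++; break;
            case Left:  x--; break;
            case Up:    y++; break;
            case Down:  y--; break;
        }
        System.out.println("(" + x + ", " + y + ")");
    }
}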
One of the first things that I noticed about the output was that the first order curve had three steps such that, if it began at (0, 0), it would end at (1, 0) and that the net effect of the curve was to advance to the right by one (when looking exclusively at the start point and end point).
First-order Hilbert curve in two dimensions
Meanwhile, the second order curve had fifteen steps such that, if it began at (0, 0), it would end at (3, 0) and the third order curve had sixty-three steps such that, if it began at (0, 0), it would end at (7, 0). The clear linkage between the degree, the extents of the curve in the 2d coordinate space, and the length of the curve wasn’t lost on me but I still didn’t quite get what I was looking at.
Second-order Hilbert curve in two dimensions
Third-order Hilbert curve in two dimensions
I took some time to attempt the same decomposition for the construction of a 3d Hilbert curve but it didn’t take long before I realized that this was going to take a lot of effort and that I probably wouldn’t get any closer. Instead, I changed gears and looked closely at a second order 3d Hilbert curve so that I could decompose it into a set of Up, Down, Left, Right, Away, and Toward instructions. I then generated a first and second order Hilbert curve and looked at the results.
Interestingly, the resulting set of instructions and extents showed a similar pattern to the 2d curve. The first order curve in two dimensions starting at (0, 0) traveled up to (0, 1), then right to (1, 1), then finally traveled down to (1, 0) so that it visited all four vertices of a unit square. The first order curve in three dimensions starting at (0, 0, 0) traveled away to (0, 0, 1), then up to (0, 1, 1), then toward to (0, 1, 0), then right to (1, 1, 0), then away to (1, 1, 1), then down to (1, 0, 1), and finally toward to (1, 0, 0) so that the net movement was, again, one step to the right but only after visiting all eight vertices of a unit cube.
It began to sink in that, at each order, the resulting 2d Hilbert curve visited every vertex of every unit square within whatever area it covered. Similarly, the 3d Hilbert curve visited every vertex of every unit cube of whatever volume it covered. Maybe this doesn’t really seem like headline stuff – after all, the Hilbert curve is categorized as a “space filling curve” so it seems pretty natural that this is exactly what it would do. The thing that changed here, however, was my perspective on the movements required to accomplish this.
Black And White (But Gray All Over)
I turned to look at the first order curve again and finally noticed what the coordinates were telling me. If I took the first order 2d curve and treated the coordinates as a binary number, the sequence was simply:
00
01
11
10
Meanwhile, if I took the first order 3d curve and treated its coordinates as a binary number, the sequence became:
000
001
011
010
110
111
101
100
If you’re at all familiar with Gray code, you will probably recognize this sequence (and if you’re not, the Gray code Wikipedia page can tell you all sorts of things about it). This observation is central to the mechanism behind the generation of a Hilbert curve because each transition from one vertex to another in a Hilbert curve in any dimension travels along an edge – never along a diagonal. Saying this another way: along a Hilbert curve, moving from one vertex to the next always involves changing the value of only one coordinate (e.g. X or Y may change but never X and Y). This is exactly what a Gray code achieves.
Gray codes have been used in software and hardware because they possess this property: the transition from any Gray code value to the next involves a change to the value of exactly one bit. This is valuable in communication and positioning because the hardware or software can easily test for a change to two or more bits between adjacent values and report an error (e.g. halting the movement of a robotic arm because the positioning sensor has become dirty and, therefore, unreliable).
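In Java, for example, that test can be a one-liner (previousReading and currentReading are hypothetical sensor values):

// Adjacent Gray code readings must differ in exactly one bit;
// anything else signals a faulty read.
boolean validTransition = Long.bitCount(previousReading ^ currentReading) == 1;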
Another interesting property of a Gray code is that any permutation of a Gray code is also a Gray code. That is, if we label the bits of our Gray code A, B, and C, the sequence in ABC (000, 001, 011, 010, 110, 111, 101, 100) can be remapped to BCA as (000, 010, 110, 100, 101, 111, 011, 001) and this sequence remains a Gray code. Our first sequence in ABC corresponds to a Hilbert curve that originates at (0, 0, 0) and terminates at (1, 0, 0) while our second corresponds to a Hilbert curve that originates at (0, 0, 0) and terminates at (0, 0, 1). In effect, we can act as though we have a set of basis vectors that map A, B, and C to X, Y, and Z and we can use this to orient our first order Hilbert curve so that it progresses up, right, or away.
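In code, this remapping can be as cheap as a bit rotation (which is exactly the optimization mentioned at the top of this post). Here is a minimal sketch of a helper (the name rotateBitsLeft is mine, and it assumes 0 < shift < bitCount) that turns an ABC vertex into its BCA equivalent:

// Rotates the low 'bitCount' bits of 'vertex' left by 'shift' positions.
// Rotating the bits of ABC left by one yields BCA.
static long rotateBitsLeft(long vertex, int shift, int bitCount) {
    long mask = (1L << bitCount) - 1;
    return ((vertex << shift) | (vertex >>> (bitCount - shift))) & mask;
}

For example, rotateBitsLeft(0b011, 1, 3) yields 0b110, matching the third entry of the remapped BCA sequence above.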
So the Wikipedia article on the Hilbert curve really sort of buries the headline with the assertion that “Hilbert curves in higher dimensions are an instance of a generalization of Gray codes…” because any first order Hilbert curve is nothing more than a Gray code.
Turning To Progress In A New Direction
So, this is an important milestone – we can use this to construct a first order Hilbert curve in any number of dimensions. Moreover, we can find the set of coordinates for any vertex on this first order Hilbert curve from a simple distance value by converting that distance value to its corresponding Gray code value. In many languages (e.g. C, C++, Java, C#, etc.) we accomplish this easily with the statement:
long grayCode = value ^ (value >> 1);
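The reverse mapping, from a Gray code value back to a distance, takes only slightly more work. One common form is the prefix-XOR loop below (a sketch; any given library may implement this differently):

// Recovers the original value from its Gray code by XOR-ing progressively
// larger shifts; covers the full 64 bits of a long.
static long grayToValue(long grayCode) {
    long value = grayCode;
    for (int shift = 1; shift < Long.SIZE; shift <<= 1) {
        value ^= value >>> shift;
    }
    return value;
}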
Thinking about the process of generating a Hilbert curve, then, it seems that we can generate our second order Hilbert curve by simply replacing each segment along the first order curve’s path with a set of segments whose beginning and end map to that segment’s beginning and end. To illustrate this idea, we can look at the 2d Hilbert curve.
The first order Hilbert curve is generated from a two bit Gray code:
00
01
11
10
This curve moves from the left to the right but it does so by first moving up from (0, 0) to (0, 1). We can find the appropriate variation of the first order 2d curve by simply exchanging the order of our A and B bits (or rotating the bits in AB):
00
10
11
01
Another way to think about this (though not a particularly efficient way to calculate it) is to imagine that we have our simple set of bits that encode our Gray code values (e.g. AB) and that we have a matrix that maps these AB bits onto our X, Y coordinate values. The identity matrix
| 1 0 |
| 0 1 |
gives us A, B as X, Y while the exchange matrix
| 0 1 |
| 1 0 |
gives us B, A as X, Y.
So, we can construct our progress from (0, 0) in the second order 2d Hilbert curve by changing the mapping of (A, B) to (X, Y) so that X=B and Y=A in the (0, 0) quadrant. Observation then shows that the next two sections of the second order Hilbert curve are simply repeats of the first order curve at different starting points, so here the mapping is once again X=A and Y=B in the (0, 2) and (2, 2) quadrants.
Reflecting On The Results
Still, we’re not quite there yet because when we reach the final leg of the first order Hilbert curve ancestor, and descend from (1, 1) to (1, 0), we have two problems.
Our first problem is that this section of the curve starts at (3, 1) and ends at (3, 0) so it would require a version of our first order Hilbert curve that starts at (1, 1) and ends at (1, 0). But there is simply no permutation of our A and B bits that will start at 11 and end at 10.
Our second problem is that, in this section of the curve, our X coordinate value must decrease during its movement from (1, 1) to (1, 0) in order to reach two of the four remaining unit square vertices at (2, 1) and (2, 0). Once again, there is simply no permutation of our A and B bits that will accomplish this for us. So, what is going on here?
Well, if you’re geometrically-minded, you’ve probably intuited that there is a left-right mirror symmetry to the second order Hilbert curve in 2d (in fact, it turns out that symmetry is an essential feature of Hilbert curves of all dimensions and of all orders). Once we reach the midpoint along our Hilbert curve, we’re actually travelling backwards over a reflection of the original curve.
A particularly convoluted (but very precise) way of looking at this is to say that, starting at the midpoint of our Hilbert curve, we begin to compute points based on a decreasing distance along a left-right mirror image of the first half of our Hilbert curve. Here is the convoluted (but precise) part:
If we were simply travelling backwards, we would be retracing the same set of coordinates. Since we are, instead, traversing a left-right mirror image of the first half of the Hilbert curve, we move right when we would otherwise move left. As a result, we continue to move right until we reach X=3 and then we progress down. Now, from (3, 1) to (3, 0), we traverse the mirror image of our first order Hilbert curve from the first quadrant by moving from (1, 1), to (0, 1), to (0, 0), and finally to (1, 0).
Connecting The Dots
So, at this point, we have all of the elements required to construct an algorithm to compute a point along an arbitrary dimension, arbitrary order Hilbert curve. Moreover, we actually have everything we need to convert from a set of coordinates back to a distance value. The only thing that remains is to put the pieces together to give us the result that we want.
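To make the shape of such an algorithm concrete, here is a sketch in Java of the compact 2d mapping, in the spirit of the C version from the Wikipedia article mentioned earlier (this is illustrative only, not the N-dimensional implementation discussed below, and its orientation conventions may differ from the curves drawn above):

// Maps a distance d to (x, y) on a 2d Hilbert curve over an n-by-n grid,
// where n is a power of two. Each pass consumes two bits of d and applies
// the reflection/swap described above to orient the sub-curve.
static int[] distanceToXY(int n, int d) {
    int x = 0, y = 0;
    for (int s = 1; s < n; s *= 2) {
        int rx = 1 & (d / 2);
        int ry = 1 & (d ^ rx);
        if (ry == 0) {
            if (rx == 1) {
                // reflect within the current sub-square (the mirror symmetry)
                x = s - 1 - x;
                y = s - 1 - y;
            }
            // swap x and y (the basis rotation)
            int t = x;
            x = y;
            y = t;
        }
        // move into the correct quadrant
        x += s * rx;
        y += s * ry;
        d /= 4;
    }
    return new int[] { x, y };
}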
As I’m sure you’ve already guessed by now, I’ve created an implementation of the code to do this and I’ve posted it in my GitHub repository as HilbertCurveExample. It feels like there are probably a few loose ends to tie up so I may return with a follow-up post.
Wikipedia
In case you can’t tell, I rely heavily on Wikipedia as a fantastic (and, I believe, seriously undervalued) resource for a lot of my research. I make it a point to donate and I would like to encourage anyone reading this to donate to Wikipedia too.
Yesterday I suggested that the reason my old Haskell-based Fibonacci calculator program was so fast might be because Haskell was displaying the result before it was fully calculated. Now, the typical software development experience involves working with finite-sized integers (e.g. 16-bit, 32-bit, 64-bit, etc.) and floating point numbers. Almost without exception, the operations that a program performs against these types of values today are implemented completely on the processor so that they are essentially atomic – one moment all you have is two operands and the next you have the result. There is no in-between period where you have “part” of the result. As a result, the idea that you might have one part of a numeric result and not have another may be somewhat counter-intuitive. But…
Most of the math that we perform, when we perform it by hand, is algorithmic. What I mean here is that we don’t simply multiply two large integers – we follow an algorithm that instructs us to multiply each of the digits and then add the results in a particular way. For example, suppose we multiply 521 by 482 to get 251122. Elementary school multiplication has us perform the following steps:
1) designate one number the multiplicand and the other the multiplier (I’ll choose 521 as the multiplicand and 482 as the multiplier)
2) choose the lowest order digit of the multiplier and note its magnitude (e.g. the power of ten corresponding to its position)
3) multiply each of the digits of the multiplicand, starting with the least significant digit and working our way up, by the power of ten corresponding to its place and by the multiplier digit we just selected, summing the results as we go, and finally multiply that sum by the magnitude of the multiplier digit
4) repeat steps two and three for each of the remaining digits of the multiplier and their corresponding magnitudes
5) finally, sum these values to obtain the result:
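521 × 2 = 1042
521 × 80 = 41680
521 × 400 = 208400
1042 + 41680 + 208400 = 251122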
The math that the processor performs on our behalf for finite integers uses the same algorithm to produce its result (only the computations are in base 2 instead of base 10). What I mean here is that while the processor does all of the work for us, even at the silicon level, the result is actually computed in stages. These stages might simply be cascades of gates but there is still a series of “steps” or “stages” involved in generating the number.
When a computer program performs arbitrary-precision multiplication (aka BigInteger math), it uses the same algorithm, or one of a few more efficient alternatives, to compute the result. But what if we could get part of our result early? Perhaps we could take advantage of a few extra processor cores and start working with it.
Well, it turns out that we can, in most cases, get part of our result much earlier. All we have to do is change our order of operations. The typical grade-school version of the multiplication algorithm is devised for simplicity. While the way I’ve written it makes it look a bit more complex, the order of operations is roughly the same, and I hope that makes the reordering easier to understand. So let’s look at a different way of computing this result.
First we would like to get an idea of the magnitude of our result. Already we know that it has to be expressible in six digits since we’re multiplying two three digit numbers but we might like to know the most significant digit because that will get us a lot closer. That value is simply the highest order digit of the product of the two highest order digits plus any “carry” digits. In our case, 4*5=20 and, even after we factor in the carry from subsequent operations that we must add to our result, it is pretty easy to demonstrate that, since the least significant digit of 20 is zero, the addition of carry digits could never result in a change to our most significant digit: 2.
For the next digit, though, we’ll need the sum of any carry digits from lower order operations. Our next digit is the least significant digit from our first calculation (it’s worth pointing out that our previous calculation may have had only one digit, in which case we’d have skipped to this step) plus the most significant digit of the sum of our next lower order calculations plus any carry from lower order operations.
Since our second digit is non-zero, we need to consider carry so we have to perform the next operation: 5×2 + 2×8 + 1×4 = 30 (the hundreds-place term).
Now we know that we don’t need to worry about carry from the next operation so we can finish computing the previous result: 48 + 3 (the carry from 30) = 51.
Now there is still a chance that our next operation will yield a result that will affect our least significant digit of 1 but we can now reliably say that our next most significant digit will be 5. So now we’ll compute the next value: 2×2 + 1×8 = 12 (the tens-place term).
Now there is no chance that adding this plus any carry could affect our 1000s digit so we can safely say that our next most significant digit is the 1 from our calculation above. So now our result so far is: 251000. All we have to do is finish our final calculation, which is simply: 1×2 = 2.
Adding this to the result above (12 in the tens place plus 2 in the ones place), we get 122, which we can add to our result in progress to get 251122.
Now, yes, I’ve been fast and loose with the operations and I haven’t really provided a formal algorithm but I’m sure you can see the general framework. The more important point here is that I was able to work out the most significant digits early in the operation. While this isn’t particularly helpful when dealing with calculations involving six digit results, it can be crucial when working with arbitrary precision results involving hundreds of millions of digits. In these cases, knowing the most significant digits can enable you to start reporting the result before you’ve finished calculating it or simply to determine that it is outside the bounds of the problem set, etc.
There are two keys to taking advantage of this. The first is structuring calculations in a manner like the one above so that higher order computations required to determine the most significant “digit”(s) occur first. The second is structuring this evaluation so that these results are available to “callers” as soon as they are known. In a traditional language, this might involve some relatively exotic programming so this is where we go back to Haskell and functional languages in general.
Haskell performs computations using “non-strict” evaluation. This basically means that if you never request a result, it is never computed. Equally importantly, when you do request a value, all of the calculations specific to computing it are performed at that moment. In a very real way, it’s as though a Haskell program simply generated a set of instructions for computing the result needed and returned these to the caller. When the caller decided to report the result, they would read the recipe and perform the operations it contained to compute the result.
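As a rough analogy in Java (illustrative only, and far clumsier than what Haskell does natively), imagine handing the caller a Supplier rather than a value:

import java.math.BigInteger;
import java.util.function.Supplier;

class LazySketch {
    public static void main(String[] args) {
        // The "recipe": nothing is computed when this line executes.
        Supplier<BigInteger> recipe =
                () -> BigInteger.valueOf(521).multiply(BigInteger.valueOf(482));

        // The multiplication happens only when the caller asks for the value.
        System.out.println(recipe.get()); // prints 251122
    }
}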
Since BigInteger math consists of working with numbers that are simply strings of digits in, say, base 4294967296, the same basic algorithm described above could be applied to yield the most significant “digits” early. These results could go directly to the next calculation or to a base 10 conversion so that they can be displayed in human-readable format. If Haskell’s base 10 conversion were built to work with higher order digits first then we might very well see the result that I first described – that the result was still being calculated as it was displayed.
In practice, Haskell may not perform this level of “pipelining” for arbitrary precision math operations and it is quite possible that BigInteger results must be computed fully before they become available for subsequent operations. My point here isn’t to say what Haskell does as much as it is to point out what is possible – particularly as it relates to taking advantage of more processor cores. It is possible that Haskell’s performance has somewhat more mundane roots. To this point, next time, I’ll take a look at the algorithm that I think is currently used by Haskell for base 10 conversion and how it fits in to this picture.