Let’s talk about space filling curves. They’re incredibly fun to animate, and they also give a chance to address a certain philosophical question. Math often deals with infinite quantities, sometimes so intimately that the very substance of a result only makes sense in an infinite world. So the question is, how can such results ever be useful in a finite world? As with all philosophizing, this is best left to discuss until after we look at a concrete case. So I’ll begin by laying down an application of something called a Hilbert Curve, followed by a description of its origins in infinite math. Let’s say you wanted to write some software which would enable people to see with their ears. It would take in the data from a camera, and somehow translate it into sound in a meaningful way. The thought here is that brains are plastic enough to build an intuition for sight even when the raw data is scrambled into a different format, as suggested by a few interesting studies and anecdotes. The tricky part is that image data is fundamentally two-dimensional, but sound data, which is a collection of different frequencies all playing at once, is fundamentally one dimensional. To make initial experiments easier, you might start by treating incoming images with a low resolution, say 256-by-256 pixels. To make my own animation efforts easier, let’s represent such an image with a square grid, each cell corresponding with a pixel. One approach to this sound-to-sight software would be to find a nice way to associate each one of those pixels with a unique frequency value. Then when that pixel is brighter, the frequency associated with it would be played louder, and if that pixel were darker, the frequency would be quiet. 
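As a rough sketch of that idea, here is what the pixel-to-frequency translation might look like in code. Everything here is assumed for illustration: the function name `image_to_audio`, the 200–4000 Hz frequency band, and the particular ordering passed in as `pixel_order` are all made up, not part of any real implementation.

```python
import math

def image_to_audio(img, pixel_order, duration=0.5, rate=44100):
    """Hypothetical sketch: assign each pixel its own frequency, and let
    that pixel's brightness (a value in [0, 1]) set the frequency's loudness.

    img:         2d list of brightness values, indexed as img[y][x]
    pixel_order: pixels listed in the order they sit along the chosen curve,
                 so that nearby frequencies come from nearby pixels
    """
    n = len(pixel_order)
    # Spread the pixels' frequencies across an (assumed) audible band.
    freqs = [200.0 + d * 3800.0 / max(1, n - 1) for d in range(n)]
    samples = []
    for s in range(int(duration * rate)):
        t = s / rate
        # Brighter pixels contribute louder sinusoids at their frequency.
        value = sum(img[y][x] * math.sin(2 * math.pi * f * t)
                    for (x, y), f in zip(pixel_order, freqs))
        samples.append(value / n)
    return samples
```

The choice of `pixel_order` is exactly the question the rest of this discussion is about.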
Listening to all the pixels at once would then sound like a bunch of frequencies overlaid on top of one another, with dominant frequencies corresponding to brighter regions of the image, sounding like a whole cacophonous mess until your brain learns to make sense of the information it contains. Let’s temporarily set aside worries about whether or not this method could work, and instead think about what function from pixel space to frequency space gives this software the best chance of success. You could, of course, try doing this with a random mapping. After all, we’re hoping people’s brains make sense out of pretty wonky data anyway. However, it might be nice to leverage some of those intuitions a given human brain already has about sound. For example, if we think in terms of the reverse mapping from frequency space to pixel space, frequencies that are close together should stay close together in the pixel space. This way, even if an ear has a hard time distinguishing between two nearby frequencies, they will at least refer to the same basic point in space. To ensure this happens, you could first describe a way to weave a line through each one of your pixels. Then fix each pixel to its spot on the line and unravel the whole thread to make it straight. Interpreting this line as frequency space, we have an association from pixels to frequencies, as desired. One weaving method would be to go one row at a time, alternating between going left and right as it moves up the pixel space, like a well-played game of snake. Let’s call this a “Snake Curve”. When you tell your mathematician friend about your idea, he says “Why not use a Hilbert curve‽” When you ask him what that is, he stumbles for a moment.
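The snake-curve ordering is easy enough to write down. As a sketch, here is one way to compute the grid position of the d-th point along it; the name `snake_xy` is made up for this example.

```python
def snake_xy(n, d):
    """Position of the d-th point along the snake curve on an n-by-n grid.

    Rows are traversed from the bottom up, alternating left-to-right and
    right-to-left, like a well-played game of snake.
    """
    row, col = divmod(d, n)
    if row % 2 == 1:          # odd rows run right to left
        col = n - 1 - col
    return col, row           # (x, y), with y = 0 at the bottom
```

So on a 4-by-4 grid the curve runs along the bottom row, steps up, comes back along the second row, and so on.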
“So, it’s not one curve but an infinite family of curves”, he starts, “Well, no, it is just one thing, but I need to tell you about a certain infinite family first.” He pulls out a piece of paper and starts explaining what he decides to call “Pseudo-Hilbert curves”, for lack of a better term. For an “Order 1 Pseudo-Hilbert curve”, you divide a square into a 2×2 grid, and connect the center of the lower left quadrant, to the center of the upper left, over to the upper right, and end down in the lower right. For an order 2 pseudo-Hilbert curve, rather than just going straight from one quadrant to another, we let our curve do a little work to fill each quadrant while it does so. Specifically, subdivide the square further into a 4×4 grid, and have our curve trace out a miniature order 1 pseudo-Hilbert curve inside each quadrant before it moves on to the next. If we left the mini-curves oriented as they are, going from the end of the mini-curve in the lower left to the start of the mini-curve in the upper left requires a bit of an awkward jump, as does going from the upper right to the lower right, so we flip the curves in the lower left and lower right quadrants to make this connection shorter. Going from an order 2 to an order 3 pseudo-Hilbert curve is similar: Divide the square into an 8×8 grid, put an order 2 pseudo-Hilbert curve in each quadrant, flip the lower left and lower right ones appropriately, then connect them all tip-to-tail. The pattern continues like this for higher orders still. “For the 256-by-256 pixel array,” your mathematician friend explains, “you would use an order 8 Pseudo-Hilbert curve”. And remember, defining a curve which weaves through each pixel is basically the same thing as defining a function from pixel-space to frequency-space, since you are associating each pixel with a point on the line. Now, this is nice as a piece of artwork, but why would these pseudo-Hilbert curves be any better than the snake curves? Here’s one important reason. 
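The recursive description above, subdivide, recurse, flip the two lower quadrants, connect tip-to-tail, can be unrolled into the well-known iterative index-to-coordinate computation for the Hilbert ordering. This sketch follows the standard bit-manipulation formulation; the function name `d2xy` is chosen here for illustration.

```python
def d2xy(order, d):
    """Coordinates of the d-th cell along an order-n pseudo-Hilbert curve
    on a 2**order by 2**order grid, with (0, 0) the lower-left cell.

    Works one subdivision level at a time: two bits of d pick a quadrant,
    and the partial result is reflected or rotated exactly as the lower-left
    and lower-right mini-curves get flipped, then shifted into the quadrant.
    """
    x = y = 0
    t, s = d, 1
    while s < 2 ** order:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:                  # lower-right quadrant: rotate 180°
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x                  # lower quadrants: reflect diagonally
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y
```

For the 256-by-256 pixel array, `order=8` walks through all 65,536 pixels exactly once.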
Imagine that you go through with the project, integrating your software with real cameras and headphones, and it works. People all over the world are using the device, building intuitions for vision via sound. What happens when you issue an upgrade that increases the resolution of the camera’s image from 256-by-256 to 512-by-512? If you were using the snake curve, as you transition to a higher resolution, many points on the frequency line would have to go to completely different parts of the pixel space. For example, let’s follow a point about halfway along the frequency line. It will end up about halfway up the pixel space no matter the resolution, but where it sits left-to-right can differ wildly as you go from 256 to 512. This means everyone using your software would have to re-learn how to see with their ears, since their original intuitions of which points in space correspond to which frequencies would no longer apply. With the Hilbert curve technique, on the other hand, as you increase the order of a Pseudo-Hilbert curve, a given point on the line moves less and less, just approaching a more specific point in space. That way, you have given your users the opportunity to fine-tune their intuitions rather than relearning everything. For this sound-to-sight application, the Hilbert curve approach turns out to be exactly what you want. In fact, given how specific the goal is, it seems almost weirdly perfect, so you go back to your mathematician friend and ask him what the original motivation for defining such a curve was. He explains that near the end of the 19th century, in the aftershock of Cantor’s research on infinity, mathematicians were interested in finding a mapping from a one-dimensional line into two-dimensional space, in such a way that the line runs through every single point in space. To be clear, we’re not talking about a finite, bounded grid of pixels, like we had in the sound-to-sight application.
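The snake curve's instability can be seen concretely with a small computation, following a point 30% of the way along the frequency line as the resolution doubles. The helper names here are made up for this sketch.

```python
def snake_xy(n, d):
    """Grid position of the d-th point along the snake curve (y = 0 at bottom)."""
    row, col = divmod(d, n)
    if row % 2 == 1:
        col = n - 1 - col
    return col, row

def snake_point(n, t):
    """Normalized (x, y) of the point a fraction t of the way along the curve."""
    x, y = snake_xy(n, int(t * (n * n - 1)))
    return (x + 0.5) / n, (y + 0.5) / n

x256, y256 = snake_point(256, 0.3)   # roughly (0.80, 0.30)
x512, y512 = snake_point(512, 0.3)   # roughly (0.40, 0.30)
# The height barely changes, but the horizontal position jumps across
# nearly half the image, because 0.3 of the way along lands on a
# left-to-right row at one resolution and a right-to-left row at the other.
```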
This is continuous space, which is very infinite, and the goal is to have a line, which is as thin as thin can be and has zero area, pass through every single one of the infinitely many points that make up the infinite area of space. Before 1890, many people thought this was clearly impossible. But then Peano defined the first of what would come to be known as space-filling curves. In 1891, Hilbert followed with his own slightly simpler space-filling curve. Technically, each one fills a square, not all of space, but I’ll show later how, once you’ve filled a square with a line, filling all of space is not an issue. By the way, mathematicians use the word “curve” to talk about a line running through space, even when it has jagged corners. This is especially counterintuitive terminology in the context of space-filling curves, which, in a sense, consist of nothing but sharp corners. A better name would be something like “space-filling fractal”. But hey, it’s math, so we live with bad terminology. None of the pseudo-Hilbert curves that you used to fill pixelated space would count as space-filling curves, no matter how high the order. Just zoom in on one of the pixels. When this pixel is considered part of infinite continuous space, the curve only passes through the tiniest zero-area slice of it, and certainly doesn’t hit every point. But your mathematician friend explains that an actual Hilbert curve is not any one of these; it is instead defined as the limit of all of them. Defining this limit rigorously is delicate. We first must formalize these “curves” as functions. Specifically, functions which take a single number between 0 and 1 as their input, and output a pair of numbers. This input can be thought of as a point on the line, and the output can be thought of as coordinates in 2d space, but in principle it’s just an association between single numbers and pairs of numbers.
For example, an order 2 pseudo-Hilbert curve, as a function, maps the input 0.3 to the output pair (0.125, 0.75). An order 3 pseudo-Hilbert curve maps that same input 0.3 to the output pair (0.0758, 0.6875). The core property which makes a function like this a curve, and not just any ol’ association of single numbers with pairs of numbers, is continuity. The intuition behind continuity is that you don’t want the output of your function to suddenly jump at any point while the input changes smoothly. The way this is made rigorous in math is actually pretty clever, and fully appreciating space-filling curves requires digesting the formal idea of continuity, so it’s definitely worth taking a brief side-step to go over it now. Consider a particular input point, A, and the corresponding output of the function, B. Draw a circle centered at A, look at all the other input points inside that circle, and consider where the function takes all those points in the output space. Now draw the smallest circle you can centered at B that contains those outputs. Different choices for the size of the input circle may result in larger or smaller circles in the output space. Notice what happens when we go through this process at a point where the function jumps, drawing a circle around A, looking at the input points within this circle, seeing where they map, and drawing the smallest possible circle centered at B containing those points. No matter how small the circle around A, the corresponding circle around B cannot be smaller than the jump. For this reason, we say a function is “discontinuous at A” if there is any lower bound on the size of this circle around B. If, on the other hand, the circle around B can be made arbitrarily small, with a sufficiently small choice for a circle around A, we say the function is “continuous at A”. A function as a whole is called continuous if it is continuous at every possible input point.
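The circle-drawing picture is the classic ε–δ definition in disguise. As a sketch in standard notation, writing ε for the radius of the circle around B = f(A) and δ for the radius of the circle around A:

```latex
f \text{ is continuous at } A
\;\iff\;
\forall \varepsilon > 0,\ \exists \delta > 0 \ \text{ such that }\
|x - A| < \delta \;\Longrightarrow\; \lVert f(x) - f(A) \rVert < \varepsilon
```

A function jumping at A is exactly the case where, for some ε smaller than the jump, no δ works.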
With this more formal description of curves, we are finally ready to define an actual Hilbert curve. Doing so relies on a wonderful property of the sequence of pseudo-Hilbert curves, which should feel familiar. Take a given input point, like 0.3, and apply each successive Pseudo-Hilbert curve function to this point. The corresponding outputs, as we increase the order of the curve, approach some particular point in space. It doesn’t matter what input you start with; the sequence of outputs you get by applying each successive Pseudo-Hilbert curve to this point always stabilizes and approaches some point in 2d space. This is absolutely not true for the snake curves, or, for that matter, most sequences of curves filling pixelated space of higher and higher resolutions. The outputs associated with a given input become wildly erratic as the resolution increases, always jumping from left to right and never approaching anything. Because of this property, we can define the Hilbert curve function as follows: For a given input value between 0 and 1, consider the sequence of points in 2d space we get by applying each successive Pseudo-Hilbert curve function to this point. The output of the Hilbert curve function evaluated at that input is defined to be the limit of those points. Since the sequence of pseudo-Hilbert curve outputs will always converge, no matter what input we start with, this is a well-defined function in a way that it never could be had we used snake curves. I won’t go through the proof of why this gives a space-filling curve, this video is long enough as it is, but let’s at least see what needs to be proved. First, verify that this is a well-defined function by proving that the outputs of the Pseudo-Hilbert curve functions really do converge the way I say they do. Second, show that this function gives a curve, meaning it’s continuous.
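To make the convergence claim concrete, here is a sketch that evaluates each pseudo-Hilbert curve as a function. It uses the standard index-to-coordinate computation plus one assumed parametrization, treating the input t as arc length along the polyline through the 4^order cell centers; the names `d2xy` and `pseudo_hilbert` are made up for this sketch.

```python
def d2xy(order, d):
    """Coordinates of the d-th cell along an order-n pseudo-Hilbert curve."""
    x = y = 0
    t, s = d, 1
    while s < 2 ** order:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

def pseudo_hilbert(order, t):
    """Point a fraction t of the way along the order-n curve, read as a
    polyline through the centers of the 4**order grid cells."""
    n_cells = 4 ** order
    pos = t * (n_cells - 1)              # arc-length position on the polyline
    i = min(int(pos), n_cells - 2)
    frac = pos - i
    side = 2 ** order
    x0, y0 = d2xy(order, i)
    x1, y1 = d2xy(order, i + 1)
    return ((x0 + 0.5 + frac * (x1 - x0)) / side,
            (y0 + 0.5 + frac * (y1 - y0)) / side)

# Applying each successive curve to the same input settles down:
outputs = [pseudo_hilbert(k, 0.3) for k in range(2, 9)]
```

The first entry reproduces the order-2 value (0.125, 0.75) quoted earlier, and consecutive outputs draw closer and closer together, which is exactly the convergence that makes the limiting Hilbert curve well defined.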
Third, and most importantly, show that it fills all of space, in the sense that every point in the unit square is an output of this function. I encourage anyone watching to take a stab at each one of these. Spoiler alert: all three of these facts turn out to be true. We can extend this to a curve that fills all of space by first tiling space with squares, then chaining a bunch of Hilbert curves together in a spiraling pattern of tiles. You can think of the first tile as coming from the interval from 0 to 1, the second tile as coming from the interval from 1 to 2, and so on, so that the entire positive real number line gets mapped into all of 2d space. Take a moment to let this fact sink in. A line, the Platonic form of thinness itself, can wander through infinitely extending and richly dense space and hit every single point. Notice, the core property that made pseudo-Hilbert curves useful in both the sound-to-sight application and their infinite origins is that points on the curve move around less and less as you increase the order of the curve. While translating images to sound, this was useful because upgrading to higher resolutions does not require training your senses all over again. For mathematicians interested in filling continuous space, this property is what ensured that talking about the limit of the sequence of curves was a meaningful thing to do. This connection between the infinite and finite worlds seems to be more the rule than the exception in math. Another example, which several astute commenters on my “inventing math” video pointed out, is the connection between the divergent sum of all powers of two and representing -1 in a computer with a bit string. It’s not so much that the infinite result is itself directly useful, but instead that the same patterns and constructs used to define and prove an infinite fact have finite analogs, and these finite analogs are directly useful. But the connection is often deeper than mere analogy.
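That finite analog can be seen directly in code: summing the first several powers of two fills a register with ones, and those same bits, read as a two's-complement signed integer, mean -1. A small sketch:

```python
# 1 + 2 + 4 + ... + 128 fills an 8-bit register with ones: 0b11111111.
total = sum(2 ** k for k in range(8))
assert total == 0b11111111 == 255

# Reinterpreting those same eight bits as a signed (two's-complement)
# byte gives -1, the finite shadow of the divergent sum 1 + 2 + 4 + ... .
as_signed = int.from_bytes(total.to_bytes(1, "big"), "big", signed=True)
assert as_signed == -1
```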
Many theorems about infinite objects are equivalent to some theorem regarding a family of finite objects. For example, if during the sound-to-sight project you were to sit down and formalize what it means for your curve to stay stable as you increase camera resolution, you would end up effectively writing the definition for what it means for a sequence of curves to have a limit curve. In fact, a statement about some infinite object, perhaps a sequence, or a fractal, can usually be viewed as a particularly clean way to encapsulate a truth about a family of finite objects. The lesson to take away is that even when a statement seems very removed from reality, you should always be willing to look under the hood at the nuts and bolts of what’s really being said. Who knows, you might find insights for representing numbers from a divergent sum, or for seeing with your ears from filling space. *Thanks for Watching!*