# Adjoint Functors and Computation

I’ve been sitting on this post not finishing it for almost a year. At this point it’s as finished as it will ever be, so I’m putting what I’ve got so far out there.

This post is about category theory. If you don’t yet see the “fun” in “functor”, it will probably be difficult to follow. If you want to try to follow along anyway, look up what categories/product objects/exponential objects/functors/natural transformations/adjunctions are, try to read this post, fail to get very far, find three or four more introductions to category theory, read all of them to gain as much intuition as possible from their slightly different perspectives, try again, fail again, become a monk at one of those isolated-in-the-mountains math temples, vow not to speak until you truly understand the Yoneda Lemma, realize this all escalated rather quickly, decide you’ve had enough, move to a small island in the Caribbean, conclude that maybe the beach is more fun than math, have an epiphany while brushing your teeth revealing a tiny aspect of what adjoint functors might be all about, and write a blog post about it. That’s more or less how I got here.

Only slightly more seriously: you can probably get something out of this post only if you know what categories and functors are, and familiarity with their basic concepts and notation shall be mercilessly assumed. Before continuing, if you do not yet know what a natural transformation is, you should at least attempt to read a precise definition of the term elsewhere, because you won’t find one here. We’ll start with an attempt to grasp that formal definition intuitively, and we’ll continue that trend throughout.

## Definitions

If $F$ is a functor and $A$ is an object, then $F(A)$ is in some sense an object with “outer structure” of type $F$ and “inner information” of type $A$. A functor “maps” a morphism by using the morphism only on the level of inner information while leaving the outer structure intact. For example, consider the list functor, $L : \mathbf{Set} \to \mathbf{Set}$. On objects, it brings each set $A$ to the set $L(A)$ of lists of elements of $A$. On morphisms, it maps each function $f : A \to B$ to “mapped $f$”, $L(f) : L(A) \to L(B)$, which, given a list as input, outputs the list of results of applying $f$ to each member of the list. For instance, if $f(x) = x^2$, then $L(f)([1, 2, 3]) = [1, 4, 9]$. Mapped functions deal only with inner information, applying a function to individual elements of a list, but they don’t modify outer structure by adding, removing, or rearranging elements of the list.
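
In code, a quick sketch (names are mine, not standard): the list functor’s action on morphisms is ordinary list-mapping.

```python
def lmap(f):
    """The list functor L on morphisms: apply f to each element
    (inner information) without changing length or order (outer structure)."""
    return lambda xs: [f(x) for x in xs]

square = lambda x: x * x
print(lmap(square)([1, 2, 3]))  # [1, 4, 9]
```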

Natural transformations are the opposite: they are morphisms that act on outer structure only, leaving the inner information intact. A natural transformation $\eta : F \Rightarrow G$ maps outer structure $F$ to outer structure $G$. Of course, $F$ and $G$ aren’t objects (of the categories we care about right now), so $\eta$ is represented as a collection of *components*, with an $\eta_A : F(A) \to G(A)$ for each object $A$ in the domain of $F$, but in a certain sense all its components do the same thing. As an example, for any set $A$, we can define a list reversal function $\mathrm{rev}_A : L(A) \to L(A)$. But this is kind of tedious: to reverse a list, we don’t care which set its members are from. We just change their order. We just change the outer structure. List reversal is “polymorphic” in that any choice can be made for what’s inside the list being reversed. That is, list reversal is a natural transformation $\mathrm{rev} : L \Rightarrow L$.

The notion of operating on outer structure only is made precise by the naturality condition. Given a morphism $f : A \to B$, which acts only on inner information, and a natural transformation $\eta : F \Rightarrow G$, which acts only on outer structure, there are two ways we can imagine building a morphism that transforms both, $F(A) \to G(B)$: either use a map of $f$ on inner information followed by $\eta$ on outer structure, or vice versa. If $\eta$ really does ignore inner information and maps of $f$ really do ignore outer structure, these two choices should be the same. The naturality condition captures this in an equation: $\eta_B \circ F(f) = G(f) \circ \eta_A$.
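
A quick spot-check in Python (my own illustration): with list-mapping as the functor’s action on morphisms and reversal as the transformation, both orders of composition agree.

```python
def lmap(f):
    """The list functor on morphisms: apply f to each element."""
    return lambda xs: [f(x) for x in xs]

def rev(xs):
    """One component of the list-reversal natural transformation."""
    return xs[::-1]

f = lambda x: x * x
xs = [1, 2, 3]

# Naturality: rev after L(f) equals L(f) after rev.
assert rev(lmap(f)(xs)) == lmap(f)(rev(xs)) == [9, 4, 1]
```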

An *adjunction* of two functors $F : \mathcal{C} \to \mathcal{D}$ and $G : \mathcal{D} \to \mathcal{C}$ is a pair of natural transformations:

- the *unit*, $\eta : 1_{\mathcal{C}} \Rightarrow G \circ F$, and
- the *counit*, $\epsilon : F \circ G \Rightarrow 1_{\mathcal{D}}$;

satisfying a pair of natural transformation composition laws called the *triangle identities* (because when drawn as commuting diagrams, each equation is a triangle): for all objects $A$ in $\mathcal{C}$ and $B$ in $\mathcal{D}$,

- $\epsilon_{F(A)} \circ F(\eta_A) = 1_{F(A)}$, and
- $G(\epsilon_B) \circ \eta_{G(B)} = 1_{G(B)}$.

If you didn’t know what an adjunction was already, well, now you… probably still don’t. But don’t panic! If you followed most of the discussion of natural transformations, you’re all set to keep reading. The internet is full of detailed explanations of the definition of adjunctions written by people who know it better than I do. My personal favorite is a series of videos by The Catsters, but, as mentioned in the introduction, seeing many explanations and intuitive perspectives helped me a lot. Instead of giving additional definitional detail, this post introduces another such intuitive perspective: some adjunctions can be thought of as describing *evaluation of computation*.

## Free and Forgetful Functors

Some classic examples of adjoint functor pairs are “free” and “forgetful” functors for various algebraic structures over sets, such as groups, rings, and monoids. For concreteness, we consider monoids, which are quickly defined and explained below.

A *monoid* is a set with an associative binary operation and an element that’s the left and right identity of that operation. Monoids are like groups in which inverses might not exist. Indeed, all groups are also monoids. One monoid that isn’t a group is the set of all $n \times n$ matrices: matrix multiplication is an associative operation with an identity, but not all matrices are invertible. A *monoid homomorphism* is, analogously to a group or ring homomorphism, a function that preserves the operation and its identity. In equations, using $\cdot_M$ and $e_M$ to denote the operation and identity element of a monoid $M$, we say $f : M \to N$ is a monoid homomorphism if $f(x \cdot_M y) = f(x) \cdot_N f(y)$ and $f(e_M) = e_N$. There is a category of all monoids, which we creatively call $\mathbf{Mon}$, with all monoids as objects and all monoid homomorphisms as morphisms. As with groups, by default we call the operation “multiplication”, and we write it as juxtaposition, often without parentheses, which associativity makes unnecessary.
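
As a concrete sketch (the representation and names are mine): a monoid can be packaged as an operation plus an identity, and the homomorphism equations can be spot-checked on sample elements.

```python
# A monoid as a pair (operation, identity).
add_monoid = (lambda x, y: x + y, 0)   # integers under addition
str_monoid = (lambda x, y: x + y, "")  # strings under concatenation

def is_homomorphism_on(f, source, target, samples):
    """Spot-check f(x * y) == f(x) * f(y) and f(e) == e
    on some sample elements of the source monoid."""
    (op_s, e_s), (op_t, e_t) = source, target
    preserves_op = all(f(op_s(x, y)) == op_t(f(x), f(y))
                       for x in samples for y in samples)
    return preserves_op and f(e_s) == e_t

# len is a monoid homomorphism from strings (under ++) to integers (under +).
assert is_homomorphism_on(len, str_monoid, add_monoid, ["", "a", "bc"])
```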

We have two functors to define: the *free* functor, $F : \mathbf{Set} \to \mathbf{Mon}$, and the *forgetful* functor, $U : \mathbf{Mon} \to \mathbf{Set}$. The forgetful functor is easy to describe. On objects, it maps each monoid to its underlying set of elements, “forgetting” what the operation does and which element is the identity. On morphisms, it maps each monoid homomorphism to its underlying function between two sets, “forgetting” that the function happened to satisfy any equations.

The free functor is slightly trickier to describe. The *free monoid* on a set $A$, written $F(A)$ (hint, hint), is a monoid “freely generated” by the elements of $A$. This means two things.

- By “generated”, we mean that the underlying set of $F(A)$ has all the elements of $A$ plus anything else needed to be a monoid. For example, if we were to generate a monoid from $\{1, 2\}$ using $+$ as the operation, our generated monoid would need $0$, because it’s the identity, $3$, because it’s $1 + 2$, and many more numbers.
- By “freely”, we mean that the operation of $F(A)$ never assumes two things are equal if they don’t have to be. For example, if $x, y, z \in F(A)$, then $(xy)z = x(yz)$ is required by associativity, but $xy \neq yx$ because no monoid axiom says they have to be equal.

It turns out that $F(A)$ has a concise interpretation: the free monoid on $A$ is *lists of elements of $A$*, with concatenation of lists as the operation. For instance, concatenating the lists $[a, b]$ and $[c, d, e]$ gives

$$[a, b, c, d, e].$$

The identity of $F(A)$ is the empty list, $[\,]$. We sometimes call the free monoid the “list monoid”.

As suggested by our notation, the free functor maps each set to the free monoid on it. To finish the definition, we need to define how to turn a function $f : A \to B$ into a monoid homomorphism $F(f) : F(A) \to F(B)$. Ignoring the monoid homomorphism conditions, our task is this: given a function $f : A \to B$ and a list of elements of $A$, generate a list of elements of $B$. Recalling our discussion of the list functor $L$, we take $F(f)$ to be list-mapped $f$. It’s not hard to check that this satisfies the axioms for both functors and monoid homomorphisms.
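
Sketching this in Python (illustrative names only): the free functor’s action on a function is list-mapping, and the result respects concatenation and the empty list, as a monoid homomorphism between list monoids should.

```python
def free(f):
    """The free functor on morphisms: list-mapped f, which is a
    monoid homomorphism between list monoids."""
    return lambda xs: [f(x) for x in xs]

f = lambda n: n + 1
xs, ys = [1, 2], [3]

# Preserves the operation: F(f)(xs ++ ys) == F(f)(xs) ++ F(f)(ys)
assert free(f)(xs + ys) == free(f)(xs) + free(f)(ys) == [2, 3, 4]
# Preserves the identity: F(f) sends the empty list to the empty list.
assert free(f)([]) == []
```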

A subtle distinction bears mentioning: though we call both “list-mapped $f$”, $F(f)$ and $L(f)$ are not the same thing. They’re not even the same type of thing! The former is a monoid homomorphism, and the latter is a function. That said, they are related: $L(f)$ is the underlying function of $F(f)$. (In fact, $L = U \circ F$. More on this in a bit.)

Similar notions exist for groups and rings. We’ll focus on monoids in the next section but will mention rings as well, for which we’ll need the following result (with proof left as an exercise, of course): the free (commutative) ring on a set $A$ is $\mathbb{Z}[A]$, the polynomial ring with integer coefficients where each element of $A$ is a variable.

## The Free-Forgetful Counit is Expression Evaluation

Let’s summarize the story so far.

- The free functor, $F : \mathbf{Set} \to \mathbf{Mon}$, maps each set to its list monoid and each function to its list-mapped version.
- The forgetful functor, $U : \mathbf{Mon} \to \mathbf{Set}$, maps each monoid to its underlying set and each monoid homomorphism to its underlying function.

As mentioned at the beginning of the previous section, these are adjoint functors, which means there are natural transformations $\eta : 1_{\mathbf{Set}} \Rightarrow U \circ F$ and $\epsilon : F \circ U \Rightarrow 1_{\mathbf{Mon}}$ satisfying the triangle identities. Before trying to figure out what $\eta$ and $\epsilon$ are, let’s first understand what the relevant functor compositions are.

- $U \circ F$ brings a set $A$ to the underlying set of the list monoid on $A$, which is the set of lists of elements of $A$. We’ve actually seen $U \circ F$ before: it’s the list functor $L$ from the discussion of natural transformations.
- $F \circ U$ brings a monoid $M$ to the list monoid on the underlying set of $M$. I like to think of this as the monoid of “unevaluated expressions” in $M$ by thinking of a list of elements of $M$ as a list of terms to be multiplied. Multiplying unevaluated expressions corresponds to list concatenation. For example, we can multiply $[x, y]$ and $[z]$ without simplifying to get $[x, y, z]$.

This, along with the intuition of natural transformations as “polymorphic” morphisms, is enough to guess what the unit and counit are.

Let’s start with the unit, $\eta : 1_{\mathbf{Set}} \Rightarrow U \circ F$. Given a set $A$, a component $\eta_A : A \to U(F(A))$ is a function from $A$ to lists of elements of $A$, which we called $L(A)$ earlier on and call $U(F(A))$ now. That is, $\eta_A$ gets a single element of $A$ as input and has to produce a list of elements as output. A simple way to do this is to produce a singleton list, so we define

$$\eta_A(x) = [x].$$

It’s straightforward to mechanically verify that $\eta$ is a natural transformation. It certainly fits our polymorphism intuition. Each component wraps a list “outer structure” around its argument in the exact same way, without regard for the “inner information” about what the argument is or what set it’s from.

We turn to the counit, $\epsilon : F \circ U \Rightarrow 1_{\mathbf{Mon}}$. Given a monoid $M$, a component $\epsilon_M : F(U(M)) \to M$ is a monoid homomorphism from the list monoid on the underlying set of $M$ to $M$ itself. That is, $\epsilon_M$ gets a list of elements of $M$ as input and has to produce a single element of $M$ as output. A simple way to do this is to multiply everything in the list together to produce a single result (with the empty list mapping to the identity of $M$), so we define

$$\epsilon_M([x_1, x_2, \ldots, x_n]) = x_1 x_2 \cdots x_n.$$

That is, if we think of a list of elements of $M$ as an unevaluated monoid expression, then $\epsilon_M$ *evaluates* the expression.
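
In Python (a sketch with my own names), a counit component is just a fold over the list, starting from the identity. Here it is for the additive monoid of integers:

```python
from functools import reduce

def counit(op, identity):
    """The counit component for a monoid (op, identity): evaluate a
    list of elements (an unevaluated expression) down to one element."""
    return lambda xs: reduce(op, xs, identity)

eval_sum = counit(lambda x, y: x + y, 0)
assert eval_sum([1, 2, 3]) == 6  # "multiply" (here: add) everything together
assert eval_sum([]) == 0         # the empty list maps to the identity

# It's a monoid homomorphism: concatenation maps to multiplication.
assert eval_sum([1, 2] + [3]) == eval_sum([1, 2]) + eval_sum([3])
```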

It’s straightforward to mechanically verify that $\epsilon$ is a natural transformation. That said, it doesn’t clearly fit our polymorphism intuition, because we use multiplication, which feels like using “inner information”. However, as we’re about to see, this feeling is wrong!

In $\mathbf{Set}$, given multiple arbitrary elements of an arbitrary set, there’s no way for the multiple elements to interact. Natural transformations can move elements around, as we saw with our earlier example of list reversal, but there’s no way to use the given elements to get a new element of the set. If this seems restrictive, it’s because it is. The morphisms in $\mathbf{Set}$ are arbitrary functions, so the inner information that morphisms can modify is basically as unrestricted as possible. This flexibility when modifying inner information is what puts such strong restrictions on modifying outer structure. Given functors $S, T : \mathbf{Set} \to \mathbf{Set}$, if a family of functions $\alpha_A : S(A) \to T(A)$ does anything too fancy, we can find some function $f : A \to B$ such that $T(f) \circ \alpha_A \neq \alpha_B \circ S(f)$, because there are so many things arbitrary functions can do.

The story is different for $\mathbf{Mon}$ because its morphisms are more restricted than those in $\mathbf{Set}$: they preserve multiplication and identity elements. Furthermore, given multiple arbitrary elements of an arbitrary monoid, there are two ways we can get new elements that weren’t initially given: multiplication and getting the identity. Together, these facts mean both multiplication and using the identity are fair game when modifying outer structure. This is intuitively why $\epsilon$ is a natural transformation: it uses only the identity (when given the empty list) and multiplication (when given a list with more than one element), both of which are outer structure for the purposes of monoid homomorphisms.

Showing that $\eta$ and $\epsilon$ actually satisfy the triangle identities is an unsurprising exercise that can be left for another day.
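
Still, a quick numeric spot-check (my own sketch) is reassuring. For lists, the first triangle identity says: wrap each element of a list in a singleton list, then concatenate, and you get the original list back. The second says: wrap an element of a monoid in a singleton list, then evaluate, and you get the element back.

```python
from functools import reduce

unit = lambda x: [x]                             # eta_A(x) = [x]
lmap = lambda f: lambda xs: [f(x) for x in xs]   # list-mapping
concat = lambda xss: reduce(lambda a, b: a + b, xss, [])  # counit at a list monoid

xs = [1, 2, 3]
# First triangle identity at F(A): concat . lmap(unit) == id
assert concat(lmap(unit)(xs)) == xs

eval_sum = lambda ys: reduce(lambda a, b: a + b, ys, 0)   # counit at (int, +, 0)
x = 7
# Second triangle identity at U(M): eval . unit == id
assert eval_sum(unit(x)) == x
```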

## More than Monoids

Hmmm, this post is pretty long already. Here are the two further examples I was going to talk about before my arms fell off.

- The free-forgetful counit for rings is polynomial evaluation. The reasoning is pretty similar to what we’ve seen for monoids, except we use $\mathbf{Ring}$ instead of $\mathbf{Mon}$. This means the outer structure that natural transformations can use includes addition, subtraction, and the additive identity as well as the multiplication and multiplicative identity we had for monoids.
- As an example that does not involve a free-forgetful adjunction: the product-exponential counit is function evaluation.
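
For that last example, a sketch (names mine): in $\mathbf{Set}$, the counit of the product–exponential adjunction has components $\mathrm{ev} : B^A \times A \to B$, which simply apply a function to an argument.

```python
def ev(pair):
    """The product-exponential counit: given a pair (f, x) of a
    function and an argument, evaluate f at x."""
    f, x = pair
    return f(x)

assert ev((lambda n: n * n, 5)) == 25
```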

Hopefully this helped in understanding what natural transformations, or maybe even adjunctions, are at a level that’s intuitive but not just hand-waving. If you’re curious for more material on relating adjunctions to computation, I’m pretty sure “something something free algebra” is a relevant next step.