<p>Ziv Scully, PhD student, Computer Science Department, Carnegie Mellon University (<a href="http://ziv.codes/">http://ziv.codes/</a>)</p>
<h1 id="the-information-theory-of-brooklyn-99">The Information Theory of Brooklyn 99</h1>
<p>During my internship at IBM Research this summer, my office mate <a href="https://web.mit.edu/renboz/www/">Renbo</a> and I discussed the following extremely important research problem.</p>
<blockquote>
<iframe width="560" height="315" src="https://www.youtube.com/embed/Cs-TGLxQfBM" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
<p>There are twelve men on an island. Eleven weigh exactly the same amount, but one of them is slightly lighter or heavier. You must figure out which. The island has no scales, but there is a see-saw. The exciting catch: you can only use it three times.</p>
</blockquote>
<p>To clarify: the riddle is asking us to find the odd-weight islander, but we do not need to determine whether the odd-weight islander is lighter or heavier.</p>
<p>Instead of solving this riddle, we’re going to take the riddle-writer’s perspective: <em>what is the maximum number of islanders for which the riddle is solvable?</em> Along the way, we’ll uncover some hints about how to approach the original riddle.</p>
<h2 id="the-problem">The Problem</h2>
<p>Suppose we have <script type="math/tex">n</script> islanders and are allowed <script type="math/tex">w</script> weighings on the see-saw. We ask: under what conditions on <script type="math/tex">n</script> and <script type="math/tex">w</script> is the riddle solvable? (In the original riddle, <script type="math/tex">n = 12</script> and <script type="math/tex">w = 3</script>.) There are two sorts of conditions we might look for.</p>
<ul>
<li>A <em>sufficient</em> condition on <script type="math/tex">n</script> and <script type="math/tex">w</script> is one such that if the condition holds, the riddle is solvable.</li>
<li>A <em>necessary</em> condition on <script type="math/tex">n</script> and <script type="math/tex">w</script> is one such that if the condition fails, the riddle is unsolvable.</li>
</ul>
<p>The way to prove that a condition is sufficient is pretty intuitive: given the condition, find a solution! For example, solving the original riddle shows that <script type="math/tex">n = 12, w = 3</script> is a sufficient condition. Of course, as those who paused the Brooklyn 99 episode to try out the riddle quickly found out, finding the solution can still be tricky. But the overall approach to proving a sufficient condition is clear.</p>
<p>Necessary conditions are more difficult. To prove that a condition is necessary, we have to show that if the condition fails, then <em>there does not exist</em> any islander-weighing strategy that solves the riddle. For example, suppose we want to prove that the puzzle is impossible for <script type="math/tex">n = 13</script> islanders in <script type="math/tex">w = 3</script> weighings. It’s not enough to take a solution that finds the odd-weight islander for <script type="math/tex">n = 12, w = 3</script> and demonstrate that it fails for <script type="math/tex">n = 13, w = 3</script>. Instead, we need to somehow consider <em>every</em> islander-weighing strategy.</p>
<p>In this post, we will find a necessary condition on <script type="math/tex">n</script> and <script type="math/tex">w</script> for the puzzle to be solvable. Specifically, we’ll prove that the puzzle is solvable only if</p>
<script type="math/tex; mode=display">% <![CDATA[
2n - 1 < 3^w. %]]></script>
<p>To do so, we’ll use <em>information theory</em>, a branch of mathematics that helps us prove necessary conditions like this without explicitly considering every possible islander-weighing strategy.</p>
<h2 id="the-short-version">The Short Version</h2>
<p>We’re going to start by proving a slightly weaker necessary condition: the puzzle is solvable only if <script type="math/tex">2n - 1 \leq 3^w</script> (note the non-strict inequality). Proving this weaker condition introduces all the key ideas <em>without</em> using any information theory.</p>
<p>We begin by observing that if the riddle is solvable, then it is solvable with a deterministic strategy (see Appendix, <span id="lemma-1-link"><a href="#lemma-1">Lemma 1</a></span>), so we can restrict our attention to deterministic strategies.</p>
<p>Suppose the riddle asked us to both find the odd-weight islander and determine whether they are lighter or heavier. To solve this modified riddle, we need to distinguish between the <script type="math/tex">2n</script> possible scenarios: there are <script type="math/tex">n</script> possibilities for the odd-weight islander, and they are either lighter or heavier. Each see-saw weighing has three possible outcomes: the left side is heavier, the right side is heavier, or the sides are balanced. So there are <script type="math/tex">3^w</script> possible outcomes of a sequence of <script type="math/tex">w</script> weighings. After observing one of these <script type="math/tex">3^w</script> outcomes, we should be able to unambiguously determine which of the <script type="math/tex">2n</script> scenarios we are in. We can’t do this if there are fewer outcomes than scenarios. This means <script type="math/tex">2n \leq 3^w</script> is a necessary condition to solve the modified riddle.</p>
<p>The original riddle asks us just to find the odd-weight islander, so it seems like we only have to distinguish between <script type="math/tex">n</script> possible scenarios. But it turns out that, even if we’re not trying to, we almost always end up figuring out whether the odd-weight islander is lighter or heavier. To see why, suppose we figure out that Charles is the odd-weight islander. If we ever put Charles on the see-saw in the process, then we see whether Charles was on the lighter or heavier side of the see-saw. This means the only way we can find the odd-weight islander <em>without</em> finding out whether they are lighter or heavier is if we never weigh the odd-weight islander.</p>
<p>Given an islander-weighing strategy, we say an islander is “lonely” if whenever they are the odd-weight islander, our strategy never weighs them. It turns out that any successful strategy can have at most one lonely islander (see Appendix, <span id="lemma-2-link"><a href="#lemma-2">Lemma 2</a></span>). This means that to solve the riddle, we must distinguish between at least <script type="math/tex">2n - 1</script> possible scenarios:</p>
<ul>
<li>a single scenario in which the odd-weight islander is the lonely islander, and</li>
<li><script type="math/tex">2n - 2</script> scenarios in which the odd-weight islander is one of the <script type="math/tex">n - 1</script> non-lonely islanders.</li>
</ul>
<p>There are still <script type="math/tex">3^w</script> possible outcomes of <script type="math/tex">w</script> weighings, so <script type="math/tex">2n - 1 \leq 3^w</script> is a necessary condition to solve the riddle.</p>
<h2 id="a-stricter-condition">A Stricter Condition</h2>
<p>Let’s think about what the <script type="math/tex">2n - 1 \leq 3^w</script> condition says about the original <script type="math/tex">n = 12, w = 3</script> riddle. Could we solve it with fewer weighings? How about more islanders?</p>
<ul>
<li>Given <script type="math/tex">n = 12</script>, we must have <script type="math/tex">23 \leq 3^w</script>, which means we need <script type="math/tex">w \geq 3</script>, so the riddle <em>cannot</em> be solved in fewer weighings.</li>
<li>Given <script type="math/tex">w = 3</script>, we must have <script type="math/tex">2n - 1 \leq 27</script>, which means <script type="math/tex">n \leq 14</script>. So the riddle <em>might</em> be solvable with more islanders.</li>
</ul>
<p>It turns out that the riddle is solvable for <script type="math/tex">n = 13</script> but not <script type="math/tex">n = 14</script>. But we can rule out <script type="math/tex">n = 14</script> using the stricter necessary condition <script type="math/tex">% <![CDATA[
2n - 1 < 3^w %]]></script> (note the strict inequality) that we promised in the introduction. Proving this stricter condition is where the information theory comes in.</p>
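The arithmetic behind these bounds is easy to check numerically. Here is a minimal Python sketch (the function names are mine, not from the post) comparing the weaker and stricter conditions for the original three-weighing riddle:

```python
# Necessary conditions for the riddle with n islanders and w weighings.

def weak_condition(n, w):
    """The non-strict bound derived first: 2n - 1 <= 3^w."""
    return 2 * n - 1 <= 3 ** w

def strict_condition(n, w):
    """The information-theoretic bound: 2n - 1 < 3^w."""
    return 2 * n - 1 < 3 ** w

for n in (12, 13, 14):
    print(n, weak_condition(n, 3), strict_condition(n, 3))
```

The weaker condition permits all of 12, 13, and 14 islanders, while the stricter condition rules out 14, since 2(14) - 1 = 27 = 3³.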
<h2 id="how-much-information-is-in-each-weighing">How Much Information is in Each Weighing?</h2>
<p>One of the key insights of information theory is to draw a connection between information and randomness. Information theory views anything we don’t know as a random variable. The <em>entropy</em> of a random variable tells us, roughly speaking, how much information we expect to gain by learning the outcome of that random variable. For brevity, I’m not going to explain in detail how to define entropy, instead explaining just the bits we need for this post. (If you’re curious, sources like <a href="https://en.wikipedia.org/wiki/Entropy_(information_theory)">Wikipedia</a> have pretty good explanations.)</p>
<p>In our context, the main random variable we want to know is which of the <script type="math/tex">2n - 1</script> scenarios we are in. Let’s give it a name:</p>
<script type="math/tex; mode=display">S = \text{``the scenario we're in''.}</script>
<p>If each of the scenarios is equally likely, then <script type="math/tex">S</script> has entropy <script type="math/tex">H(S) = \log_2(2n - 1)</script>.</p>
<p>Solving the riddle entails learning the outcome of <script type="math/tex">S</script>, but we never observe <script type="math/tex">S</script> directly. Instead, we observe the results of each weighing. These weighing outcomes are also random variables, so let’s write</p>
<script type="math/tex; mode=display">T_i = \text{``the outcome of the $i$th weighing''.}</script>
<p>Let’s call the possible outcomes of a weighing <script type="math/tex">\ell</script>, <script type="math/tex">r</script>, and <script type="math/tex">b</script> for tipping left, tipping right, and staying balanced, respectively. Writing <script type="math/tex">p_i</script> for the probability mass function of <script type="math/tex">T_i</script> (meaning, for example, that <script type="math/tex">p_3(\ell)</script> is the probability that the see-saw tips left on the third weighing), we can write the entropy of <script type="math/tex">T_i</script> as</p>
<script type="math/tex; mode=display">H(T_i)
= p_i(\ell) \log_2\biggl(\frac{1}{p_i(\ell)}\biggr)
+ p_i(r) \log_2\biggl(\frac{1}{p_i(r)}\biggr)
+ p_i(b) \log_2\biggl(\frac{1}{p_i(b)}\biggr).</script>
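This entropy formula is short enough to compute directly. As a sketch (using a hypothetical helper name), in Python:

```python
import math

def entropy(pmf):
    """Shannon entropy (in bits) of a probability mass function,
    given as a sequence of probabilities summing to 1."""
    return sum(p * math.log2(1 / p) for p in pmf if p > 0)

# A fair coin flip carries 1 bit of information;
# a uniform three-way outcome carries log2(3) bits.
print(entropy([0.5, 0.5]))       # 1.0
print(entropy([1/3, 1/3, 1/3]))  # log2(3), about 1.585
```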
<p>To solve the riddle, it must be that the amount of information we get from the weighings is at least the amount of information we would get by directly learning which scenario we are in. That is, a necessary condition for the riddle to be solvable is</p>
<script type="math/tex; mode=display">H(S) \leq \sum_{i = 1}^w H(T_i). \qquad (*)</script>
<p>(To be precise, the amount of information we learn from the <script type="math/tex">i</script>th weighing is never more than <script type="math/tex">H(T_i)</script>, but might be less if later weighings tell us information we already learned from earlier weighings. Look up <a href="https://en.wikipedia.org/wiki/Conditional_entropy">conditional entropy</a> for details.)</p>
<p>Our question thus becomes: how do we maximize <script type="math/tex">H(T_i)</script>, the entropy of the <script type="math/tex">i</script>th weighing? It turns out that the entropy of a random variable is maximized when all of its outcomes are equally likely. In the case of <script type="math/tex">T_i</script>, this happens when each outcome has probability <script type="math/tex">1/3</script>, so <script type="math/tex">H(T_i) \leq \log_2 3</script>. Plugging this bound into <script type="math/tex">(*)</script>, we get necessary condition</p>
<script type="math/tex; mode=display">\log_2(2n - 1) \leq w\log_2(3)</script>
<p>… which is equivalent to the weaker necessary condition <script type="math/tex">2n - 1 \leq 3^w</script> we already derived. What’s missing?</p>
<p>To get the stricter condition, we need one last observation: we can’t actually make the three outcomes of each weighing equally likely! Suppose that in the first weighing we put <script type="math/tex">k</script> islanders on each side of the see-saw. The probability of each outcome is</p>
<script type="math/tex; mode=display">p_1(\ell) = \frac{2k}{2n - 1} \quad p_1(r) = \frac{2k}{2n - 1} \quad p_1(b) = \frac{2n - 1 - 4k}{2n - 1}.</script>
<p>That is, of the <script type="math/tex">2n - 1</script> scenarios, there are <script type="math/tex">2k</script> that would make the see-saw tip left: <script type="math/tex">k</script> where someone on the left is heavier and <script type="math/tex">k</script> where someone on the right is lighter. (Remember that, by definition, the lonely islander is not on the see-saw.) But we cannot possibly have <script type="math/tex">2k = 2n - 1 - 4k</script>, because <script type="math/tex">2k</script> is even and <script type="math/tex">2n - 1 - 4k</script> is odd. So we can’t make the three outcomes equally likely, which means <script type="math/tex">% <![CDATA[
H(T_i) < \log_2(3) %]]></script>. Plugging this bound into <script type="math/tex">(*)</script>, we get necessary condition</p>
<script type="math/tex; mode=display">% <![CDATA[
\log_2(2n - 1) < w\log_2(3), %]]></script>
<p>which is equivalent to <script type="math/tex">% <![CDATA[
2n - 1 < 3^w %]]></script>, as desired.</p>
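We can see this parity obstruction numerically. The sketch below (helper names are mine) computes the entropy of the first weighing for every legal choice of <em>k</em> when <em>n</em> = 12, and confirms that none of them reaches the log₂(3) ceiling:

```python
import math

def entropy(pmf):
    """Shannon entropy (in bits) of a probability mass function."""
    return sum(p * math.log2(1 / p) for p in pmf if p > 0)

def weighing_entropy(n, k):
    """Entropy of the first weighing with k islanders per side,
    over the 2n - 1 equally likely scenarios."""
    total = 2 * n - 1
    p_left = p_right = 2 * k / total
    p_balanced = (total - 4 * k) / total
    return entropy([p_left, p_right, p_balanced])

# For n = 12, we need 4k <= 23, so k ranges over 1..5.
best = max(weighing_entropy(12, k) for k in range(1, 6))
print(best, math.log2(3))  # best is strictly below log2(3)
```

For n = 12 the best choice turns out to be k = 4 (probabilities 8/23, 8/23, 7/23), which comes close to, but strictly below, log₂(3) ≈ 1.585 bits.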
<hr />
<h2 id="appendix">Appendix</h2>
<p>Here are a few of the details we skipped above. The proofs don’t use any fancy techniques, so they make good exercises if you can resist peeking.</p>
<h4 id="lemma-1"><strong>Lemma 1.</strong></h4>
<p>If the riddle is solvable, then it is solvable with a deterministic islander-weighing strategy.</p>
<h4 id="proof"><em>Proof.</em></h4>
<p>Suppose we have a nondeterministic islander-weighing strategy that solves the riddle. That means when we get to a nondeterministic step, we choose one of several options for how to proceed. But our strategy always works, which means it should work no matter which option we pick. In particular, it still works if we always pick the first option, which makes our strategy deterministic. <a href="#lemma-1-link"><script type="math/tex">\square</script></a></p>
<h4 id="lemma-2"><strong>Lemma 2.</strong></h4>
<p>Any deterministic islander-weighing strategy that solves the riddle has at most one lonely islander.</p>
<h4 id="proof-1"><em>Proof.</em></h4>
<p>Recall that an islander is “lonely” for a given strategy if whenever they are the odd-weight islander, they are not placed on the see-saw. We’re going to show that if a strategy has a lonely islander, then none of the other islanders are lonely. Say that Jake is the lonely islander and consider Charles, another islander. We need to show that if Charles is the odd-weight islander, then we have put Charles on the see-saw at least once.</p>
<p>The key is this: because Jake is lonely, we can’t put Jake on the see-saw until we get at least one unbalanced weighing. This is because until we see an unbalanced weighing, Jake could still be the odd-weight islander.</p>
<p>Suppose now that Charles is the odd-weight islander. If we never put Charles on the see-saw, then all the weighings are balanced. But because all the weighings are balanced, we must never have put Jake on the see-saw. Because neither Jake nor Charles has been weighed, we cannot possibly tell the difference between them. Therefore, to solve the riddle, if Charles is the odd-weight islander, our strategy must at some point put Charles on the see-saw. <a href="#lemma-2-link"><script type="math/tex">\square</script></a></p>
<p><em>Published 4 Aug 2019: <a href="http://ziv.codes/2019/08/04/the-information-theory-of-brooklyn-99.html">http://ziv.codes/2019/08/04/the-information-theory-of-brooklyn-99.html</a></em></p>
<h1 id="how-i-draw-slides">How I Draw Slides</h1>
<p>After my MAMA talk a few days ago,
many people were curious how I made my <a href="/pdf/multitask-talk.pdf">slides</a>.
The short version:
I drew them with a tablet in an SVG editor, and I recommend it!
Details below.</p>
<h2 id="hardware">Hardware</h2>
<p>I use an <a href="https://us-store.wacom.com/Product/Intuos-Art-Medium-S01">Intuos Art</a>, medium size,
which I got specifically for this purpose.
Its only fancy feature is pressure sensitivity,
but this is enough for my simple drawings.
Drawing on the tablet while looking at the computer screen
takes some getting used to.
The tablet has a grid of dots that make it feasible
to draw while looking at the tablet rather than the screen,
which I found easier for shapes with lots of right angles.
I’d consider getting a Surface or iPad Pro in the future
to be able to see more directly what I’m drawing.</p>
<h2 id="software">Software</h2>
<p>There are two steps to making the slides:
drawing pictures and assembling them with text as a presentation.
I did not find any tool that was good at both,
but I did find a pair that works well.</p>
<p>I use <a href="https://graphic.com">Autodesk Graphic</a> for drawing
and <a href="https://www.apple.com/keynote/">Keynote</a> for slides.
The key feature of this pair is that
you can copy and paste any selected part of a drawing
directly from Graphic into Keynote—no exporting or importing required!
This is invaluable for iterating quickly.
I would have needed maybe 50 files if I had to export and import
each animated component individually,
plus an extra 10 or so from slides that got cut.</p>
<p>Here are some other programs I tried.</p>
<ul>
<li>PowerPoint seems to work with Graphic well, too.</li>
<li>I actually found the most paper-like drawing experience
in raster (pixel-by-pixel) art programs like Corel Painter
(a version of which comes bundled with the Intuos Art),
but I’m willing to put up with a little clunkiness
to produce SVGs (scalable vector graphics).</li>
<li>Inkscape could maybe replace Graphic on Windows or Linux,
but on my Mac, I could only copy-paste raster images from it.
I think this is some sort of Pasteboard/CLIPBOARD incompatibility.</li>
<li>Curiously, OneNote creates vector graphics
while feeling as natural as the raster programs.
However, I couldn’t copy-paste vector drawings from OneNote
into Keynote, PowerPoint, Preview, or anything else I tried.
For instance, when pasting into PowerPoint, the image gets rasterized.
The workaround is clunky and involves exporting and importing.</li>
<li>Xournal also creates vector graphics and has a good drawing feel,
but it doesn’t use smooth curves in its paths,
which makes it look worse than Graphic and OneNote.</li>
</ul>
<p>Here are some more details about how I use Graphic.</p>
<ul>
<li>I use the brush tool in Graphic with 10% smoothing.
This works pretty well, but I have to try drawing each shape a few times.
When a shape comes out well except for a small part,
I sometimes manually tweak it with the path tool.</li>
<li>I use lots of layers in Graphic to break up drawings into pieces.
If a complicated drawing has a complicated animation,
each stage of the animation gets its own layer at the very least.
<ul>
<li>Graphic makes it very easy to select all objects
that live in an arbitrary subset of layers,
so parts of the drawing that are common to multiple animation stages
each get their own layer, too.</li>
<li>For example, the animation on <a href="/pdf/multitask-talk.pdf">slide 2</a> has 7 layers.
In order: the queue, jobs, speech bubbles,
one for each of the red, green, and blue distributions,
and the coordinate axes.</li>
<li>Some of my layers collect many small utility drawings.
One of them has several arrows and curly braces, for instance.</li>
</ul>
</li>
<li>I made a <a href="/pdf/gittins-index-intro.pdf">previous presentation</a>
by drawing each slide individually in Graphic.
This worked okay, but each slide being a separate file
made it too cumbersome to make animations.</li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>I find that tablet-drawn slides give a presentation a friendly vibe
while keeping a crisp look.
Perhaps more importantly,
it reduces the activation energy (for me, at least)
of including pictures and animations.
After finding the right setup,
I’m now faster drawing pictures in Graphic
than building similar diagrams directly in PowerPoint or Keynote
(or—<em>shudder</em>—TikZ).
If you give this approach a try, let me know how it goes!</p>
<p><em>Published 8 Jun 2017: <a href="http://ziv.codes/2017/06/08/how-i-draw-slides.html">http://ziv.codes/2017/06/08/how-i-draw-slides.html</a></em></p>
<h1 id="adjoint-functors-and-computation">Adjoint Functors and Computation</h1>
<p>I’ve been sitting on this post, not finishing it, for almost a year. At this point it’s as finished as it will ever be, so I’m putting what I’ve got so far out there.</p>
<hr />
<p>This post is about category theory. If you don’t yet see the “fun” in “functor”, it will probably be difficult to follow. If you want to try to follow along anyway, look up what categories/product objects/exponential objects/functors/natural transformations/adjunctions are, try to read this post, fail to get very far, find three or four more introductions to category theory, read all of them to gain as much intuition as possible from their slightly different perspectives, try again, fail again, become a monk at one of those isolated-in-the-mountains math temples, vow not to speak until you truly understand the Yoneda Lemma, realize this all escalated rather quickly, decide you’ve had enough, move to a small island in the Caribbean, conclude that maybe the beach is more fun than math, have an epiphany while brushing teeth revealing a tiny aspect of what adjoint functors might be all about, and write a blog post about it. That’s more or less how I got here.</p>
<p>Only slightly more seriously: you can probably get something out of this post only if you know what categories and functors are, and familiarity with their basic concepts and notation shall be mercilessly assumed. Before continuing, if you do not yet know what a natural transformation is, you should at least attempt to read a precise definition of the term elsewhere, because you won’t find one here. We’ll start with an attempt to grasp that formal definition intuitively, and we’ll continue that trend throughout.</p>
<h2 id="definitions">Definitions</h2>
<p>If <script type="math/tex">F</script> is a functor and <script type="math/tex">X</script> is an object, then <script type="math/tex">FX</script> is in some sense an object with “outer structure” of type <script type="math/tex">F</script> and “inner information” of type <script type="math/tex">X</script>. A functor “maps” a morphism by using the morphism only on the level of inner information while leaving the outer structure intact. For example, consider the list functor, <script type="math/tex">L : \mathbf{Set} \to \mathbf{Set}</script>. On objects, it brings each set <script type="math/tex">X</script> to the set of lists of elements of <script type="math/tex">X</script>. On morphisms, it maps each function <script type="math/tex">f</script> to “mapped <script type="math/tex">f</script>”, which, given a list as input, outputs the list of results of applying <script type="math/tex">f</script> to each member of the list. For instance, if <script type="math/tex">f(x) = x + 3</script>, then <script type="math/tex">Lf([1,2,3]) = [4,5,6]</script>. Mapped functions deal only with inner information, applying a function to individual elements of a list, but they don’t modify outer structure by adding, removing, or rearranging elements of the list.</p>
<p>Natural transformations are the opposite: they are morphisms that act on outer structure only, leaving the inner information intact. A natural transformation <script type="math/tex">\alpha : F \to G</script> maps outer structure <script type="math/tex">F</script> to outer structure <script type="math/tex">G</script>. Of course, <script type="math/tex">F</script> and <script type="math/tex">G</script> aren’t objects (of the categories we care about right now), so <script type="math/tex">\alpha</script> is represented as a collection of <em>components</em>, with an <script type="math/tex">\alpha_X : FX \to GX</script> for each object <script type="math/tex">X</script> in the domain of <script type="math/tex">F</script>, but in a certain sense all its components do the same thing. As an example, for any <script type="math/tex">X</script>, we can define a list reversal function <script type="math/tex">\rho_X : LX \to LX</script>. But this is kind of tedious: to reverse a list, we don’t care which set its members are from. We just change their order. We just change the outer structure. List reversal is “polymorphic” in that any choice can be made for what’s inside the list being reversed. That is, list reversal is a natural transformation <script type="math/tex">\rho : L \to L</script>.</p>
<p>The notion of operating on outer structure only is made precise by the naturality condition. Given a morphism <script type="math/tex">f : X \to Y</script>, which acts only on inner information, and a natural transformation <script type="math/tex">\alpha : F \to G</script>, which acts only on outer structure, there are ways we can imagine building a morphism that transforms both, <script type="math/tex">FX \to GY</script>: either use a map of <script type="math/tex">f</script> on inner information followed by <script type="math/tex">\alpha</script> on outer structure, or vice versa. If <script type="math/tex">\alpha</script> really does ignore inner information and maps of <script type="math/tex">f</script> really do ignore outer structure, these two choices should be the same. The naturality condition captures this in an equation: <script type="math/tex">Gf \circ \alpha_X = \alpha_Y \circ Ff</script>.</p>
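The naturality equation for list reversal is easy to check on an example. Here is a small Python sketch (the names <code>L</code> and <code>reverse</code> stand in for the functor action and the component <script type="math/tex">\rho_X</script>):

```python
# Naturality of list reversal: mapping f and then reversing
# gives the same result as reversing and then mapping f.

def L(f):
    """The list functor on morphisms: 'mapped f'."""
    return lambda xs: [f(x) for x in xs]

def reverse(xs):
    """A component rho_X of list reversal; the same code works
    regardless of what the list's elements are."""
    return xs[::-1]

f = lambda x: x + 3
xs = [1, 2, 3]
print(L(f)(reverse(xs)))  # [6, 5, 4]
print(reverse(L(f)(xs)))  # [6, 5, 4]
```

The two orders agree precisely because <code>reverse</code> only rearranges outer structure and <code>L(f)</code> only touches inner information.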
<p>An <em>adjunction</em> of two functors <script type="math/tex">F : \mathcal{C} \to \mathcal{D}</script> and <script type="math/tex">G : \mathcal{D} \to \mathcal{C}</script> is a pair of natural transformations:</p>
<ul>
<li>the <em>unit</em>, <script type="math/tex">\eta : 1_{\mathcal{C}} \to GF</script>, and</li>
<li>the <em>counit</em>, <script type="math/tex">\varepsilon : FG \to 1_{\mathcal{D}}</script>;</li>
</ul>
<p>satisfying a pair of natural transformation composition laws called the <em>triangle identities</em> (because when drawn as commuting diagrams, each equation is a triangle): for all objects <script type="math/tex">X</script> in <script type="math/tex">\mathcal{C}</script> and <script type="math/tex">Y</script> in <script type="math/tex">\mathcal{D}</script>,</p>
<ul>
<li><script type="math/tex">\varepsilon_{FX} \circ F\eta_X = 1_{FX}</script>, and</li>
<li><script type="math/tex">G\varepsilon_Y \circ \eta_{GY} = 1_{GY}</script>.</li>
</ul>
<p>If you didn’t know what an adjunction was already, well, now you… probably still don’t. But don’t panic! If you followed most of the discussion of natural transformations, you’re all set to keep reading. The internet is full of many detailed explanations of the definition of adjunctions written by people who know it better than I do. My personal favorite is a series of videos by <a href="https://youtu.be/loOJxIOmShE">The Catsters</a>, but, as mentioned in the introduction, seeing many explanations and intuitive perspectives helped me a lot. Instead of giving additional definitional detail, this post introduces another such intuitive perspective: some adjunctions can be thought of as describing <em>evaluation of computation</em>.</p>
<h2 id="free-and-forgetful-functors">Free and Forgetful Functors</h2>
<p>Some classic adjoint functor pair examples are “free” and “forgetful” functors for various algebraic structures over sets, such as groups, rings, and monoids. For concreteness, we consider monoids, which are quickly defined and explained below.</p>
<p>A <em>monoid</em> is a set with an associative binary operation and an element that’s the left and right identity of that operation. Monoids are like groups in which inverses might not exist. Indeed, all groups are also monoids. One monoid that isn’t a group is the set of all <script type="math/tex">n \times n</script> matrices under matrix multiplication: multiplication is an associative operation with an identity, but not all matrices are invertible. A <em>monoid homomorphism</em> is, analogous to a group or ring homomorphism, a function that preserves the operation and its identity. In equations, using <script type="math/tex">\bullet_A</script> and <script type="math/tex">1_A</script> to denote the operation and identity element of a monoid <script type="math/tex">A</script>, we say <script type="math/tex">f : A \to B</script> is a monoid homomorphism if <script type="math/tex">f(x \, \bullet_A \, y) = f(x) \, \bullet_B \, f(y)</script> and <script type="math/tex">f(1_A) = 1_B</script>. There is a category of all monoids, which we creatively call <script type="math/tex">\mathbf{Mon}</script>, with all monoids as objects and all monoid homomorphisms as morphisms. As with groups, by default we call the operation “multiplication”, and we write it as juxtaposition, often without parentheses, which associativity makes unnecessary.</p>
<p>We have two functors to define: the <em>free</em> functor, <script type="math/tex">F : \mathbf{Set} \to \mathbf{Mon}</script>, and the <em>forgetful</em> functor, <script type="math/tex">G : \mathbf{Mon} \to \mathbf{Set}</script>. The forgetful functor is easy to describe. On objects, it maps each monoid to its underlying set of elements, “forgetting” what the operation does and which element is the identity. On morphisms, it maps each monoid homomorphism to its underlying function between two sets, “forgetting” that the function happened to satisfy any equations.</p>
<p>The free functor is slightly trickier to describe. The <em>free monoid</em> on a set <script type="math/tex">X</script>, written <script type="math/tex">FX</script> (hint, hint), is a monoid “freely generated” by the elements of <script type="math/tex">X</script>. This means two things.</p>
<ul>
<li>By “generated”, we mean that the underlying set of <script type="math/tex">FX</script> has all the elements of <script type="math/tex">X</script> plus anything else needed to be a monoid. For example, if we were to generate a monoid from <script type="math/tex">\{17,42\}</script> using <script type="math/tex">+</script> as the operation, our generated monoid would need <script type="math/tex">0</script>, because it’s the identity, <script type="math/tex">59</script>, because it’s <script type="math/tex">17+42</script>, and many more numbers.</li>
<li>By “freely”, we mean that the operation of <script type="math/tex">FX</script> never assumes two things are equal if they don’t have to be. For example, if <script type="math/tex">X = \{x,y,z\}</script>, then <script type="math/tex">x(yz) = (xy)z</script> is required by associativity, but <script type="math/tex">xy \neq yx</script> because no monoid axiom says they have to be equal.</li>
</ul>
<p>It turns out that <script type="math/tex">FX</script> has a concise interpretation: the free monoid on <script type="math/tex">X</script> is <em>lists of elements of <script type="math/tex">X</script></em>, with concatenation of lists as the operation. For instance, concatenating the lists <script type="math/tex">[x,y,y]</script> and <script type="math/tex">[x,z,x]</script> gives</p>
<script type="math/tex; mode=display">[x,y,y][x,z,x] = [x,y,y,x,z,x].</script>
<p>The identity of <script type="math/tex">FX</script> is the empty list, <script type="math/tex">[]</script>. We sometimes call the free monoid the “list monoid”.</p>
<p>As suggested by our notation, the free functor <script type="math/tex">F : \mathbf{Set} \to \mathbf{Mon}</script> maps each set to the free monoid on it. To finish the definition, we need to define how to turn a function <script type="math/tex">f : X \to Y</script> into a monoid homomorphism <script type="math/tex">Ff : FX \to FY</script>. Ignoring the monoid homomorphism conditions, our task is this: given a function <script type="math/tex">f : X \to Y</script> and a list of elements of <script type="math/tex">X</script>, generate a list of elements of <script type="math/tex">Y</script>. Recalling our discussion of the list functor <script type="math/tex">L</script>, we take <script type="math/tex">Ff</script> to be list-mapped <script type="math/tex">f</script>. It’s not hard to check that this satisfies the axioms for both functors and monoid homomorphisms.</p>
<p>A subtle distinction bears mentioning: though we call both “list-mapped <script type="math/tex">f</script>”, <script type="math/tex">Ff</script> and <script type="math/tex">Lf</script> are not the same thing. They’re not even the same type of thing! The former is a monoid homomorphism, and the latter is a function. That said, they are related: <script type="math/tex">Lf</script> is the underlying function of <script type="math/tex">Ff</script>. (In fact, <script type="math/tex">GF = L</script>. More on this in a bit.)</p>
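<p>In the same illustrative Python model, list-mapping a function gives a monoid homomorphism between list monoids, which we can spot-check directly:</p>

```python
def free_map(f):
    """Ff: the list-mapped version of f, a monoid homomorphism from
    lists-of-X to lists-of-Y under concatenation."""
    return lambda xs: [f(x) for x in xs]


double = lambda n: 2 * n
Ff = free_map(double)

# Homomorphism checks: Ff preserves the identity and the operation.
assert Ff([]) == []
assert Ff([1, 2] + [3]) == Ff([1, 2]) + Ff([3])
assert Ff([1, 2, 3]) == [2, 4, 6]
```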
<p>Similar notions exist for groups and rings. We’ll focus on monoids in the next section but will mention rings as well, for which we’ll need the following result (with proof left as an exercise, of course): the free (commutative) ring on a set <script type="math/tex">X</script> is the polynomial ring with integer coefficients where each element of <script type="math/tex">X</script> is a variable.</p>
<h2 id="the-free-forgetful-counit-is-expression-evaluation">The Free-Forgetful Counit is Expression Evaluation</h2>
<p>Let’s summarize the story so far.</p>
<ul>
<li>The free functor, <script type="math/tex">F : \mathbf{Set} \to \mathbf{Mon}</script>, maps each set to its list monoid and each function to its list-mapped version.</li>
<li>The forgetful functor, <script type="math/tex">G : \mathbf{Mon} \to \mathbf{Set}</script>, maps each monoid to its underlying set and each monoid homomorphism to its underlying function.</li>
</ul>
<p>As mentioned at the beginning of the previous section, these are adjoint functors, which means there are natural transformations <script type="math/tex">\eta : 1_{\mathbf{Set}} \to GF</script> and <script type="math/tex">\varepsilon : FG \to 1_{\mathbf{Mon}}</script> satisfying the triangle identities. Before trying to figure out what <script type="math/tex">\eta</script> and <script type="math/tex">\varepsilon</script> are, let’s first understand what the relevant functor compositions are.</p>
<ul>
<li><script type="math/tex">GF : \mathbf{Set} \to \mathbf{Set}</script> brings a set <script type="math/tex">X</script> to the underlying set of the list monoid on <script type="math/tex">X</script>, which is the set of lists of elements of <script type="math/tex">X</script>. We’ve actually seen <script type="math/tex">GF</script> before: it’s the list functor <script type="math/tex">L</script> from the discussion of natural transformations.</li>
<li><script type="math/tex">FG : \mathbf{Mon} \to \mathbf{Mon}</script> brings a monoid <script type="math/tex">Y</script> to the list monoid on the underlying set of <script type="math/tex">Y</script>. I like to think of this as the monoid of “unevaluated expressions” in <script type="math/tex">Y</script> by thinking of a list of elements of <script type="math/tex">Y</script> as a list of terms to be multiplied. Multiplying unevaluated expressions corresponds to list concatenation. For example, we can multiply <script type="math/tex">17 \times 42</script> and <script type="math/tex">38 \times 99</script> without simplifying to get <script type="math/tex">17 \times 42 \times 38 \times 99</script>.</li>
</ul>
<p>This, along with the intuition of natural transformations as “polymorphic” morphisms, is enough to guess what the unit and counit are.</p>
<p>Let’s start with the unit, <script type="math/tex">\eta : 1_{\mathbf{Set}} \to GF</script>. Given a set <script type="math/tex">X</script>, a component <script type="math/tex">\eta_X : X \to GFX</script> is a function from <script type="math/tex">X</script> to lists of elements of <script type="math/tex">X</script>, which we called <script type="math/tex">LX</script> earlier on and call <script type="math/tex">GFX</script> now. That is, <script type="math/tex">\eta_X</script> gets a single element of <script type="math/tex">X</script> as input and has to produce a list of elements as output. A simple way to do this is to produce a singleton list, so we define</p>
<script type="math/tex; mode=display">\eta_X(x) = [x].</script>
<p>It’s straightforward to mechanically verify that <script type="math/tex">\eta</script> is a natural transformation. It certainly fits our polymorphism intuition. Each component <script type="math/tex">\eta_X</script> wraps a list “outer structure” around its argument in the exact same way, without regard for the “inner information” about what the argument is or what set it’s from.</p>
<p>We turn to the counit, <script type="math/tex">\varepsilon : FG \to 1_{\mathbf{Mon}}</script>. Given a monoid <script type="math/tex">Y</script>, a component <script type="math/tex">\varepsilon_Y : FGY \to Y</script> is a monoid homomorphism from the list monoid on the underlying set of <script type="math/tex">Y</script> to <script type="math/tex">Y</script> itself. That is, <script type="math/tex">\varepsilon_Y</script> gets a list of elements of <script type="math/tex">Y</script> as input and has to produce a single element as output. A simple way to do this is to multiply everything in the list together to produce a single result (with the empty list mapping to the identity of <script type="math/tex">Y</script>), so we define</p>
<script type="math/tex; mode=display">\varepsilon_Y([y_1, y_2, \ldots, y_n]) = y_1 y_2 \cdots y_n.</script>
<p>That is, if we think of a list of elements of <script type="math/tex">Y</script> as an unevaluated monoid expression, then <script type="math/tex">\varepsilon_Y</script> <em>evaluates</em> the expression.</p>
<p>It’s straightforward to mechanically verify that <script type="math/tex">\varepsilon</script> is a natural transformation. That said, it doesn’t clearly fit our polymorphism intuition because we use multiplication, which feels like using “inner information”. However, as we’re about to see, this feeling is wrong!</p>
<p>In <script type="math/tex">\mathbf{Set}</script>, given multiple arbitrary elements of an arbitrary set, there’s no way for the multiple elements to interact. Natural transformations can move elements around, as we saw with our earlier example of list reversal, but there’s no way to use the given elements to get a new element of the set. If this seems restrictive, it’s because it is. The morphisms in <script type="math/tex">\mathbf{Set}</script> are arbitrary functions, so the inner information that morphisms can modify is basically as unrestricted as possible. This flexibility when modifying inner information is what puts such strong restrictions on modifying outer structure. Given functors <script type="math/tex">F, G : \mathbf{Set} \to \mathbf{Set}</script>, if a family of functions <script type="math/tex">\alpha_X : FX \to GX</script> does anything too fancy, we can find some function <script type="math/tex">f : X \to Y</script> such that <script type="math/tex">Gf \circ \alpha_X \neq \alpha_Y \circ Ff</script> because there are so many things arbitrary functions can do.</p>
<p>The story is different for <script type="math/tex">\mathbf{Mon}</script> because its morphisms are more restricted than those in <script type="math/tex">\mathbf{Set}</script>: they preserve multiplication and identity elements. Furthermore, given multiple arbitrary elements of an arbitrary monoid, there are two ways we can get new elements that weren’t initially given: multiplication and getting the identity. Together, these facts mean both multiplication and using the identity are fair game when modifying outer structure. This is intuitively why <script type="math/tex">\varepsilon</script> is a natural transformation: it uses only identity (when given the empty list) and multiplication (when given a list with more than one element), both of which are outer structure for the purposes of monoid homomorphisms.</p>
<p>Showing that <script type="math/tex">\eta</script> and <script type="math/tex">\varepsilon</script> actually satisfy the triangle identities is an unsurprising exercise that can be left for another day.</p>
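<p>For the impatient, here’s a concrete spot-check of both triangle identities in our Python model; this is evidence on a couple of examples, not a proof:</p>

```python
from functools import reduce

eta = lambda x: [x]                   # unit: singleton list


def epsilon(op, identity):            # counit: evaluate a list
    return lambda ys: reduce(op, ys, identity)


# Triangle identity 1, on the list monoid FX: list-map eta over a list,
# then evaluate with concatenation; we should get the original list back.
concat = epsilon(lambda a, b: a + b, [])
xs = ["x", "y", "z"]
assert concat([eta(x) for x in xs]) == xs

# Triangle identity 2, on the monoid (int, +, 0): wrap an element of GY
# as a singleton list, then evaluate; we should get the element back.
eval_sum = epsilon(lambda a, b: a + b, 0)
assert eval_sum(eta(42)) == 42
```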
<h2 id="more-than-monoids">More than Monoids</h2>
<p>Hmmm, this post is pretty long already. Here are the two further examples I was going to talk about before my arms fell off.</p>
<ul>
<li>The free-forgetful counit for rings is polynomial evaluation. The reasoning is pretty similar to what we’ve seen for monoids, except we use <script type="math/tex">\mathbf{Ring}</script> instead of <script type="math/tex">\mathbf{Mon}</script>. This means the outer structure that natural transformations can use includes addition, subtraction, and additive identity as well as the multiplication and multiplicative identity we had for monoids.</li>
<li>As an example that does not involve a free-forgetful adjunction: the product-exponential counit is function evaluation.</li>
</ul>
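<p>The ring case can also be sketched in Python. Here “unevaluated ring expressions” are modeled as small expression trees (a hypothetical representation chosen for illustration), and the counit walks the tree, replacing formal sums and products with actual arithmetic:</p>

```python
def evaluate(expr):
    """The ring counit as a tree walk. Expressions are tuples:
    ("var", v) for a generator-as-value, ("add", l, r), ("mul", l, r),
    ("neg", e), plus the bare tags "zero" and "one"."""
    tag = expr[0] if isinstance(expr, tuple) else expr
    if tag == "var":
        return expr[1]
    if tag == "add":
        return evaluate(expr[1]) + evaluate(expr[2])
    if tag == "mul":
        return evaluate(expr[1]) * evaluate(expr[2])
    if tag == "neg":
        return -evaluate(expr[1])
    if tag == "zero":
        return 0
    if tag == "one":
        return 1
    raise ValueError(f"unknown tag: {tag}")


# (17 + 42) * 3, held as an unevaluated tree, then evaluated by the counit.
tree = ("mul", ("add", ("var", 17), ("var", 42)), ("var", 3))
assert evaluate(tree) == (17 + 42) * 3
```

<p>The product-exponential counit is even shorter in this style: it’s just <code>lambda f, x: f(x)</code>, i.e. function application itself.</p>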
<p>Hopefully this helped you understand what natural transformations, and maybe even adjunctions, are at a level that’s intuitive but not just hand-waving. If you’re curious for more material relating adjunctions to computation, I’m pretty sure “something something free algebra” is a relevant next step.</p>
Tue, 02 May 2017 00:00:00 +0000
http://ziv.codes/2017/05/02/adjoint-functors-and-computation.html
Knowing that Everyone Knows
<p>We consider a classic “paradox” where a simple inductive proof seems to clash with intuition. Though the proof makes clear that the naive intuition is wrong, it’s hard to pinpoint exactly where the intuition’s logical error is. After discussing the paradox at some length with my family, we came up with an angle of attack that gives an intuitive framework that both matches the math and makes the problem with the naive intuition clearer.</p>
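<p>(Nothing here; placeholder removed.)</p>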
<p>The situation is as follows. Dragons, as you probably already know, are a perfectly honest and rational species with color vision and either red or blue eyes. One hundred red-eyed dragons are on an island, sworn to a two-part pact:</p>
<ul>
<li>they will not communicate with each other, look at reflections, or otherwise directly find out what color eyes they have, and</li>
<li>if any dragon can logically deduce some day that they have red eyes, then that dragon will leave the island the following night.</li>
</ul>
<p>The dragons live for years on the island, each of them seeing ninety-nine red-eyed dragons but none of them able to logically deduce that they too have red eyes. One day, a perfectly honest visitor comes to the island, announces that at least one of the dragons has red eyes, and leaves.</p>
<p>If you haven’t heard this before, try to figure out before continuing: what happens?</p>
<hr />
<p><em>On the one hundredth night after being told that at least one of them has red eyes, all the dragons leave the island!</em></p>
<p>Here’s the argument.</p>
<ul>
<li>If there were exactly one dragon <script type="math/tex">X</script> with red eyes, they would have seen only blue eyes and deduced that they must be the one with red eyes, so <script type="math/tex">X</script> would leave on the first night following the announcement.</li>
<li>If there were exactly two dragons <script type="math/tex">X</script> and <script type="math/tex">Y</script> with red eyes, they would both stay the first night. The following day, each would see that the other hadn’t already left. <script type="math/tex">X</script> knows by the previous bullet point that if <script type="math/tex">Y</script> were the only dragon with red eyes, then <script type="math/tex">Y</script> would have left on the first night. This didn’t happen, so <script type="math/tex">X</script> deduces that they must also have red eyes. Symmetrically, so does <script type="math/tex">Y</script>, and both leave on the second night.</li>
<li>More generally, if exactly <script type="math/tex">k</script> dragons have red eyes, then after <script type="math/tex">k-1</script> nights of no dragons leaving, each of them realizes that, if the other <script type="math/tex">k-1</script> red-eyed dragons were the only dragons with red eyes, they would have left on night <script type="math/tex">k-1</script>. This didn’t happen, so they deduce that they must also have red eyes, and all <script type="math/tex">k</script> red-eyed dragons leave on night <script type="math/tex">k</script>.</li>
</ul>
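<p>The inductive argument above can be summarized in a tiny simulation. Each red-eyed dragon reasons with a commonly known lower bound on the number of red-eyed dragons: the announcement makes the bound 1, and each night that nobody leaves raises it by one. (This Python sketch is just a compressed retelling of the induction, not a full epistemic model.)</p>

```python
def simulate(num_red):
    """Return the night on which all num_red red-eyed dragons leave."""
    seen = num_red - 1  # each red-eyed dragon sees this many red eyes
    lower_bound = 1     # from the visitor's announcement
    night = 1
    # While the bound could be explained by the dragons it sees,
    # a dragon cannot conclude anything about its own eyes.
    while lower_bound <= seen:
        night += 1        # nobody left, so everyone learns bound + 1
        lower_bound += 1
    return night          # the bound now exceeds what it sees: it's red

assert [simulate(k) for k in (1, 2, 3, 100)] == [1, 2, 3, 100]
```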
<p>This is a pretty simple inductive argument, but there’s an apparent paradox: the announcement made by the visitor was something all of the dragons already knew! What difference does it make? The typical (and entirely correct) answer is that without the announcement, the first bullet point doesn’t hold. That bullet point is the crucial base case of the inductive argument each dragon uses to deduce they have red eyes. But even though I know how induction works, I find it very counterintuitive that this should matter, because every dragon sees at least two other red-eyed dragons and therefore knows they aren’t in the base case!</p>
<p>The rough reason that the base case matters, even though all the dragons know they aren’t in it, is that we have to not just consider what each dragon knows, but also what each dragon knows about what each other dragon knows… and what each dragon knows about what each other dragon knows about what each other dragon knows, and so on. I was able to figure things out for up to three red-eyed dragons, but after that there were too many cases to keep track of.</p>
<p>Following a common mathematical theme, to give ourselves better intuition about a complicated situation, we’re going to define a new concept and build intuition about that new concept instead of about the situation directly. Let us call a dragon <em><script type="math/tex">k</script>-aware</em> for positive integer <script type="math/tex">k</script> under the following conditions.</p>
<ul>
<li>A dragon is <script type="math/tex">1</script>-aware when they know at least one dragon has red eyes.</li>
<li>For <script type="math/tex">k \geq 2</script>, a dragon is <script type="math/tex">k</script>-aware when they know every dragon is <script type="math/tex">(k-1)</script>-aware.</li>
</ul>
<p>For example, if only one dragon <script type="math/tex">X</script> has red eyes, then every other dragon is <script type="math/tex">1</script>-aware. If only two dragons <script type="math/tex">X</script> and <script type="math/tex">Y</script> have red eyes, then they are both <script type="math/tex">1</script>-aware and every other dragon is <script type="math/tex">2</script>-aware: not only do the other dragons know that <script type="math/tex">X</script> and <script type="math/tex">Y</script> have red eyes, but they know that each of <script type="math/tex">X</script> and <script type="math/tex">Y</script> can see the other, so they know that every dragon can see a red-eyed dragon. We can generalize this.</p>
<h4 id="theorem"><strong>Theorem.</strong></h4>
<p>Before the visitor’s announcement, if a dragon can see at least <script type="math/tex">k</script> red-eyed dragons, they are <script type="math/tex">k</script>-aware, and if they can see at most <script type="math/tex">k</script> red-eyed dragons, they are not <script type="math/tex">(k+1)</script>-aware.</p>
<h4 id="proof"><em>Proof.</em></h4>
<p>We prove each statement separately by induction.</p>
<ul>
<li>If a dragon can see another red-eyed dragon, then they are <script type="math/tex">1</script>-aware.</li>
<li>If a dragon can see at least <script type="math/tex">k \geq 2</script> red-eyed dragons, then they know that every other dragon can see at least <script type="math/tex">k-1</script> red-eyed dragons. By the inductive hypothesis, they know every other dragon is <script type="math/tex">(k-1)</script>-aware, so they are <script type="math/tex">k</script>-aware.</li>
<li>If a dragon sees no red-eyed dragons, then they are not <script type="math/tex">1</script>-aware.</li>
<li>If a dragon <script type="math/tex">X</script> can see at most <script type="math/tex">k \geq 1</script> red-eyed dragons, then because <script type="math/tex">X</script> must consider that they might have blue eyes, it is possible that each of those red-eyed dragons can see just <script type="math/tex">k-1</script> other red-eyed dragons. By the inductive hypothesis, <script type="math/tex">X</script> cannot know for sure that those red-eyed dragons are <script type="math/tex">k</script>-aware, so <script type="math/tex">X</script> is not <script type="math/tex">(k+1)</script>-aware. <script type="math/tex">\square</script></li>
</ul>
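<p>The recursive definition of <script type="math/tex">k</script>-awareness can be run directly. In this Python sketch, a dragon’s knowledge is modeled by the two “possible worlds” it must consider (its own eyes blue or red), assuming there are always enough blue-eyed bystanders that every world in the recursion makes sense; the final assertion checks the theorem on small cases:</p>

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def k_aware(seen, k):
    """Is a dragon seeing `seen` red-eyed dragons k-aware
    (before the visitor's announcement)?"""
    if k == 1:
        return seen >= 1
    # The dragon considers two worlds: its own eyes are blue
    # (total red-eyed = seen) or red (total = seen + 1). It is k-aware
    # iff every dragon is (k-1)-aware in both worlds. In a world with
    # t red-eyed dragons, red-eyed dragons see t - 1 and blue-eyed see t.
    for total in (seen, seen + 1):
        others_seen = {total} if total == 0 else {total - 1, total}
        if not all(k_aware(s, k - 1) for s in others_seen):
            return False
    return True


# Matches the theorem: a dragon seeing s red-eyed dragons is k-aware
# exactly when s >= k.
assert all(k_aware(s, k) == (s >= k)
           for s in range(10) for k in range(1, 10))
```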
<p>This theorem means that before the visitor’s announcement, the dragons are all <script type="math/tex">99</script>-aware. After the visitor’s announcement, <em>the dragons become <script type="math/tex">k</script>-aware for every <script type="math/tex">k \geq 1</script></em> because of the public nature of the announcement: not only does everyone know that at least one dragon has red eyes, but everyone knows that everyone knows this, and everyone knows that everyone knows that everyone knows this, and so on. This makes all the difference.</p>
<h4 id="theorem-1"><strong>Theorem.</strong></h4>
<p>If there are exactly <script type="math/tex">k</script> red-eyed dragons and they simultaneously become <script type="math/tex">k</script>-aware, they will leave <script type="math/tex">k</script> nights later.</p>
<h4 id="proof-1"><em>Proof.</em></h4>
<p>We prove only that the dragons that are supposed to leave do so at the right time, given that the other dragons stay. It’s not too hard to add the details to rigorously show that all the other dragons do indeed stay.</p>
<ul>
<li>Suppose a dragon sees no red-eyed dragons but becomes <script type="math/tex">1</script>-aware. They immediately deduce they have red eyes because nobody else does, so they leave on the first possible night.</li>
<li>Suppose for some <script type="math/tex">k \geq 2</script> that a dragon <script type="math/tex">X</script> is <script type="math/tex">k</script>-aware and sees exactly <script type="math/tex">k-1</script> red-eyed dragons. By <script type="math/tex">k</script>-awareness, <script type="math/tex">X</script> knows that those red-eyed dragons are all <script type="math/tex">(k-1)</script>-aware. <script type="math/tex">X</script> reasons that if they had blue eyes, then those red-eyed dragons would have each seen exactly <script type="math/tex">k-2</script> red-eyed dragons and, by the inductive hypothesis, would have left on night <script type="math/tex">k-1</script>. Therefore, if this doesn’t happen, <script type="math/tex">X</script> can deduce that they must have red eyes and will leave on the next night, which is night <script type="math/tex">k</script>. <script type="math/tex">\square</script></li>
</ul>
<p>The above proof is essentially the same as the initial argument, but the explicit definition and usage of <script type="math/tex">k</script>-awareness helped me (and, hopefully, you!) build better intuition for it.</p>
Sat, 09 Jan 2016 00:00:00 +0000
http://ziv.codes/2016/01/09/knowing-that-everyone-knows.html