Book Review: The Information, by James Gleick

Or: “The Information: A History, a Theory, a Flood. A Review.”

Oh, I was nervous about this one.

It looked so good! Such an inviting cover, such a broad and eternally relevant topic. The Information. Rational, dispassionate. Ordered. And, “A Flood:” I half-hoped it might talk about information overload. (It does, a bit.)

But so many pages. Could I justify another pop-sci book on my To Read stack? Could I justify the time? Would it be fluff? Or a difficult slog? If I’m reading for fun, I don’t want it to be harder work than what I already do for, you know, work.

Wasted worry.

The Information is a layman’s introduction to Claude Shannon’s information theory. It covers a lot of ground, and while it can be a bit slow in parts, it’s enjoyable. As a programmer, I was aware of information theory, a little, but wasn’t very clear on what it was all about, or for. I was pretty sure it was lurking around behind compression, and probably positional numbering systems, especially the way they can look like dimensionality if you squint the right way, like with chmod permission bits. The Information filled in a lot of gaps for me, and showed me bridges into other fields I hadn’t expected.

Some teasers:

Gleick describes African talking drums as a way to illustrate information redundancy: two drums, high- and low-toned, mimic the tonal spoken language; drummers use long, flowery phrases to clarify ambiguity. He talks about how written language abstracts thought, and the invention of the dictionary.

He explains how information is like uncertainty or surprise. Given a string of symbols (letters, music notes, numbers, bits), how easily can you guess the next one? If a torn piece of paper says “Kermit the Fr,” you can infer what was torn off. If I say “I got a Big Mac and fries,” you can guess where I went for lunch; my adding “…at McDonald’s” doesn’t help you much – it doesn’t add much new information. (To explore this point, Claude Shannon had his wife repeatedly guess the next word in sentences from a detective novel.)
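To make “surprise” a bit more concrete, here is a minimal Python sketch of the standard formulas: the surprise of one outcome, and the average surprise (entropy) of a string. The function names are mine, not Gleick’s or Shannon’s, and the entropy here only looks at how often each symbol appears. It ignores context, which is exactly what the guessing game with the detective novel was probing, so treat it as a rough first approximation.

```python
import math
from collections import Counter


def surprise_bits(probability: float) -> float:
    """Surprise (self-information) of an outcome, in bits: log2(1 / p)."""
    return math.log2(1 / probability)


def entropy_bits_per_symbol(text: str) -> float:
    """Average surprise per symbol, based only on symbol frequencies in text."""
    counts = Counter(text)
    total = len(text)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# A string with one repeated symbol is perfectly guessable: no information.
print(entropy_bits_per_symbol("aaaaaaaa"))
# Two equally likely symbols cost one bit each; eight cost three bits each.
print(entropy_bits_per_symbol("abababab"))
print(entropy_bits_per_symbol("abcdefgh"))

# A near-certain continuation (like "og" after "Kermit the Fr") carries very
# little surprise; an unlikely one carries a lot. The probabilities are made up.
print(surprise_bits(0.99))
print(surprise_bits(0.01))
```

The less predictable the next symbol, the more bits it takes to say which one showed up.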

Gleick talks about information theory’s relationship to entropy. A closed system has fixed energy, but the energy dissipates: it spreads evenly throughout the system, and we can’t use it to do work. If we could re-order the energy, collect it, sort it, we could reverse entropy. Information is work.

Information is also related to computability. Sometimes, the best way to store a message is to store an algorithm for computing it.

This, in particular, is something I’d noticed. Which of these is a better way to send a smiley face? This one?

Or this one?

size(250, 250);
ellipse(125, 125, 200, 200);
ellipse(100, 90, 10, 10);
ellipse(165, 90, 10, 10);
arc(125, 125, 100, 120, 0.2, PI - 0.2);

The first is a 2D grid of pixels. The second is the code to render it: an explanation of the steps to reproduce the image.

Which is better? Which is better for making an exact copy of that image? A checkerboard, 250 squares on a side, 250² = 62500 squares in total, and a listing of which ones should be white (about 93% of them), which should be black? Or 11 lines of text – just 222 characters? Say you had to write the message down on paper and mail it: would you rather write a list of 62500 numbers, or 11 lines of code? What would the message’s recipient have to know to reproduce the image, exactly? Pixel-for-pixel?
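Here is a rough way to put numbers on that comparison, as a Python sketch under some loud assumptions. I don’t have the original image in a form I can measure, so `ring_pixels` draws a stand-in: a single circle outline on a 250-by-250 grid. `PROGRAM` holds only the five Processing lines quoted above, not the full 11-line sketch. The helper names are made up for this example.

```python
import zlib

SIDE = 250  # pixels per side, matching size(250, 250) in the Processing code

# The five Processing lines quoted in this post, as one string.
PROGRAM = """size(250, 250);
ellipse(125, 125, 200, 200);
ellipse(100, 90, 10, 10);
ellipse(165, 90, 10, 10);
arc(125, 125, 100, 120, 0.2, PI - 0.2);
"""


def ring_pixels(side: int = SIDE) -> list[int]:
    """Stand-in image (an assumption): 1 = black on a radius-100 circle outline."""
    center = side / 2
    pixels = []
    for y in range(side):
        for x in range(side):
            distance = ((x - center) ** 2 + (y - center) ** 2) ** 0.5
            pixels.append(1 if abs(distance - 100) < 1 else 0)
    return pixels


def pack_bits(pixels: list[int]) -> bytes:
    """Pack one pixel per bit, eight pixels per byte, zero-padding the last byte."""
    packed = bytearray()
    for i in range(0, len(pixels), 8):
        chunk = pixels[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        packed.append(byte << (8 - len(chunk)))
    return bytes(packed)


raw = pack_bits(ring_pixels())
compressed = zlib.compress(raw, 9)

print("pixels in the grid:       ", SIDE * SIDE)
print("raw bitmap, bytes:        ", len(raw))
print("zlib-compressed, bytes:   ", len(compressed))
print("quoted program, characters:", len(PROGRAM))
```

Even at one bit per pixel, the grid takes thousands of bytes, while the program is a short paragraph. The catch is the one in the question above: the program is only a message if the recipient already has something that understands `ellipse` and `arc`.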

The Information also eventually gets into DNA, genetics, and memetics. (I never knew Richard Dawkins coined the word meme!)

So. Despite being slow in parts, the book is much better, much more enjoyable, than this review. It’ll be an enjoyable bunch of hours, and give you new ways to think about things.

(Postscript: I read this book in mid-2012, and wrote this review in October 2012, but somehow forgot to publish it. Maybe it was information overload?)

Chaos, Order, and Software Development

Zach Dennis gave a very interesting, but not terribly well-received, talk at RailsConf 2012, called “Sand Piles and Software.” (It’s on the schedule on Tuesday in Salon J, if you want to check it out.) Here are the slides (which are more suggestion than information), and here’s the synopsis:

This talk applies the concepts of chaos theory to software development using the Bak–Tang–Wiesenfeld sand pile model [PDF link] as the vehicle for exploration. The sand pile model, which is used to show how a complex system is attracted to living on the edge of chaos, will be used as both a powerful metaphor and analogy for building software. Software, it turns out, has its own natural attraction to living in its own edge of chaos. In this talk, we’ll explore what this means and entertain questions for what to do about it.

The TL;DR of the talk was: as you build your software system, as you add features, you add complexity, and when it’s too complex, you won’t be able to add anything more, until you clean something up. So you clean a bit up, and add more complexity, until it falls over again. Like dropping grains of sand onto a sand pile, each grain is tiny, hardly worth noting, but one of them will cause a slide.
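If you want to watch the sand pile do its thing, here is a minimal Python sketch of the model as it is usually described: a square grid, a cell topples when it holds four grains, and each toppling hands one grain to each neighbor (grains pushed off the edge are lost). This is my own toy version, not code from Zach’s talk, and the grid size, grain count, and seed are arbitrary choices.

```python
import random

THRESHOLD = 4  # a cell topples once it holds this many grains


def drop_grain(grid, row, col):
    """Add one grain at (row, col), relax the pile, and return the topple count."""
    size = len(grid)
    grid[row][col] += 1
    topplings = 0
    unstable = [(row, col)]
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < THRESHOLD:
            continue  # already relaxed by an earlier visit
        grid[r][c] -= THRESHOLD
        topplings += 1
        if grid[r][c] >= THRESHOLD:
            unstable.append((r, c))
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:  # off-grid grains are lost
                grid[nr][nc] += 1
                if grid[nr][nc] >= THRESHOLD:
                    unstable.append((nr, nc))
    return topplings


def simulate(size=20, grains=5000, seed=0):
    """Drop grains at random cells; return the avalanche sizes and final grid."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    sizes = [
        drop_grain(grid, rng.randrange(size), rng.randrange(size))
        for _ in range(grains)
    ]
    return sizes, grid


sizes, grid = simulate()
print("drops that moved nothing:", sum(1 for s in sizes if s == 0))
print("largest avalanche (topplings):", max(sizes))
```

Every drop is the same tiny action. How big the resulting slide is depends on the state the pile was already in, not on anything special about that one grain.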

That much rang very true with me.

Zach’s advice, then, was to “fall in love with simplicity,” and “loathe unnecessary complication,” and there are some more slides about practices and values and refactoring, but I can’t remember the ideas for them; I’ll have to check my notes.

To me, that part sounded virtuous.

This morning, I turned again, for other reasons, to Dick Gabriel’s Mob Software: The Erotic Life of Code. (I’ll say it until I stop meeting programmers who haven’t read him: you are missing out.) I got to the part where he talks about swarms (he’s preparing to introduce us to the Mob, the open-source hackers), and complexity emerging from local actors with simple rules, and this part reminded me of Zach Dennis’ talk:

Chaos is unpredictability: Combinations that might have lasting value or interest don’t last—the energy for change is too high. Order is total predictability: The only combinations that exist are the ones that always have—the energy for stability is too high.

He goes on to quote Stuart Kauffman from “At Home in the Universe”:

It is a lovely hypothesis, with considerable supporting data, that genomic systems lie in the ordered regime near the phase transition to chaos. Were such systems too deeply into the frozen ordered regime, they would be too rigid to coordinate the complex sequences of genetic activities necessary for development. Were they too far into the gaseous chaotic regime, they would not be orderly enough.

…cell networks achieve both stability and flexibility…by achieving a kind of poised state balanced on the edge of chaos.

Is Zach telling us to stay where it’s safe and ordered? Are we stuck on this edge between chaos and order, if we want to write interesting software? I’d like my software to be both stable and flexible. If, to achieve this stability and flexibility, its behavior must be emergent, not guided by my brain, is that ok? Or is there a way for me to still specify requirements, and get this stability and flexibility? Is emergent-design only able to produce certain kinds of software?

One of Zach's slides: reaching your software's critical point

Thanks to Ren for reviewing this!