About the nicest thing I can say about the papal conclave is that it provides an excellent opportunity to explain the fundamental technology of the computer age: the binary digit, or bit.

Information, in its computing-science sense, is a measure of the burden that any given message places on a communication system. It is a way to determine mathematically how much “bandwidth” will be necessary for the successful transmission of a given message, and how statistically likely it is that a given transmission will be vitiated by “noise.” Communication engineers begin to answer these questions by considering any given message as selected from a finite set of possible messages. The larger the set, the more information is “produced” when one message is selected from within it.

So, for example: the amount of information in a message consisting of a single Chinese character (or *zì*) is a function of how many Chinese characters there are for potential selection – about 50,000, in the very largest dictionaries. By comparison, the amount of information in a single letter of the alphabet (say, letter “b”) is a function of how many letters there are for potential selection: twenty-six, in the modern version of the alphabet. Therefore, and as a starting-place for what Claude Shannon originally called a “mathematical theory of communication,” we can very simply and broadly say that there is a lot “more information” in a single selection of a *zì* than there is in a single selection of a letter of the alphabet.
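The comparison can be made concrete even before we have a unit, by asking how many letter-selections it takes to cover as many possibilities as a single *zì*-selection. A minimal sketch, using the essay’s round figures for the two set sizes:

```python
# How many letters (26-way selections) are needed to distinguish as many
# messages as one zi (a 50,000-way selection)? Find the smallest k such
# that 26**k >= 50_000. (The set sizes are the essay's round figures.)
k = 1
while 26 ** k < 50_000:
    k += 1

print(k)                 # smallest run of letters covering 50,000 possibilities
print(26 ** 3, 26 ** 4)  # three letters fall short; four suffice
```

Three letters give only 17,576 combinations; four give 456,976, so one *zì* is “worth” somewhere between three and four letters – a ratio we cannot yet state precisely.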

But exactly how much more? And more of what, exactly? There needs to be a discrete and iterable *unit* of information – an informational digit – for communication to become computable. The interesting problem here is that the informational digit can itself only be a function of message-selection from a set. Thus Shannon, in his seminal paper of 1948, refers to the ten stable positions of a digit wheel on a desk computing machine, and explains that selection of one position from among those will produce exactly one “decimal digit” – perhaps we could call that a decit – of information. By the same logic, selection of one letter from the 26 of the alphabet will produce exactly one alphabetic digit of information – let’s call that an alphit. Finally, selection of one *zì* from the 50,000 will produce one *zì*-digit of information – perhaps we would have to call that, with apologies, a *zì*t. We want to say that a *zì*t is bigger than an alphit; an alphit than a decit. But to put the comparison in terms of any of these digits is impossible, because comparing them is exactly what we do not yet know how to do.

What is worse, the number of possible sets of possible messages is itself, presumably, infinite. Any selection from any such set will produce a unique informational digit. Therefore, the number of possible unique informational digits will, in turn, be infinite; and we will no more be able to convert meaningfully between any of them than we can between alphits and decits and *zì*ts.

Information theory solves this problem by logically determining the characteristic or necessary informational digit, in terms of the theory itself. This is the decisive and stipulative step that opens the way to the information age. Information, by definition, is selection from a set; therefore, the base unit of information must follow from the *base set of selection as such*. This, fairly obviously, is a set of exactly and only *two* possible messages: one way or the other, yes or no, on or off, 1 or 0. The binary digit, or bit, becomes the informational unit.

Shannon invokes it on the very first page of his paper, and uses it to explain how much information there is in a decimal digit: about 3.32 bits. True, Shannon’s immediate point is to show how easy it will be to convert between different logarithmic bases for information. But as we have seen, the conversion will be meaningless or impossible if there is no unit in which to express it; and it is not accidental that the informational digit comes to be standardized as binary. It is sometimes thought that the primacy of the bit, in modern information theory, is a function of the basic (very basic) physical structure of computers. Actually, it is the other way around: the very basic physical structure of computers is a function of the primacy of the bit in modern information theory.
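The conversion Shannon has in mind is just a change of logarithmic base: a selection from a set of n equally likely messages carries log2(n) bits. A short sketch (the digit names, of course, are this essay’s coinages, not standard units):

```python
import math

# Converting between informational digits is a change of logarithmic base:
# a selection from n equally likely possibilities carries log2(n) bits.
decit = math.log2(10)       # one decimal digit, ~3.32 bits
alphit = math.log2(26)      # one letter, ~4.70 bits
zit = math.log2(50_000)     # one zi, ~15.61 bits

print(f"1 decit  = {decit:.2f} bits")
print(f"1 alphit = {alphit:.2f} bits")
print(f"1 zit    = {zit:.2f} bits")

# With a common unit in hand, the digits finally compare directly:
print(f"1 zit = {zit / alphit:.2f} alphits")
```

Once everything is expressed in bits, the earlier impasse dissolves: one *zì*t turns out to be about 3.32 alphits.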

Now each bit is basically a switch. Two possible linked settings, only one of which can be selected at a time, constitute one bit of information. Turning that around, we see that a one-bit system (one switch) can handle messages selected from within sets of two. A two-bit system (two switches) can handle messages selected from within sets of four. A three-bit system, from within sets of eight. And so on. Each time we add a bit, a switch, to the system, we double the set of potential messages from within which selected messages can be handled by that system. By the time we get to an eight-bit system – conventionally, one byte – we already have the capacity to transmit messages selected from within sets of 256 (= 2^8) possible messages. The massive capacity of modern computer systems is a remarkable expansion along the same lines: the number of switches grows linearly, while the set of possible messages grows exponentially (with the help of Boolean algebra, which allows the construction of logic gates). Each gigabyte of capacity multiplies our eight-bit system by approximately one billion. An extraordinary technical achievement. Yet all it means (if one can dare to put it that way) is that we have a system consisting of approximately eight billion switches.
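The doubling rule is easy to verify directly; a minimal sketch (the gigabyte figure uses the essay’s rough decimal billion, not the binary 2^30):

```python
# Each added switch doubles the set of messages an n-bit system can
# distinguish: n switches give 2**n possible messages.
for n in (1, 2, 3, 8):
    print(f"{n} switch(es) -> {2 ** n} possible messages")

# One gigabyte, taken roughly as a billion 8-bit bytes, is a system of
# about eight billion switches.
switches_per_gigabyte = 8 * 10 ** 9
print(switches_per_gigabyte)
```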

Anyway, the notification mechanism of the papal conclave is a very nice example of a two-bit system. The two bits are: smoke | no smoke, and white smoke | black smoke. Together they allow the transmission of any one of the following four possible messages: (1) a vote has been concluded; (2) a vote has not been concluded; (3) a pope has been selected; (4) a pope has not been selected. That’s information.
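One toy way to spell out that two-bit encoding (my own mapping, sketched for illustration, not any official protocol of the conclave):

```python
# The conclave as a two-bit system. Bit one: smoke or no smoke (has a
# vote concluded?). Bit two: white or black smoke (was a pope selected?).
# Two switches give 2**2 = 4 distinguishable states.
SMOKE, NO_SMOKE = 1, 0
WHITE, BLACK = 1, 0

def decode(smoke: int, colour: int) -> str:
    """Read the chimney: map two bits to one of the possible messages."""
    if smoke == NO_SMOKE:
        return "a vote has not been concluded"
    if colour == WHITE:
        return "a vote has been concluded and a pope has been selected"
    return "a vote has been concluded but no pope has been selected"

print(decode(NO_SMOKE, BLACK))
print(decode(SMOKE, BLACK))
print(decode(SMOKE, WHITE))
```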

As Dan Savage might put it: thanks, bundles of sticks.