This sequence is Eliezer's second attempt at The Great Canon of Modern Rationality. He took the insight-porn style of the original and recast it into a sleeker (and, quite frankly, more useful) theory-then-exercise format. Like the original, however, it begins by digging beneath the idea of truth:
The Sally-Anne False-Belief task is an experiment used to tell whether a child understands the difference between belief and reality. It goes as follows:
- The child sees Sally hide a marble inside a covered basket, as Anne looks on.
- Sally leaves the room, and Anne takes the marble out of the basket and hides it inside a lidded box.
- Anne leaves the room, and Sally returns.
- The experimenter asks the child where Sally will look for her marble.
Immediately, he cleaves his epistemology into two things: beliefs and reality. The former, a thing in mindspace. The latter, a thing in, well, the thing where all things are (for now). He then goes on to emphasise what it means for a belief to have a truth-condition: whether or not that belief corresponds to reality in the Tarskian "The sentence 'X' is true iff X" kind of way.
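The Tarskian schema is simple enough to mock up in a few lines of Python. This is my own toy sketch, not anything from the post: reality is crudely modeled as a dictionary of facts, and a belief's truth-condition as a predicate over that dictionary.

```python
# Toy model of the Tarskian schema: the sentence "X" is true iff X.
# Reality is (crudely) a dictionary of facts; a belief pairs a sentence
# with a truth-condition, i.e. a predicate on reality.

reality = {"sky_color": "blue", "marble_location": "box"}

beliefs = {
    "The sky is blue.":
        lambda world: world["sky_color"] == "blue",
    "The marble is inside the basket.":
        lambda world: world["marble_location"] == "basket",
}

def is_true(sentence, world):
    """A belief is true iff its truth-condition holds in reality."""
    return beliefs[sentence](world)

is_true("The sky is blue.", reality)                  # True
is_true("The marble is inside the basket.", reality)  # False: Anne moved it
```

Note that Sally's belief about the basket comes out false not because of anything in Sally's head, but because of where the marble actually sits in `reality`.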
The first thing I've noticed here is just how much the words "belief" and "reality" sweep under the rug. If you're a pure reductionist, this is the time to feel cheated. A grand epistemology that does not reduce to atoms or quantum fields? Pshaw. But by treating these two concepts as more or less primitives, Eliezer avoids making a substantial amount of nontrivial science a prerequisite to his worldview. If you're trying to build an AI from scratch, or maybe just trying to describe an epistemology that can be useful to intelligences in general, then you had better start by assuming the minimum set of possible tools available to them.
To use a Yudkowskian epistemology, therefore, you need three objects:
- beliefs, which live in mindspace;
- reality, where everything else lives; and
- a truth-condition tying each belief to reality.
This broth is beginning to smell delicious. What happens if we squint our eyes a bit and let our imaginations run wild?
Whew. But we're getting ahead of ourselves.
Can we do interesting things already with what we have? Absolutely! We can compare beliefs with each other like we compare sentences. For instance:
"The marble is inside the basket."
"The marble is inside the box."
are saying two different things. Likewise, even though (human) beliefs are just particular patterns of neural structures and/or activity in a particular brain[1], and even though you need those patterns to undergo complex neural gymnastics before something resembling interpretation can be done on them, in most (working) brains they resolve in a similar enough manner to sentences that we can assign both content[2] and truth-condition to them as properties. And it turns out that what we're really supposed to be interested in is the second one.
We're in a bit of a bind here. We can compare beliefs with beliefs, sure. But how do we let beliefs and reality interact? Eliezer constructs a tower of beliefs by allowing belief in belief, beliefs that behave similarly to the sentence "I believe that I should have faith in X.", and recursive descriptions thereof. Then he promptly collapses this potentially problematic edifice by asserting that
...saying 'I believe the sky is blue, and that's true!' typically conveys the same information as 'I believe the sky is blue' or just saying 'The sky is blue'...
But this collapse of Belief of Belief of Belief...of X to Belief of X ends there. We don't have a way to get X out. The belief that "The sky is blue." and whether or not the sky is actually blue are still different things, and our epistemology so far can only say something about the former. We're stuck in a monad[3].
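The "stuck in a monad" point can be made concrete with a toy wrapper. This is my own sketch, loosely modeled on a monadic join; the class and method names are invented for illustration. Nesting collapses, but nothing ever extracts the bare proposition:

```python
class Belief:
    """A toy wrapper: you can nest beliefs and collapse the nesting,
    but nothing here lets you get the bare proposition X back out."""
    def __init__(self, content):
        self.content = content

    def join(self):
        # Belief of Belief of X collapses to Belief of X...
        if isinstance(self.content, Belief):
            return self.content.join()
        # ...but the collapse stops here: we still hold a Belief of X,
        # never X itself. No method evaluates it against reality.
        return self

nested = Belief(Belief(Belief("the sky is blue")))
flat = nested.join()
# flat is still a Belief about the sky, not a fact about the sky
```

The interface deliberately has no `run` or `unwrap`: that missing method is exactly the belief-to-reality interaction the epistemology hasn't supplied yet.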
How do we unpack X from Belief of X? By evaluating the latter's truth-condition.
The truth of a belief, for Eliezer, is what you end up with when a chain of causes and effects in reality extends all the way down to it (which we mentioned was a pattern in the brain, and therefore very much a thing in reality). It is the result of the process:
Sun emits photon -> photon hits shoelace -> shoelace reemits photon -> photon hits retina -> cone/rod gets activated -> neural impulse travels down optical pathway -> impulse reaches visual center -> visual center activates object recognition center -> you recognise the shoelace => Belief in Shoelace
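The chain above is just function composition, so we can caricature it as such. The stage names below are invented toy stand-ins (real perception is nothing this tidy), but the shape is the point: truth arrives at the belief only by riding the whole causal chain.

```python
from functools import reduce

def compose(*stages):
    """Thread a signal through each causal stage in order."""
    return lambda signal: reduce(lambda s, stage: stage(s), stages, signal)

# Invented toy stages; each maps one state of reality to the next.
perceive = compose(
    lambda s: "photon",             # sun emits photon
    lambda s: "reflected photon",   # shoelace re-emits it
    lambda s: "neural impulse",     # retina transduces it
    lambda s: "belief in shoelace", # recognition centers interpret it
)

perceive("sun")  # -> "belief in shoelace"
```

Swap out only that last interpretive stage for a faulty one and the same causal chain delivers a different, wrong belief, which is where the next paragraph picks up.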
What is the problem here? Well, what happens when you're crazy? When you fail at the very last step of that process? Eliezer isn't stupid. He has dealt with this sort of thing before. And I think his point is that, indeed, not all brains are created equal. If your brain churns out the wrong interpretation of this causality-based information we call "truth", you are going to have a bad time to the extent that your interpretation is wrong. And by "bad time", I mean you will have certain beliefs that will cause you to act in a certain way which will most likely not produce the effects you believed would happen (such as expecting to find food in your fridge and finding none).
The reply I gave...was that my beliefs determine my experimental predictions, but only reality gets to determine my experimental results.
Yes, perfect timing.
It's time to introduce more concepts.
Let's introduce the concept of anticipation. To anticipate something is to predict or believe that it will be the result of some interaction of things in reality. What is a result? Like allowable moves on a chessboard, a particular state of reality. Now, one can have beliefs about results[4], so let's say to anticipate a result means to believe that reality will assume that particular state at a particular point in the future[5]. And while we're at it, let's call the set of all beliefs of a particular brain a map of reality. We can then imagine how anticipation helps us iteratively improve our maps, in a process that goes something like this:
1. Hold a belief about reality.
2. Derive from it an anticipation: a state you expect reality to assume.
3. Compare that anticipation against the sensory information you actually receive.
4. Keep, strengthen, weaken, or discard the belief accordingly, and repeat.
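The map-improvement loop can be sketched as a bare-bones caricature. Here the "map" is just a dict of belief-to-credence, and the update rule is mine, invented for illustration; it is not any rule Eliezer proposes.

```python
def update(belief_map, belief, anticipated, observed, rate=0.5):
    """Nudge credence in `belief` toward 1 if its anticipation came
    true, toward 0 otherwise. A caricature of iterative map repair."""
    target = 1.0 if anticipated == observed else 0.0
    belief_map[belief] += rate * (target - belief_map[belief])
    return belief_map

belief_map = {"food in fridge": 0.9}

# We anticipate finding food, open the fridge, and find none:
update(belief_map, "food in fridge",
       anticipated="food", observed="no food")
# credence drops from 0.9 to 0.45
```

Only the `observed` argument comes from outside the map, which is the whole point: beliefs determine the predictions, but reality alone supplies the results.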
Okay, so actually I cheated a bit here, in step (3). Sensory information isn't the only sort of information you can use to establish the truth of your beliefs (or in particular, any step in its cause-and-effect chain). For instance, we cannot directly observe photons being emitted from the Sun (otherwise it wouldn't hit our worn-out shoelace) nor can we feel each of our individual neurons firing, yet we consider the chain of causality we outlined above as plausible. Why is this?
Because truth as a rule is a generalisable property of maps.
What do I mean by this? Extracting truth from reality is amenable to inquiry. We can imagine our truth-extracting tools as a separate discipline, not unlike science, or natural philosophy before that, or metaphysics before that, or even mere campfire tales of gods and their erratic whims before that as well. By saying that truth generalises over maps, we say that certain techniques are better at extracting truth from reality than others, better at improving the accuracy of our own maps. We have already seen one such technique, namely, look out the window and see for yourself. But there are other techniques, such as find patterns and calculate their consequences, assuming they hold for all relevant instances. This latter technique is what justifies our belief that photons are emitted by the Sun, because we know what the Sun is, and we know that Sun-stuff (which is similar to certain stuff we can make back home) emits photons.
Eliezer ends his post with a preview of the next in the sequence. Suppose you are writing papers for a class on esoteric English literature. You are tasked to identify whether a particular author named Elaine is "post-utopian", which your professor defined in a previous lecture as an author whose writing contains elements of "colonial alienation". How would you do it?
Fig. 1: Why post-utopia sucks: because they still use XML as a generic data representation. Credits to the illustrator of The Useful Idea of Truth.
If we use the Tarskian schema mentioned at the start of this piece, we get:
The sentence "Elaine is post-utopian." is true iff Elaine is post-utopian.
So we unpack from the definition. We look for elements of "colonial alienation" in Elaine's work. We sample a few literature professors and ask if they consider Elaine to be post-utopian. But the thing is, literary theory is rife with alternative interpretations, arguable definitions, and a pernicious subjectivism in which everyone is entitled to believe what they want. So whither the truth of Elaine's post-utopianism?
The danger of using words willy-nilly is that it can produce what Eliezer calls floating beliefs. These are beliefs that, while having a chain of cause-and-effect to back them up, participate in few (if any) of the cause-and-effect chains of other beliefs[6]. Perhaps there was one person back in the day who knew what post-utopianism was, but now she's dead and her students just memorised who the post-utopian authors are to pass their exams, and their students, and their students' students, until the cause-and-effect chain settled on your professor.
Can post-utopianism be true? Sure, but it's sure as hell impossible now for your professor to anticipate any state of the world that could cleave the set of all authors into post-utopians and not-post-utopians.
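One way to see the floating-belief failure in code: a grounded belief generates anticipations that rule some observations out, while a memorized label rules out nothing. The functions below are invented toy stand-ins of my own, not anything from the post.

```python
# A grounded belief anticipates experiences: it forbids some observations.
def marble_in_box_allows(observation):
    """True iff the observation is one 'the marble is in the box' permits."""
    return observation == "marble found in box"

# A floating belief is a memorized label with no anticipations attached.
memorized_post_utopians = {"Elaine"}  # inherited rote list, nothing more

def post_utopian_allows(observation):
    return True  # every possible experience is equally compatible with it

observations = ["marble found in box", "marble found in basket"]
[marble_in_box_allows(o) for o in observations]  # [True, False]: it cleaves
[post_utopian_allows(o) for o in observations]   # [True, True]: it floats
```

The rote set still lets your professor grade exams, but no observation function attached to it ever returns `False`, which is exactly what "cannot anticipate any state of the world" cashes out to.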
Some of you might think: "But he can! Just imagine atoms in the universe going one way if Elaine were post-utopian, and another way if she were not."
But under this rule[7],
Then the theory of quantum mechanics would be meaningless a priori, because there's no way to arrange atoms to make the theory of quantum mechanics true.
And when we discovered that the universe was not made of atoms, but rather quantum fields, all meaningful statements everywhere would have been revealed as false - since there'd be no atoms arranged to fulfill their truth-conditions.
Eliezer brands this as a particular instance of verificationism, the idea that we can only evaluate the truth of our beliefs using our senses (which interact with matter), and only those verifiable as such are meaningful.
Before we pick up on this point of what things mean, we'll take a detour in the next post to extirpate the sophistry surrounding the word "rational" that has built up over the years, in a way similar to what we've done for the word "truth" here.
(to be continued...)
[1] Which are almost but not quite specific to that brain. See grandmother cell and alternatives.
[2] One way of carving up this notion is by distinguishing between references and their referents. I'll refer you to the Stanford Encyclopedia of Philosophy on this point.
[3] In programming, a subroutine can be identified as pure if it satisfies two properties: a) it doesn't produce observable side effects when run, and b) its result depends only on its arguments at the point of evaluation. However, a program with no side effects cannot really affect the real world much, so what we can do is put the side-effecting parts in boxes which we can query with questions like "If your contents happen to have side effects, what would be the result of their evaluation?" to avoid impurity in the rest of our code. We call these boxes monads. See LINK.
[4] Remember that beliefs live in the patterns inside your head and that you may only compare beliefs directly.
[5] I am having trouble formulating this without invoking the concept of time.
[6] Actually, Eliezer is more forceful here. He considers floating beliefs as entirely disconnected from your web of beliefs.
[7] This is another useful truth-extraction technology: avoid proving too much.