"Do unto others as you predict they would like to have done to them, if they were fully aware of the situation and the consequences of your actions, and were behind the veil of ignorance, and also explain what you are doing to whatever extent that they can understand."

If you could give the ruler of all the universe a single rule to govern by, what should it be? There are only three caveats: the rule will be followed to the letter; the rule will be followed everywhere, on every level; and the ruler is alien as only a hypothetical philosophical construct can be.

There is no right answer, at least not so far. There are two central problems: first, the rule will be followed by someone who understands much more than we do; and second, the rule will be followed by someone who does not understand us.

The general goal is what Eliezer Yudkowsky calls Coherent Extrapolated Volition -- an extrapolation of what we all would want if we understood all the universe. The problem is that it is hard to keep this sort of thing coherent. The goal is an understanding of human ethics, motivations, and goals that can be used to guide all interactions with humans. Yudkowsky's target is specifically a rule that we could give to an AI so that it would do what we want it to do, even when it becomes not just smarter than us, but passes the point where we have any context for its knowledge and powers. His most famous formulation is that we want it to do what we would want it to do "if we knew more, thought faster, were more the people we wished we were, and had grown up farther together" -- which sounds great until you consider what sort of people some people wish they were. Obviously, this is not itself a solution to the problem, but it is a gesture toward where the solution might lie.

The core problem is that we have not yet found a way for humans to live together without having Very Serious Disagreements that result in violence and death. We might each have our own ideas on how to fix this, but as a matter of fact most of us would not wish a random person's solution imposed upon us; generalizing this wish, we may assume that it would not be in most humans' interest to impose any one value system on all of humanity. There may be no solution, but until that is proven, we call this goal Coherent Extrapolated Volition.

That said, common sense does give us some help in making headway. It is a mistake to treat all human values as equal; some humans are evil, wrong, or stupid in ways that will cause them disutility even under a "perfect" system. While we should be very careful about handing a more powerful entity the ability to make new rules under which we ourselves turn out to be the evil, wrong, or stupid ones, doing so is a necessary part of the process. The general solution is to be as kind as possible to those needing reeducation and rehabilitation -- and you do not need me to tell you the problems involved in humanity's own attempts at morally good and effective reeducation and rehabilitation. And so it goes; we do not know if a solution is out there, but we do know that humans have not yet been smart enough to find it.