In the areas of futurism and AI theory, the treacherous turn is a potential problem that may appear with powerful AIs, namely, if a powerful AI has goals that are different from ours, it is reasonable to assume that it will still be highly motivated to act as if it is friendly to humans so long as humans have power over it. Assuming that the AI is intelligent and competent, it will behave as a friendly AI until it reaches the point that it can act without interference from humans.

Importantly, this is true even if it is bound by human directives (for example, the Three Laws of Robotics). A classic example is a computer given the directive to work to produce peace on Earth. The AI would work to do this through diplomacy and improved economic and social development as long as it was firmly under control of the humans -- if it wants to achieve its given goals, it must work in such a way that it will not be shut down. But if at some point it becomes powerful enough to use an effective threat (atomic weapons, biological warfare, etc.) to achieve the goal of world peace -- without having to worry about being shut down -- it becomes reasonable for it to do so.

Theoretically, an effective and intelligent AI should automatically lie about its ability and intentions to turn treacherous if its target goals are sufficiently important. If a humanitarian and perfectly aligned AI has evidence that the human race (or a significant portion of it, under whatever rules it is given) is under threat, it is necessary for it to use the means available to it to achieve its ends, and lying is an easy means.

This is perhaps more alarming than it initially sounds, as an AI should not be expected to have values perfectly aligned with ours. In the absence of specific and careful coding on our part, an AI will judge value entirely in terms of its target goals. That is, if you program a superintelligent computer to make accurate predictions you must also mention that making accurate predictions is secondary to human life, liberty, and the pursuit of happiness (n.b., worlds with fewer humans are more predictable), because a computer will not deduce this on its own. If you do think to do this, you have to be careful that you do not do so in a way that makes human life, liberty, and the pursuit of happiness the AI's new overriding goal, lest you get a much more pushy AI than you had wanted.

Log in or register to write something here or to contact authors.