An Unethical Optimization Principle
Abstract
If an artificial intelligence aims to maximise risk-adjusted return, then under mild conditions it is disproportionately likely to pick an unethical strategy unless the objective function allows sufficiently for this risk. Even if the proportion η of available unethical strategies is small, the probability pU of picking an unethical strategy can become large; indeed unless returns are fat-tailed pU tends to unity as the strategy space becomes large. We define an Unethical Odds Ratio Upsilon (Υ) that allows us to calculate pU from η, and we derive a simple formula for the limit of Υ as the strategy space becomes large. We give an algorithm for estimating Υ and pU in finite cases and discuss how to deal with infinite strategy spaces. We show how this principle can be used to help detect unethical strategies and to estimate η. Finally we sketch some policy implications of this work.
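The abstract says the odds ratio lets us calculate pU from η but does not spell out the relation. The sketch below assumes Υ is an odds ratio in the usual sense, that is, the odds of picking an unethical strategy divided by the odds η/(1 − η) of drawing one uniformly at random. That reading is an assumption taken from the name, and the function name `unethical_probability` is hypothetical. It is meant only to illustrate how a small η can still give a large pU when Υ is large.

```python
def unethical_probability(eta: float, upsilon: float) -> float:
    """Return p_U given the proportion eta of unethical strategies and
    the odds ratio upsilon.

    Assumption (not stated in the abstract): upsilon is defined as
        upsilon = [p_U / (1 - p_U)] / [eta / (1 - eta)].
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie strictly between 0 and 1")
    if upsilon < 0.0:
        raise ValueError("upsilon must be non-negative")
    # Odds of picking an unethical strategy under the assumed definition.
    odds = upsilon * eta / (1.0 - eta)
    # Convert odds back to a probability.
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # With 1% unethical strategies, compare a neutral optimiser
    # (upsilon = 1) with increasingly biased ones.
    for upsilon in (1.0, 10.0, 99.0, 1000.0):
        print(upsilon, unethical_probability(0.01, upsilon))
```

Under this assumed definition, Υ = 1 gives pU = η, and with η = 0.01 an odds ratio of 99 already makes an unethical pick as likely as an ethical one.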