
«ARTIFICIAL INTELLIGENCE AS A POSITIVE AND NEGATIVE FACTOR IN GLOBAL RISK» 
Eliezer S. Yudkowsky

    • Crack the protein folding problem, to the extent of being able to generate DNA strings whose folded peptide sequences fill specific functional roles in a complex chemical interaction.
    • Email sets of DNA strings to one or more online laboratories which offer DNA synthesis, peptide sequencing, and FedEx delivery. (Many labs currently offer this service, and some boast of 72-hour turnaround times.)
    • Find at least one human connected to the Internet who can be paid, blackmailed, or fooled by the right background story, into receiving FedExed vials and mixing them in a specified environment.
    • The synthesized proteins form a very primitive «wet» nanosystem which, ribosome-like, is capable of accepting external instructions; perhaps patterned acoustic vibrations delivered by a speaker attached to the beaker.
    • Use the extremely primitive nanosystem to build more sophisticated systems, which construct still more sophisticated systems, bootstrapping to molecular nanotechnology — or beyond.

    The elapsed turnaround time would be, imaginably, on the order of a week from when the fast intelligence first became able to solve the protein folding problem. Of course this whole scenario is strictly something I am thinking of. Perhaps in 19,500 years of subjective time (one week of physical time at a millionfold speedup) I would think of a better way. Perhaps you can pay for rush courier delivery instead of FedEx. Perhaps there are existing technologies, or slight modifications of existing technologies, that combine synergetically with simple protein machinery. Perhaps if you are sufficiently smart, you can use waveformed electrical fields to alter reaction pathways in existing biochemical processes. I don’t know. I’m not that smart.

    The challenge is to chain your capabilities — the physical-world analogue of combining weak vulnerabilities in a computer system to obtain root access. If one path is blocked, you choose another, seeking always to increase your capabilities and use them in synergy. The presumptive goal is to obtain rapid infrastructure, means of manipulating the external world on a large scale in fast time. Molecular nanotechnology fits this criterion, first because its elementary operations are fast, and second because there exists a ready supply of precise parts — atoms — which can be used to self-replicate and exponentially grow the nanotechnological infrastructure. The pathway alleged above has the AI obtaining rapid infrastructure within a week — this sounds fast to a human with 200Hz neurons, but is a vastly longer time for the AI.

    Once the AI possesses rapid infrastructure, further events happen on the AI’s timescale, not a human timescale (unless the AI prefers to act on a human timescale). With molecular nanotechnology, the AI could (potentially) rewrite the solar system unopposed.

    An unFriendly AI with molecular nanotechnology (or other rapid infrastructure) need not bother with marching robot armies or blackmail or subtle economic coercion. The unFriendly AI has the ability to repattern all matter in the solar system according to its optimization target. This is fatal for us if the AI does not choose specifically according to the criterion of how this transformation affects existing patterns such as biology and people. The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else. The AI runs on a different timescale than you do; by the time your neurons finish thinking the words «I should do something» you have already lost.

    A Friendly AI plus molecular nanotechnology is presumptively powerful enough to solve any problem which can be solved either by moving atoms or by creative thinking. One should beware of failures of imagination: Curing cancer is a popular contemporary target of philanthropy, but it does not follow that a Friendly AI with molecular nanotechnology would say to itself, «Now I shall cure cancer.» Perhaps a better way to view the problem is that biological cells are not programmable. Solving that problem cures cancer as a special case, along with diabetes and obesity. A fast, nice intelligence wielding molecular nanotechnology is power on the order of getting rid of disease, not getting rid of cancer.

    There is finally the family of species metaphors, based on between-species differences of intelligence. The AI has magic — not in the sense of incantations and potions, but in the sense that a wolf cannot understand how a gun works, or what sort of effort goes into making a gun, or the nature of that human power which lets us invent guns. Vinge (1993) wrote:

    Strong superhumanity would be more than cranking up the clock speed on a human-equivalent mind. It’s hard to say precisely what strong superhumanity would be like, but the difference appears to be profound. Imagine running a dog mind at very high speed. Would a thousand years of doggy living add up to any human insight?

    The species metaphor would seem the nearest analogy a priori, but it does not lend itself to making up detailed stories. The main advice the metaphor gives us is that we had better get Friendly AI right, which is good advice in any case. The only defense it suggests against hostile AI is not to build it in the first place, which is also excellent advice. Absolute power is a conservative engineering assumption in Friendly AI, exposing broken designs. If an AI will hurt you given magic, the Friendliness architecture is wrong regardless.

    10: Local and majoritarian strategies

    One may classify proposed risk-mitigation strategies into:

    • Strategies which require unanimous cooperation; strategies which can be catastrophically defeated by individual defectors or small groups.
    • Strategies which require majority action; a majority of a legislature in a single country, or a majority of voters in a country, or a majority of countries in the UN: the strategy requires most, but not all, people in a large pre-existing group to behave a particular way.
    • Strategies which require local action — a concentration of will, talent, and funding which overcomes the threshold of some specific task.
    Unanimous strategies are unworkable, which does not stop people from proposing them.

    A majoritarian strategy is sometimes workable, if you have decades in which to do your work. One must build a movement, from its first beginnings over the years, to its debut as a recognized force in public policy, to its victory over opposing factions. Majoritarian strategies take substantial time and enormous effort. People have set out to do such, and history records some successes. But beware: history books tend to focus selectively on movements that have an impact, as opposed to the vast majority that never amount to anything. There is an element involved of luck, and of the public’s prior willingness to hear. Critical points in the strategy will involve events beyond your personal control. If you are not willing to devote your entire life to pushing through a majoritarian strategy, don’t bother; and just one life devoted won’t be enough, either.

    Ordinarily, local strategies are most plausible. A hundred million dollars of funding is not easy to obtain, and a global political change is not impossible to push through, but it is still vastly easier to obtain a hundred million dollars of funding than to push through a global political change. Two assumptions which give rise to a majoritarian strategy for AI are these:

    • A majority of Friendly AIs can effectively protect the human species from a few unFriendly AIs.
    • The first AI built cannot by itself do catastrophic damage.

    This reprises essentially the situation of a human civilization before the development of nuclear and biological weapons: most people are cooperators in the overall social structure, and defectors can do damage but not global catastrophic damage. Most AI researchers will not want to make unFriendly AIs. So long as someone knows how to build a stably Friendly AI — so long as the problem is not completely beyond contemporary knowledge and technique — researchers will learn from each other’s successes and repeat them. Legislation could (for example) require researchers to publicly report their Friendliness strategies, or penalize researchers whose AIs cause damage; and while this legislation will not prevent all mistakes, it may suffice that a majority of AIs are built Friendly.

    We can also imagine a scenario that implies an easy local strategy:

    • The first AI cannot by itself do catastrophic damage.
    • If even a single Friendly AI exists, that AI plus human institutions can fend off any number of unFriendly AIs.

    The easy scenario would hold if e.g. human institutions can reliably distinguish Friendly AIs from unFriendly, and give revocable power into the hands of Friendly AIs. Thus we could pick and choose our allies. The only requirement is that the Friendly AI problem must be solvable (as opposed to being completely beyond human ability).

    Both of the above scenarios assume that the first AI (the first powerful, general AI) cannot by itself do global catastrophic damage. Most concrete visualizations which imply this use a metaphor: AIs as analogous to unusually able humans. In section 7 on rates of intelligence increase, I listed some reasons to be wary of a huge, fast jump in intelligence:
    • The distance from idiot to Einstein, which looms large to us, is a small dot on the scale of minds-in-general.
    • Hominids made a sharp jump in real-world effectiveness of intelligence, despite natural selection exerting roughly steady optimization pressure on the underlying genome.
    • An AI may absorb a huge amount of additional hardware after reaching some brink of competence (i.e., eat the Internet).
    • Criticality threshold of recursive self-improvement. One self-improvement triggering 1.0006 self-improvements is qualitatively different from one self-improvement triggering 0.9994 self-improvements.
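
    To make the arithmetic behind this last point concrete, here is a minimal sketch in Python. The multipliers 1.0006 and 0.9994 come from the bullet above; the generation cutoff and everything else are illustrative assumptions. If each self-improvement triggers on average k further self-improvements, the cascade behaves like a geometric series: below k = 1 it fizzles out at roughly 1/(1 - k) total improvements, while above k = 1 it grows without bound.

```python
# Minimal sketch: total self-improvements in a cascade where each improvement
# triggers, on average, k further improvements. The values 1.0006 and 0.9994
# are from the text; the 20,000-generation cutoff is an arbitrary illustration.

def cascade_size(k, generations=20_000):
    """Sum the geometric series 1 + k + k**2 + ... over a fixed horizon."""
    total, current = 0.0, 1.0
    for _ in range(generations):
        total += current
        current *= k
    return total

for k in (0.9994, 1.0006):
    print(f"k = {k}: about {cascade_size(k):,.0f} total self-improvements")
# Subcritical (k = 0.9994): converges near 1/(1 - k), about 1,667, and stays there.
# Supercritical (k = 1.0006): already in the hundreds of millions at this cutoff
# and still growing; the real-world limit is hardware and physics, not the series.
```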

    As described in section 9, a sufficiently powerful intelligence may need only a short time (from a human perspective) to achieve molecular nanotechnology, or some other form of rapid infrastructure.

    We can therefore visualize a possible first-mover effect in superintelligence. The first-mover effect is when the outcome for Earth-originating intelligent life depends primarily on the makeup of whichever mind first achieves some key threshold of intelligence — such as criticality of self-improvement. The two necessary assumptions are these:

    • The first AI to surpass some key threshold (e.g. criticality of self-improvement), if unFriendly, can wipe out the human species.
    • The first AI to surpass the same threshold, if Friendly, can prevent a hostile AI from coming into existence or from harming the human species; or find some other creative way to ensure the survival and prosperity of Earth-originating intelligent life.

    More than one scenario qualifies as a first-mover effect. Each of these examples reflects a different key threshold:

    • Post-criticality, self-improvement reaches superintelligence on a timescale of weeks or less. AI projects are sufficiently sparse that no other AI achieves criticality before the first mover is powerful enough to overcome all opposition. The key threshold is criticality of recursive self-improvement.
    • AI-1 cracks protein folding three days before AI-2. AI-1 achieves nanotechnology six hours before AI-2. With rapid manipulators, AI-1 can (potentially) disable AI-2’s R&D before fruition. The runners are close, but whoever crosses the finish line first, wins. The key threshold is rapid infrastructure.
    • The first AI to absorb the Internet can (potentially) keep it out of the hands of other AIs. Afterward, by economic domination or covert action or blackmail or supreme ability at social manipulation, the first AI halts or slows other AI projects so that no other AI catches up. The key threshold is absorption of a unique resource.

    The human species, Homo sapiens, is a first mover. From an evolutionary perspective, our cousins, the chimpanzees, are only a hairbreadth away from us. Homo sapiens still wound up with all the technological marbles because we got there a little earlier. Evolutionary biologists are still trying to unravel which order the key thresholds came in, because the first-mover species was first to cross so many: Speech, technology, abstract thought… We’re still trying to reconstruct which dominos knocked over which other dominos. The upshot is that Homo sapiens is first mover beyond the shadow of a contender.

    A first-mover effect implies a theoretically localizable strategy (a task that can, in principle, be carried out by a strictly local effort), but it invokes a technical challenge of extreme difficulty. We only need to get Friendly AI right in one place and one time, not every time everywhere. But someone must get Friendly AI right on the first try, before anyone else builds AI to a lower standard.

    I cannot perform a precise calculation using a precisely confirmed theory, but my current opinion is that sharp jumps in intelligence are possible, likely, and constitute the dominant probability. This is not a domain in which I am willing to give narrow confidence intervals, and therefore a strategy must not fail catastrophically — should not leave us worse off than before — if a sharp jump in intelligence does not materialize. But a much more serious problem is strategies visualized for slow-growing AIs, which fail catastrophically if there is a first-mover effect. This is a more serious problem because:

    • Faster-growing AIs represent a greater technical challenge.
    • Like a car driving over a bridge built for trucks, an AI designed to remain Friendly in extreme conditions should (presumptively) remain Friendly in less extreme conditions. The reverse is not true.
    • Rapid jumps in intelligence are counterintuitive in everyday social reality. The g-factor metaphor for AI is intuitive, appealing, reassuring, and conveniently implies fewer design constraints.

    It is my current guess that the curve of intelligence increase does contain huge, sharp (potential) jumps.

    My current strategic outlook tends to focus on the difficult local scenario: The first AI must be Friendly. With the caveat that, if no sharp jumps in intelligence materialize, it should be possible to switch to a strategy for making a majority of AIs Friendly. In either case, the technical effort that went into preparing for the extreme case of a first mover should leave us better off, not worse.

    The scenario that implies an impossible, unanimous strategy is:

    • A single AI can be powerful enough to destroy humanity, even despite the protective efforts of Friendly AIs.
    • No AI is powerful enough to prevent human researchers from building one AI after another (or find some other creative way of solving the problem).

    It is good that this balance of abilities seems unlikely a priori, because in this scenario we are doomed. If you deal out cards from a deck, one after another, you will eventually deal out the ace of clubs.

    The same problem applies to the strategy of deliberately building AIs that choose not to increase their capabilities past a fixed point. If capped AIs are not powerful enough to defeat uncapped AIs, or prevent uncapped AIs from coming into existence, then capped AIs cancel out of the equation. We keep dealing through the deck until we deal out a superintelligence, whether it is the ace of hearts or the ace of clubs.
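
    The «capped AIs cancel out» argument can be made concrete with a toy simulation. Everything below is an illustrative assumption, not an estimate: each AI project independently turns out capped, uncapped-and-Friendly, or uncapped-and-unFriendly, and we keep dealing projects until the first uncapped AI appears.

```python
import random

# Toy model of dealing through the deck. The probabilities are purely
# illustrative assumptions, not estimates from the text.
P_CAPPED = 0.90        # project yields an AI capped below the danger threshold
P_FRIENDLY = 0.04      # project yields an uncapped Friendly AI
P_UNFRIENDLY = 0.06    # project yields an uncapped unFriendly AI

def first_uncapped_ai(rng):
    """Deal projects until the first uncapped AI appears; report its kind."""
    while True:
        roll = rng.random()
        if roll < P_CAPPED:
            continue  # capped AIs change nothing; keep dealing
        return "friendly" if roll < P_CAPPED + P_FRIENDLY else "unfriendly"

rng = random.Random(0)
trials = 100_000
unfriendly_first = sum(first_uncapped_ai(rng) == "unfriendly" for _ in range(trials))
print(f"first uncapped AI is unFriendly in {unfriendly_first / trials:.1%} of runs")
print(f"analytic ratio: {P_UNFRIENDLY / (P_FRIENDLY + P_UNFRIENDLY):.1%}")
# Raising P_CAPPED only delays the outcome; the ratio is unchanged, which is
# the sense in which capped AIs cancel out of the equation.
```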

    A majoritarian strategy only works if it is not possible for a single defector to cause global catastrophic damage. For AI, this possibility or impossibility is a natural feature of the design space — the possibility is not subject to human decision any more than the speed of light or the gravitational constant.

    11: AI versus human intelligence enhancement

    I do not think it plausible that Homo sapiens will continue into the indefinite future, for thousands or millions or billions of years, without any mind ever coming into existence that breaks the current upper bound on intelligence. If so, there must come a time when humans first face the challenge of smarter-than-human intelligence. If we win the first round of the challenge, then humankind may call upon smarter-than-human intelligence with which to confront later rounds.

    Perhaps we would rather take some other route than AI to smarter-than-human intelligence — say, augment humans instead? To pick one extreme example, suppose one says: The prospect of AI makes me nervous. I would rather that, before any AI is developed, individual humans are scanned into computers, neuron by neuron, and then upgraded, slowly but surely, until they are super-smart; and that is the ground on which humanity should confront the challenge of superintelligence.

    We are then faced with two questions: Is this scenario possible? And if so, is this scenario desirable? (It is wiser to ask the two questions in that order, for reasons of rationality: we should avoid getting emotionally attached to attractive options that are not actually options.)

    Let us suppose an individual human is scanned into a computer, neuron by neuron, as proposed in Moravec (1988). It necessarily follows that the computing capacity used considerably exceeds the computing power of the human brain. By hypothesis, the computer runs a detailed simulation of a biological human brain, executed in sufficient fidelity to avoid any detectable high-level effects from systematic low-level errors. Any accident of biology that affects information-processing in any way, we must faithfully simulate to sufficient precision that the overall flow of processing remains isomorphic. To simulate the messy biological computer that is a human brain, we need far more useful computing power than is embodied in the messy human brain itself.
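
    A back-of-envelope sketch of why the overhead points in this direction, using commonly cited order-of-magnitude figures that are illustrative assumptions rather than numbers from this chapter: roughly 10^11 neurons, roughly 10^3 synapses per neuron, average firing rates of a few spikes per second, and some large multiplier of simulation work per synaptic event to reproduce the biology underneath the signal.

```python
# Back-of-envelope sketch. All figures are rough, commonly cited
# order-of-magnitude assumptions, not numbers taken from this chapter.
NEURONS = 1e11                # illustrative neuron count
SYNAPSES_PER_NEURON = 1e3     # illustrative average
AVG_FIRING_RATE_HZ = 10       # illustrative average spike rate

# Synaptic events per second that the brain itself performs:
native_events_per_sec = NEURONS * SYNAPSES_PER_NEURON * AVG_FIRING_RATE_HZ

# A faithful simulation cannot treat each event as one operation; it must also
# model the accidents of biology that affect information-processing. Assume,
# purely for illustration, 10^3 to 10^6 operations of simulation work per event.
for overhead in (1e3, 1e6):
    print(f"overhead {overhead:.0e} ops/event -> "
          f"{native_events_per_sec * overhead:.1e} simulation ops/sec")
print(f"versus about {native_events_per_sec:.1e} native synaptic events/sec")
```

    The particular exponents are highly uncertain; the structural point is only that a simulation which must reproduce the messy substrate pays a large multiplier on top of whatever the substrate itself is doing.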

    The most probable way we would develop the ability to scan a human brain neuron by neuron — in sufficient detail to capture every cognitively relevant aspect of neural structure — would be the invention of sophisticated molecular nanotechnology. 4 Molecular nanotechnology could probably produce a desktop computer with total processing power exceeding the aggregate brainpower of the entire current human population. (Bostrom 1998; Moravec 1999; Merkle and Drexler 1996; Sandberg 1999.)

    4 Albeit Merkle (1989) suggests that non-revolutionary developments of imaging technologies, such as electron microscopy or optical sectioning, might suffice for uploading a complete brain.

    Furthermore, if technology permits us to scan a brain in sufficient fidelity to execute the scan as code, it follows that for some years previously, the technology has been available to obtain extremely detailed pictures of processing in neural circuitry, and presumably researchers have been doing their best to understand it.

    Furthermore, to upgrade the upload — transform the brain scan so as to increase the intelligence of the mind within — we must necessarily understand the high-level functions of the brain, and how they contribute usefully to intelligence, in excellent detail.

    Furthermore, humans are not designed to be improved, either by outside neuroscientists, or by recursive self-improvement internally. Natural selection did not build the human brain to be humanly hackable. All complex machinery in the brain has adapted to operate within narrow parameters of brain design. Suppose you can make the human smarter, let alone superintelligent; does the human remain sane? The human brain is very easy to perturb; just changing the balance of neurotransmitters can trigger schizophrenia, or other disorders. Deacon (1997) has an excellent discussion of the evolution of the human brain, how delicately the brain’s elements may be balanced, and how this is reflected in modern brain dysfunctions. The human brain is not end-user-modifiable.

    All of this makes it rather implausible that the first human being would be scanned into a computer and sanely upgraded before anyone anywhere first built an Artificial Intelligence. At the point where technology first becomes capable of uploading, this implies overwhelmingly more computing power, and probably far better cognitive science, than is required to build an AI.

    Building a 747 from scratch is not easy. But is it easier to:

    • Start with the existing design of a biological bird,
    • and incrementally modify the design through a series of successive stages,
    • each stage independently viable,
    • such that the endpoint is a bird scaled up to the size of a 747,
    • which actually flies,
    • as fast as a 747,
    • and then carry out this series of transformations on an actual living bird,
    • without killing the bird or making it extremely uncomfortable?

    I’m not saying it could never, ever be done. I’m saying that it would be easier to build the 747, and then have the 747, metaphorically speaking, upgrade the bird. «Let’s just scale up an existing bird to the size of a 747» is not a clever strategy that avoids dealing with the intimidating theoretical mysteries of aerodynamics. Perhaps, in the beginning, all you know about flight is that a bird has the mysterious essence of flight, and the materials with which you must build a 747 are just lying there on the ground. But you cannot sculpt the mysterious essence of flight, even as it already resides in the bird, until flight has ceased to be a mysterious essence unto you.

    The above argument is directed at a deliberately extreme case. The general point is that we do not have total freedom to pick a path that sounds nice and reassuring, or that would make a good story as a science fiction novel. We are constrained by which technologies are likely to precede others.

    I am not against scanning human beings into computers and making them smarter, but it seems exceedingly unlikely that this will be the ground on which humanity first confronts the challenge of smarter-than-human intelligence. With various strict subsets of the technology and knowledge required to upload and upgrade humans, one could:

    • Upgrade biological brains in-place (for example, by adding new neurons which will be usefully wired in);
    • or usefully interface computers to biological human brains;
    • or usefully interface human brains with each other;
    • or construct Artificial Intelligence.

    Furthermore, it is one thing to sanely enhance an average human to IQ 140, and another to enhance a Nobel Prize winner to something beyond human. (Leaving aside quibbles about the suitability of IQ, or Nobel-Prize-winning, as a measure of fluid intelligence; please excuse my metaphors.) Taking Piracetam (or drinking caffeine) may, or may not, make at least some people smarter; but it will not make you substantially smarter than Einstein. In which case we haven’t won any significant new capabilities; we haven’t made further rounds of the problem easier; we haven’t broken the upper bound on the intelligence available to deal with existential risks. From the standpoint of managing existential risk, any intelligence enhancement technology which doesn’t produce a (nice, sane) mind literally smarter than human, begs the question of whether the same time and effort could be more productively spent to find an extremely smart modern-day human and unleash them on the same problem.

    Furthermore, the farther you go from the «natural» design bounds of the human brain — the ancestral condition represented by the brain itself, to which individual brain components are adapted — the greater the danger of individual insanity. If the augment is substantially smarter than human, this too is a global catastrophic risk. How much damage can an evil augmented human do? Well… how creative are they? The first question that comes to my mind is, «Creative enough to build their own recursively self-improving AI?»

    Radical human intelligence enhancement techniques raise their own safety issues. Again, I am not claiming these problems as engineering impossibilities; only pointing out that the problems exist. AI has safety issues; so does human intelligence enhancement. Not everything that clanks is your enemy, and not everything that squishes is your friend. On the one hand, a nice human starts out with all the immense moral, ethical, and architectural complexity that describes what we mean by a «friendly» decision. On the other hand, an AI can be designed for stable recursive self-improvement, and shaped to safety: natural selection did not design the human brain with multiple rings of precautionary measures, conservative decision processes, and orders of magnitude of safety margin.

    Human intelligence enhancement is a question in its own right, not a subtopic of Artificial Intelligence; and this chapter lacks space to discuss it in detail. It is worth mentioning that I considered both human intelligence enhancement and Artificial Intelligence at the start of my career, and decided to allocate my efforts to Artificial Intelligence. Primarily this was because I did not expect useful, human-transcending intelligence enhancement techniques to arrive in time to substantially impact the development of recursively self-improving Artificial Intelligence. I would be pleasantly surprised to be proven wrong about this.

    But I do not think that it is a viable strategy to deliberately choose not to work on Friendly AI, while others work on human intelligence enhancement, in hopes that augmented humans will solve the problem better. I am not willing to embrace a strategy which fails catastrophically if human intelligence enhancement takes longer than building AI. (Or vice versa, for that matter.) I fear that working with biology will just take too much time — there will be too much inertia, too much fighting of poor design decisions already made by natural selection. I fear regulatory agencies will not approve human experiments. And even human geniuses take years to learn their art; the faster the augment has to learn, the more difficult it is to augment someone to that level.

    I would be pleasantly surprised if augmented humans showed up and built a Friendly AI before anyone else got the chance. But someone who would like to see this outcome will probably have to work hard to speed up intelligence enhancement technologies; it would be difficult to convince me to slow down. If AI is naturally far more difficult than intelligence enhancement, no harm done; if building a 747 is naturally easier than inflating a bird, then the wait could be fatal. There is a relatively small region of possibility within which deliberately not working on Friendly AI could possibly help, and a large region within which it would be either irrelevant or harmful. Even if human intelligence enhancement is possible, there are real, difficult safety considerations; we would have to seriously ask whether we wanted Friendly AI to precede intelligence enhancement, rather than vice versa.

    I do not assign strong confidence to the assertion that Friendly AI is easier than human augmentation, or that it is safer. There are many conceivable pathways for augmenting a human. Perhaps there is a technique which is easier and safer than AI, which is also powerful enough to make a difference to existential risk. If so, I may switch jobs. But I did wish to point out some considerations which argue against the unquestioned assumption that human intelligence enhancement is easier, safer, and powerful enough to make a difference.

    12: Interactions of AI with other technologies

    Speeding up a desirable technology is a local strategy, while slowing down a dangerous technology is a difficult majoritarian strategy. Halting or relinquishing an undesirable technology tends to require an impossible unanimous strategy. I would suggest that we think, not in terms of developing or not-developing technologies, but in terms of our pragmatically available latitude to accelerate or slow down technologies; and ask, within the realistic bounds of this latitude, which technologies we might prefer to see developed before or after one another.

    In nanotechnology, the goal usually presented is to develop defensive shields before offensive technologies. I worry a great deal about this, because a given level of offensive technology tends to require much less sophistication than a technology that can defend against it. Offense has outweighed defense during most of civilized history. Guns were developed centuries before bulletproof vests. Smallpox was used as a tool of war before the development of smallpox vaccines. Today there is still no shield that can deflect a nuclear explosion; nations are protected not by defenses that cancel offenses, but by a balance of offensive terror. The nanotechnologists have set themselves an intrinsically difficult problem.

    So should we prefer that nanotechnology precede the development of AI, or that AI precede the development of nanotechnology? As presented, this is something of a trick question. The answer has little to do with the intrinsic difficulty of nanotechnology as an existential risk, or the intrinsic difficulty of AI. So far as ordering is concerned, the question we should ask is, «Does AI help us deal with nanotechnology? Does nanotechnology help us deal with AI?»

    It looks to me like a successful resolution of Artificial Intelligence should help us considerably in dealing with nanotechnology. I cannot see how nanotechnology would make it easier to develop Friendly AI. If huge nanocomputers make it easier to develop AI without making it easier to solve the particular challenge of Friendliness, that is a negative interaction. Thus, all else being equal, I would greatly prefer that Friendly AI precede nanotechnology in the ordering of technological developments. If we confront the challenge of AI and succeed, we can call on Friendly AI to help us with nanotechnology. If we develop nanotechnology and survive, we still have the challenge of AI to deal with after that.

    Generally speaking, a success on Friendly AI should help solve nearly any other problem. Thus, if a technology makes AI neither easier nor harder, but carries with it a catastrophic risk, we should prefer all else being equal to first confront the challenge of AI.

    Any technology which increases available computing power decreases the minimum theoretical sophistication necessary to develop Artificial Intelligence, but doesn’t help at all on the Friendly side of things, and I count it as a net negative. Moore’s Law of Mad Science: Every eighteen months, the minimum IQ necessary to destroy the world drops by one point.

    A success on human intelligence enhancement would make Friendly AI easier, and also help on other technologies. But human augmentation is not necessarily safer, or easier, than Friendly AI; nor does it necessarily lie within our realistically available latitude to reverse the natural ordering of human augmentation and Friendly AI, if one technology is naturally much easier than the other.

    13: Making progress on Friendly AI

    «We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.»
    — McCarthy, Minsky, Rochester, and Shannon (1955).

    The Proposal for the Dartmouth Summer Research Project on Artificial Intelligence is the first recorded use of the phrase «artificial intelligence». They had no prior experience to warn them the problem was hard. I would still label it a genuine mistake, that they said «a significant advance can be made», not might be made, with a summer’s work. That is a specific guess about the problem difficulty and solution time, which carries a specific burden of improbability. But if they had said might, I would have no objection. How were they to know?

    The Dartmouth Proposal included, among others, the following topics: Linguistic communication, linguistic reasoning, neural nets, abstraction, randomness and creativity, interacting with the environment, modeling the brain, originality, prediction, invention, discovery, and self-improvement.

    5 This is usually true but not universally true. The final chapter of the widely used textbook Artificial Intelligence: A Modern Approach (Russell and Norvig 2003) includes a section on «The Ethics and Risks of Artificial Intelligence»; mentions I. J. Good’s intelligence explosion and the Singularity; and calls for further research, soon. But as of 2006, this attitude remains very much the exception rather than the rule.

    Now it seems to me that an AI capable of language, abstract thought, creativity, environmental interaction, originality, prediction, invention, discovery, and above all self-improvement, is well beyond the point where it needs also to be Friendly.

    The Dartmouth Proposal makes no mention of building nice/good/benevolent AI. Questions of safety are not mentioned even for the purpose of dismissing them. This, even in that bright summer when human-level AI seemed just around the corner. The Dartmouth Proposal was written in 1955, before the Asilomar conference on biotechnology, thalidomide babies, Chernobyl, or September 11th. If today the idea of artificial intelligence were proposed for the first time, then someone would demand to know what specifically was being done to manage the risks. I am not saying whether this is a good change or a bad change in our culture. I am not saying whether this produces good or bad science. But the point remains that if the Dartmouth Proposal had been written fifty years later, one of the topics would have been safety.

    At the time of this writing in 2006, the AI research community still doesn’t see Friendly AI as part of the problem. I wish I could cite a reference to this effect, but I cannot cite an absence of literature. Friendly AI is absent from the conceptual landscape, not just unpopular or unfunded. You cannot even call Friendly AI a blank spot on the map, because there is no notion that something is missing. 5 If you’ve read popular/semitechnical books proposing how to build AI, such as Gödel, Escher, Bach (Hofstadter 1979) or The Society of Mind (Minsky 1986), you may think back and recall that you did not see Friendly AI discussed as part of the challenge. Neither have I seen Friendly AI discussed in the technical literature as a technical problem. My attempted literature search turned up primarily brief nontechnical papers, unconnected to each other, with no major reference in common except Isaac Asimov’s «Three Laws of Robotics». (Asimov 1942.)

    Given that this is 2006, why aren’t more AI researchers talking about safety? I have no privileged access to others’ psychology, but I will briefly speculate based on personal discussions.

    The field of Artificial Intelligence has adapted to its experiences over the last fifty years: in particular, the pattern of large promises, especially of human-level capabilities, followed by embarrassing public failure. To attribute this embarrassment to «AI» is perhaps unfair; wiser researchers who made no promises did not see their conservatism trumpeted in the newspapers. Still the failed promises come swiftly to mind, both inside and outside the field of AI, when advanced AI is mentioned. The culture of AI research has adapted to this condition: There is a taboo against talking about human-level capabilities. There is a stronger taboo against anyone who appears to be claiming or predicting a capability they have not demonstrated with running code. The perception I have encountered is that anyone who claims to be researching Friendly AI is implicitly claiming that their AI design is powerful enough that it needs to be Friendly.

    It should be obvious that this is neither logically true, nor practically a good philosophy. If we imagine someone creating an actual, mature AI which is powerful enough that it needs to be Friendly, and moreover, as is our desired outcome, this AI really is Friendly, then someone must have been working on Friendly AI for years and years. Friendly AI is not a module you can instantly invent at the exact moment when it is first needed, and then bolt on to an existing, polished design which is otherwise completely unchanged.

    The field of AI has techniques, such as neural networks and evolutionary programming, which have grown in power with the slow tweaking of decades. But neural networks are opaque — the user has no idea how the neural net is making its decisions — and cannot easily be rendered unopaque; the people who invented and polished neural networks were not thinking about the long-term problems of Friendly AI. Evolutionary programming (EP) is stochastic, and does not precisely preserve the optimization target in the generated code; EP gives you code that does what you ask, most of the time, under the tested circumstances, but the code may also do something else on the side. EP is a powerful, still maturing technique that is intrinsically unsuited to the demands of Friendly AI. Friendly AI, as I have proposed it, requires repeated cycles of recursive self-improvement that precisely preserve a stable optimization target.
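
    A minimal sketch of that failure mode, with the task, test points, and parameters all invented for illustration: evolve the coefficients of a small polynomial to match a target function on a handful of tested inputs. A candidate can score well on every test while behaving very differently everywhere else, because nothing in the fitness function preserves any intention beyond the tested circumstances.

```python
import random

# Minimal sketch: an evolutionary loop optimizes behavior only on tested inputs.
# The task and all parameters are invented for illustration.
TEST_INPUTS = [0.0, 1.0, 2.0, 3.0]   # the only circumstances we ever test

def target(x):
    """The behavior we meant to ask for."""
    return x * x

def evaluate(coeffs, x):
    """Evaluate a polynomial, lowest-order coefficient first."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def fitness(coeffs):
    """Negative squared error, measured on the tested inputs only."""
    return -sum((evaluate(coeffs, x) - target(x)) ** 2 for x in TEST_INPUTS)

rng = random.Random(1)
# Degree-5 polynomials: more freedom than the four tests can pin down.
population = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(50)]

for _ in range(3000):
    parent = max(population, key=fitness)             # select a good candidate
    child = [c + rng.gauss(0, 0.05) for c in parent]  # mutate it
    population.remove(min(population, key=fitness))   # cull the worst
    population.append(child)

best = max(population, key=fitness)
test_err = max(abs(evaluate(best, x) - target(x)) for x in TEST_INPUTS)
print(f"max error on tested inputs    : {test_err:.3f}")
print(f"error at x = 10 (never tested): {abs(evaluate(best, 10.0) - target(10.0)):.1f}")
# The evolved code does what we asked, under the tested circumstances; off the
# test set, its high-order terms are whatever the drift of selection left them.
```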

    The most powerful current AI techniques, as they were developed and then polished and improved over time, have basic incompatibilities with the requirements of Friendly AI as I currently see them. The Y2K problem — which proved very expensive to fix, though not global-catastrophic — analogously arose from failing to foresee tomorrow’s design requirements. The nightmare scenario is that we find ourselves stuck with a catalog of mature, powerful, publicly available AI techniques which combine to yield non-Friendly AI, but which cannot be used to build Friendly AI without redoing the last three decades of AI work from scratch.