
Eliezer S. Yudkowsky

In the late 19th century, many honest and intelligent people advocated communism, all in the best of good intentions. The people who first invented and spread and swallowed the communist meme were, in sober historical fact, idealists. The first communists did not have the example of Soviet Russia to warn them. At the time, without benefit of hindsight, it must have sounded like a pretty good idea. After the revolution, when communists came into power and were corrupted by it, other motives may have come into play; but this itself was not something the first idealists predicted, however predictable it may have been. It is important to understand that the authors of huge catastrophes need not be evil, nor even unusually stupid. If we attribute every tragedy to evil or unusual stupidity, we will look at ourselves, correctly perceive that we are not evil or unusually stupid, and say: «But that would never happen to us.»

What the first communist revolutionaries thought would happen, as the empirical consequence of their revolution, was that people’s lives would improve: laborers would no longer work long hours at backbreaking labor and make little money from it. This turned out not to be the case, to put it mildly. But what the first communists thought would happen, was not so very different from what advocates of other political systems thought would be the empirical consequence of their favorite political systems. They thought people would be happy. They were wrong. Now imagine that someone should attempt to program a «Friendly» AI to implement communism, or libertarianism, or anarcho-feudalism, or favorite political system, believing that this shall bring about utopia. People’s favorite political systems inspire blazing suns of positive affect, so the proposal will sound like a really good idea to the proposer.

We could view the programmer’s failure on a moral or ethical level — say that it is the result of someone trusting themselves too highly, failing to take into account their own fallibility, refusing to consider the possibility that communism might be mistaken after all. But in the language of Bayesian decision theory, there’s a complementary technical view of the problem. From the perspective of decision theory, the choice for communism stems from combining an empirical belief with a value judgment. The empirical belief is that communism, when implemented, results in a specific outcome or class of outcomes: people will be happier, work fewer hours, and possess greater material wealth. This is ultimately an empirical prediction; even the part about happiness is a real property of brain states, though hard to measure. If you implement communism, either this outcome eventuates or it does not. The value judgment is that this outcome satisfices or is preferable to current conditions. Given a different empirical belief about the actual real-world consequences of a communist system, the decision may undergo a corresponding change.
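The decomposition above can be made concrete with a toy expected-utility calculation. The outcome labels, probabilities, and utilities below are invented purely for illustration; the point is structural:

```python
# Toy decision-theory sketch: a decision combines an empirical belief
# P(outcome | action) with a value judgment (a utility over outcomes).
# All numbers here are illustrative, not claims about real policies.

def expected_utility(p_outcomes, utility):
    """Expected utility of an action given P(outcome | action)."""
    return sum(p * utility[outcome] for outcome, p in p_outcomes.items())

# Value judgment: held fixed throughout.
utility = {"prosperity": 1.0, "status_quo": 0.0, "poverty": -1.0}

# Empirical belief held by the early advocate: the policy probably works.
belief_naive = {
    "policy": {"prosperity": 0.8, "poverty": 0.2},
    "do_nothing": {"status_quo": 1.0},
}

# Revised belief after observing real-world consequences.
belief_informed = {
    "policy": {"prosperity": 0.1, "poverty": 0.9},
    "do_nothing": {"status_quo": 1.0},
}

def decide(belief):
    """Pick the action with the highest expected utility."""
    return max(belief, key=lambda a: expected_utility(belief[a], utility))

print(decide(belief_naive))     # "policy"
print(decide(belief_informed))  # "do_nothing"
```

Note that the value judgment never changes; only the empirical belief does, and the decision changes with it.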

We would expect a true AI, an Artificial General Intelligence, to be capable of changing its empirical beliefs. (Or its probabilistic world-model, etc.) If somehow Charles Babbage had lived before Nicolaus Copernicus, and somehow computers had been invented before telescopes, and somehow the programmers of that day and age successfully created an Artificial General Intelligence, it would not follow that the AI would believe forever after that the Sun orbited the Earth. The AI might transcend the factual error of its programmers, provided that the programmers understood inference rather better than they understood astronomy. To build an AI that discovers the orbits of the planets, the programmers need not know the math of Newtonian mechanics, only the math of Bayesian probability theory.
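The point about inference transcending the programmers' factual errors can be illustrated with a minimal Bayesian update. The prior and likelihoods below are invented for illustration (they are not historical astronomical data):

```python
# Toy Bayesian update: even a prior that heavily favors the programmers'
# mistaken hypothesis is overwhelmed by evidence, given correct inference.
# Numbers are illustrative only.

def posterior(prior_h, likelihood_h, likelihood_g, n_obs):
    """P(hypothesis H | data) after n_obs independent observations,
    where each observation has the given likelihood under H (e.g.
    heliocentrism) and under the rival hypothesis G (e.g. geocentrism)."""
    joint_h = prior_h * likelihood_h ** n_obs
    joint_g = (1 - prior_h) * likelihood_g ** n_obs
    return joint_h / (joint_h + joint_g)

# Programmers' prior: 99.9% confident in the wrong hypothesis.
# Each observation is five times as likely under the right one.
print(posterior(0.001, 0.5, 0.1, 0))   # ~0.001 before any data
print(posterior(0.001, 0.5, 0.1, 10))  # ~0.9999 after ten observations
```

With sound inference, the initial error is a starting point, not a fixed point.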

The folly of programming an AI to implement communism, or any other political system, is that you’re programming means instead of ends. You’re programming in a fixed decision, without that decision being re-evaluable after acquiring improved empirical knowledge about the results of communism. You are giving the AI a fixed decision without telling the AI how to re-evaluate, at a higher level of intelligence, the fallible process which produced that decision.

If I play chess against a stronger player, I cannot predict exactly where my opponent will move against me — if I could predict that, I would necessarily be at least that strong at chess myself. But I can predict the end result, which is a win for the other player. I know the region of possible futures my opponent is aiming for, which is what lets me predict the destination, even if I cannot see the path. When I am at my most creative, that is when it is hardest to predict my actions, and easiest to predict the consequences of my actions. (Providing that you know and understand my goals!) If I want a better-than-human chess player, I have to program a search for winning moves. I can’t program in specific moves because then the chess player won’t be any better than I am. When I launch a search, I necessarily sacrifice my ability to predict the exact answer in advance. To get a really good answer you must sacrifice your ability to predict the answer, albeit not your ability to say what is the question.
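The chess point can be shown with a much simpler game. The sketch below uses a toy Nim variant (take 1–3 stones; taking the last stone wins), not chess: a game-tree search proves who wins from a position, the predictable destination, without our knowing in advance which move the search will pick:

```python
# Predicting the destination without predicting the path: a full search
# of a subtraction game. From 20 stones the player to move is provably
# lost, whatever moves are actually played.
from functools import lru_cache

@lru_cache(maxsize=None)
def wins(stones):
    """True if the player to move can force a win."""
    return any(not wins(stones - take)
               for take in (1, 2, 3) if take <= stones)

def best_move(stones):
    """Some winning move if one exists, else any legal move."""
    legal = [t for t in (1, 2, 3) if t <= stones]
    for take in legal:
        if not wins(stones - take):
            return take
    return legal[0]

print(wins(20))       # False: the end result is predictable
print(best_move(22))  # 2: leaves the opponent 20 stones, a lost position
```

The search, not the programmer, chooses the move; the programmer only chose what counts as winning.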

Such confusion as programming in communism directly would probably not tempt an AGI programmer who speaks the language of decision theory. I would call it a philosophical failure, but blame it on lack of technical knowledge.

6.2: An example of technical failure

«In place of laws constraining the behavior of intelligent machines, we need to give them emotions that can guide their learning of behaviors. They should want us to be happy and prosper, which is the emotion we call love. We can design intelligent machines so their primary, innate emotion is unconditional love for all humans. First we can build relatively simple machines that learn to recognize happiness and unhappiness in human facial expressions, human voices and human body language. Then we can hard-wire the result of this learning as the innate emotional values of more complex intelligent machines, positively reinforced when we are happy and negatively reinforced when we are unhappy. Machines can learn algorithms for approximately predicting the future, as for example investors currently use learning machines to predict future security prices. So we can program intelligent machines to learn algorithms for predicting future human happiness, and use those predictions as emotional values.»

— Bill Hibbard (2001), Super-intelligent machines

Once upon a time, the US Army wanted to use neural networks to automatically detect camouflaged enemy tanks. The researchers trained a neural net on 50 photos of camouflaged tanks in trees, and 50 photos of trees without tanks. Using standard techniques for supervised learning, the researchers trained the neural network to a weighting that correctly loaded the training set — output «yes» for the 50 photos of camouflaged tanks, and output «no» for the 50 photos of forest. This did not ensure, or even imply, that new examples would be classified correctly. The neural network might have «learned» 100 special cases that would not generalize to any new problem. Wisely, the researchers had originally taken 200 photos, 100 photos of tanks and 100 photos of trees. They had used only 50 of each for the training set. The researchers ran the neural network on the remaining 100 photos, and without further training the neural network classified all remaining photos correctly. Success confirmed! The researchers handed the finished work to the Pentagon, which soon handed it back, complaining that in their own tests the neural network did no better than chance at discriminating photos. 1

It turned out that in the researchers’ data set, photos of camouflaged tanks had been taken on cloudy days, while photos of plain forest had been taken on sunny days. The neural network had learned to distinguish cloudy days from sunny days, instead of distinguishing camouflaged tanks from empty forest. 2

A technical failure occurs when the code does not do what you think it does, though it faithfully executes as you programmed it. More than one model can load the same data. Suppose we trained a neural network to recognize smiling human faces and distinguish them from frowning human faces. Would the network classify a tiny picture of a smiley-face into the same attractor as a smiling human face? If an AI «hard-wired» to such code possessed the power — and Hibbard (2001) spoke of superintelligence — would the galaxy end up tiled with tiny molecular pictures of smiley-faces? 3

  1. This story, though famous and oft-cited as fact, may be apocryphal; I could not find a first-hand report. For unreferenced reports see e.g. Crochat and Franklin (2000). However, failures of the type described are a major real-world consideration when building and testing neural networks.
  2. Bill Hibbard, after viewing a draft of this paper, wrote a response arguing that the analogy to the «tank classifier» problem does not apply to reinforcement learning in general; his critique, and my response to it, may be found online. Hibbard also notes that the proposal of Hibbard (2001) is superseded by Hibbard (2004). The latter recommends a two-layer system in which expressions of agreement from humans reinforce recognition of happiness, and recognized happiness reinforces action strategies.
  3. This form of failure is especially dangerous because it will appear to work within a fixed context, then fail when the context changes. The researchers of the «tank classifier» story tweaked their neural network until it correctly loaded the training data, then verified the network on additional data (without further tweaking). Unfortunately, both the training data and verification data turned out to share an assumption which held over all the data used in development, but not in all the real-world contexts where the neural network was called upon to function. In the story of the tank classifier, the assumption is that tanks are photographed on cloudy days.
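The tank-classifier failure mode can be sketched in a few lines. The data generator, features, and «learner» below are all hypothetical stand-ins (nothing here models the Army's actual system): the label correlates with brightness in both the training and verification data, so a learner that latches onto brightness looks perfect until the correlation breaks in deployment:

```python
# Toy reconstruction of the tank-classifier story: a spurious feature
# (brightness) separates the training and verification sets perfectly,
# then fails when the context changes. All data is synthetic.
import random

def make_photos(n, tank_means_dark, seed):
    """Each photo is (brightness, has_tank)."""
    rng = random.Random(seed)
    photos = []
    for _ in range(n):
        has_tank = rng.random() < 0.5
        if tank_means_dark:
            # Development-era assumption: tanks photographed on cloudy days.
            brightness = rng.uniform(0.0, 0.4) if has_tank else rng.uniform(0.6, 1.0)
        else:
            # Deployment: the correlation no longer holds.
            brightness = rng.uniform(0.0, 1.0)
        photos.append((brightness, has_tank))
    return photos

def fit_threshold(train):
    # "Learner": predict tank iff the photo is dark. This rule is picked
    # because it perfectly separates the training set, just as the story's
    # network loaded its 100 training photos.
    return lambda brightness: brightness < 0.5

def accuracy(model, data):
    return sum(model(b) == y for b, y in data) / len(data)

model = fit_threshold(make_photos(50, True, seed=0))
print(accuracy(model, make_photos(50, True, seed=1)))    # 1.0 on held-out data
print(accuracy(model, make_photos(200, False, seed=2)))  # ~0.5 in deployment
```

The held-out verification set shares the training set's hidden assumption, so it cannot detect the failure; only the context change does.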

Suppose we wish to develop an AI of increasing power. The AI possesses a developmental stage where the human programmers are more powerful than the AI — not in the sense of mere physical control over the AI’s electrical supply, but in the sense that the human programmers are smarter, more creative, more cunning than the AI. During the developmental period we suppose that the programmers possess the ability to make changes to the AI’s source code without needing the consent of the AI to do so. However, the AI is also intended to possess postdevelopmental stages, including, in the case of Hibbard’s scenario, superhuman intelligence. An AI of superhuman intelligence surely could not be modified without its consent. At this point we must rely on the previously laid-down goal system to function correctly, because if it operates in a sufficiently unforeseen fashion, the AI may actively resist our attempts to correct it — and, if the AI is smarter than a human, probably win.

Trying to control a growing AI by training a neural network to provide its goal system faces the problem of a huge context change between the AI’s developmental stage and postdevelopmental stage. During the developmental stage, the AI may only be able to produce stimuli that fall into the «smiling human faces» category, by solving humanly provided tasks, as its makers intended. Flash forward to a time when the AI is superhumanly intelligent and has built its own nanotech infrastructure, and the AI may be able to produce stimuli classified into the same attractor by tiling the galaxy with tiny smiling faces.

Thus the AI appears to work fine during development, but produces catastrophic results after it becomes smarter than the programmers(!).

There is a temptation to think, «But surely the AI will know that’s not what we meant?» But the code is not given to the AI, for the AI to look over and hand back if it does the wrong thing. The code is the AI. Perhaps with enough effort and understanding we can write code that cares if we have written the wrong code — the legendary DWIM instruction, which among programmers stands for Do-What-I-Mean. (Raymond 2003.) But effort is required to write a DWIM dynamic, and nowhere in Hibbard’s proposal is there mention of designing an AI that does what we mean, not what we say. Modern chips don’t DWIM their code; it is not an automatic property. And if you messed up the DWIM itself, you would suffer the consequences. For example, suppose DWIM was defined as maximizing the satisfaction of the programmer with the code; when the code executed as a superintelligence, it might rewrite the programmers’ brains to be maximally satisfied with the code. I do not say this is inevitable; I only point out that Do-What-I-Mean is a major, nontrivial technical challenge of Friendly AI.

7: Rates of intelligence increase

From the standpoint of existential risk, one of the most critical points about Artificial Intelligence is that an Artificial Intelligence might increase in intelligence extremely fast. The obvious reason to suspect this possibility is recursive self-improvement. (Good 1965.) The AI becomes smarter, including becoming smarter at the task of writing the internal cognitive functions of an AI, so the AI can rewrite its existing cognitive functions to work even better, which makes the AI still smarter, including smarter at the task of rewriting itself, so that it makes yet more improvements.

Human beings do not recursively self-improve in a strong sense. To a limited extent, we improve ourselves: we learn, we practice, we hone our skills and knowledge. To a limited extent, these self-improvements improve our ability to improve. New discoveries can increase our ability to make further discoveries — in that sense, knowledge feeds on itself. But there is still an underlying level we haven’t yet touched. We haven’t rewritten the human brain. The brain is, ultimately, the source of discovery, and our brains today are much the same as they were ten thousand years ago.

In a similar sense, natural selection improves organisms, but the process of natural selection does not itself improve — not in a strong sense. Adaptation can open up the way for additional adaptations. In this sense, adaptation feeds on itself. But even as the gene pool boils, there’s still an underlying heater, the process of mutation and recombination and selection, which is not itself re-architected. A few rare innovations increased the rate of evolution itself, such as the invention of sexual recombination. But even sex did not change the essential nature of evolution: its lack of abstract intelligence, its reliance on random mutations, its blindness and incrementalism, its focus on allele frequencies. Similarly, not even the invention of science changed the essential character of the human brain: its limbic core, its cerebral cortex, its prefrontal self-models, its characteristic speed of 200Hz.

An Artificial Intelligence could rewrite its code from scratch — it could change the underlying dynamics of optimization. Such an optimization process would wrap around much more strongly than either evolution accumulating adaptations, or humans accumulating knowledge. The key implication for our purposes is that an AI might make a huge jump in intelligence after reaching some threshold of criticality.

One often encounters skepticism about this scenario — what Good (1965) called an «intelligence explosion» — because progress in Artificial Intelligence has the reputation of being very slow. At this point it may prove helpful to review a loosely analogous historical surprise. (What follows is taken primarily from Rhodes 1986.)

In 1933, Lord Ernest Rutherford said that no one could ever expect to derive power from splitting the atom: «Anyone who looked for a source of power in the transformation of atoms was talking moonshine.» At that time laborious hours and weeks were required to fission a handful of nuclei.

Flash forward to 1942, in a squash court beneath Stagg Field at the University of Chicago. Physicists are building a shape like a giant doorknob out of alternate layers of graphite and uranium, intended to start the first self-sustaining nuclear reaction. In charge of the project is Enrico Fermi. The key number for the pile is k, the effective neutron multiplication factor: the average number of neutrons from a fission reaction that cause another fission reaction. At k < 1, the pile is subcritical. At k ≥ 1, the pile should sustain a critical reaction. Fermi calculates that the pile will reach k = 1 between layers 56 and 57.

A work crew led by Herbert Anderson finishes Layer 57 on the night of December 1, 1942. Control rods, strips of wood covered with neutron-absorbing cadmium foil, prevent the pile from reaching criticality. Anderson removes all but one control rod and measures the pile’s radiation, confirming that the pile is ready to chain-react the next day. Anderson inserts all cadmium rods and locks them into place with padlocks, then closes up the squash court and goes home.

The next day, December 2, 1942, on a windy Chicago morning of sub-zero temperatures, Fermi begins the final experiment. All but one of the control rods are withdrawn. At 10:37am, Fermi orders the final control rod withdrawn about half-way out. The Geiger counters click faster, and a graph pen moves upward. «This is not it,» says Fermi, «the trace will go to this point and level off,» indicating a spot on the graph. In a few minutes the graph pen comes to the indicated point, and does not go above it. Seven minutes later, Fermi orders the rod pulled out another foot. Again the radiation rises, then levels off. The rod is pulled out another six inches, then another, then another. At 11:30, the slow rise of the graph pen is punctuated by an enormous CRASH — an emergency control rod, triggered by an ionization chamber, activates and shuts down the pile, which is still short of criticality. Fermi calmly orders the team to break for lunch.

At 2pm the team reconvenes, withdraws and locks the emergency control rod, and moves the control rod to its last setting. Fermi makes some measurements and calculations, then again begins the process of withdrawing the rod in slow increments. At 3:25pm, Fermi orders the rod withdrawn another twelve inches. «This is going to do it,» Fermi says. «Now it will become self-sustaining. The trace will climb and continue to climb. It will not level off.»

Herbert Anderson recounts (from Rhodes 1986):

«At first you could hear the sound of the neutron counter, clickety-clack, clickety-clack. Then the clicks came more and more rapidly, and after a while they began to merge into a roar; the counter couldn’t follow anymore. That was the moment to switch to the chart recorder. But when the switch was made, everyone watched in the sudden silence the mounting deflection of the recorder’s pen. It was an awesome silence. Everyone realized the significance of that switch; we were in the high intensity regime and the counters were unable to cope with the situation anymore. Again and again, the scale of the recorder had to be changed to accommodate the neutron intensity which was increasing more and more rapidly. Suddenly Fermi raised his hand. ‘The pile has gone critical,’ he announced. No one present had any doubt about it.»

Fermi kept the pile running for twenty-eight minutes, with the neutron intensity doubling every two minutes. The first critical reaction had k of 1.0006. Even at k = 1.0006, the pile was only controllable because some of the neutrons from a uranium fission reaction are delayed: they come from the decay of short-lived fission byproducts. For every 100 fissions in U235, 242 neutrons are emitted almost immediately (within 0.0001 seconds), and 1.58 neutrons are emitted an average of ten seconds later. Thus the average lifetime of a neutron is ~0.1 seconds, implying 1,200 generations in two minutes, and a doubling time of two minutes because 1.0006 to the power of 1,200 is ~2. A nuclear reaction which is prompt critical is critical without the contribution of delayed neutrons. If Fermi’s pile had been prompt critical with k = 1.0006, neutron intensity would have doubled every tenth of a second.
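The arithmetic in the preceding paragraph can be checked directly:

```python
# Verifying the pile arithmetic: with k = 1.0006 and an average neutron
# lifetime of ~0.1 s (dominated by delayed neutrons), intensity doubles
# in about two minutes; with prompt-neutron lifetimes (~0.0001 s) alone,
# it would double in about a tenth of a second.
import math

k = 1.0006

def doubling_time(neutron_lifetime):
    """Seconds for intensity to double at multiplication factor k."""
    generations_to_double = math.log(2) / math.log(k)
    return generations_to_double * neutron_lifetime

print(round(k ** 1200, 2))             # ~2: 1,200 generations in two minutes
print(round(doubling_time(0.1), 1))    # ~115.6 s, about two minutes
print(round(doubling_time(0.0001), 2)) # ~0.12 s if prompt critical
```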

The first moral is that confusing the speed of AI research with the speed of a real AI once built is like confusing the speed of physics research with the speed of nuclear reactions. It mixes up the map with the territory. It took years to get that first pile built, by a small group of physicists who didn’t generate much in the way of press releases. But, once the pile was built, interesting things happened on the timescale of nuclear interactions, not the timescale of human discourse. In the nuclear domain, elementary interactions happen much faster than human neurons fire. Much the same may be said of transistors.

Another moral is that there’s a huge difference between one self-improvement triggering 0.9994 further improvements on average, and one self-improvement triggering 1.0006 further improvements on average. The nuclear pile didn’t cross the critical threshold as the result of the physicists suddenly piling on a lot more material. The physicists piled on material slowly and steadily. Even if there is a smooth underlying curve of brain intelligence as a function of optimization pressure previously exerted on that brain, the curve of recursive self-improvement may show a huge leap.
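A toy calculation shows how sharp that threshold is. Treating each self-improvement as triggering k further improvements on average (a deliberately simplified model, not a claim about real AI dynamics), the total cascade from one triggering event is a geometric series:

```python
# The gap between k = 0.9994 and k = 1.0006: below criticality the
# cascade from one trigger converges to about 1/(1-k); above it, the
# cascade grows without bound. A simplified model for illustration.
def cascade_size(k, generations):
    """Expected total events after one trigger, summed over generations."""
    return sum(k ** n for n in range(generations))

print(round(cascade_size(0.9994, 100_000)))  # 1667: settles near 1/(1-k)
print(cascade_size(1.0006, 100_000) > 1e25)  # True: still growing
```

A tiny change in k, from just below one to just above one, is the difference between a fizzle and an explosion.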

There are also other reasons why an AI might show a sudden huge leap in intelligence. The species Homo sapiens showed a sharp jump in the effectiveness of intelligence, as the result of natural selection exerting a more-or-less steady optimization pressure on hominids for millions of years, gradually expanding the brain and prefrontal cortex, tweaking the software architecture. A few tens of thousands of years ago, hominid intelligence crossed some key threshold and made a huge leap in real-world effectiveness; we went from caves to skyscrapers in the blink of an evolutionary eye. This happened with a continuous underlying selection pressure — there wasn’t a huge jump in the optimization power of evolution when humans came along. The underlying brain architecture was also continuous — our cranial capacity didn’t suddenly increase by two orders of magnitude. So it might be that, even if the AI is being elaborated from outside by human programmers, the curve for effective intelligence will jump sharply.

Or perhaps someone builds an AI prototype that shows some promising results, and the demo attracts another $100 million in venture capital, and this money purchases a thousand times as much supercomputing power. I doubt a thousandfold increase in hardware would purchase anything like a thousandfold increase in effective intelligence — but mere doubt is not reliable in the absence of any ability to perform an analytical calculation. Compared to chimps, humans have a threefold advantage in brain and a sixfold advantage in prefrontal cortex, which suggests (a) software is more important than hardware and (b) small increases in hardware can support large improvements in software. It is one more point to consider.

Finally, AI may make an apparently sharp jump in intelligence purely as the result of anthropomorphism, the human tendency to think of «village idiot» and «Einstein» as the extreme ends of the intelligence scale, instead of nearly indistinguishable points on the scale of minds-in-general. Everything dumber than a dumb human may appear to us as simply «dumb». One imagines the «AI arrow» creeping steadily up the scale of intelligence, moving past mice and chimpanzees, with AIs still remaining «dumb» because AIs can’t speak fluent language or write science papers, and then the AI arrow crosses the tiny gap from infra-idiot to ultra-Einstein in the course of one month or some similarly short period. I don’t think this exact scenario is plausible, mostly because I don’t expect the curve of recursive self-improvement to move at a linear creep. But I am not the first to point out that «AI» is a moving target. As soon as a milestone is actually achieved, it ceases to be «AI». This can only encourage procrastination.

Let us concede for the sake of argument that, for all we know (and it seems to me also probable in the real world), an AI has the capability to make a sudden, sharp, large leap in intelligence. What follows from this?

First and foremost: it follows that a reaction I often hear, «We don’t need to worry about Friendly AI because we don’t yet have AI», is misguided or downright suicidal. We cannot rely on having distant advance warning before AI is created; past technological revolutions usually did not telegraph themselves to people alive at the time, whatever was said afterward in hindsight. The mathematics and techniques of Friendly AI will not materialize from nowhere when needed; it takes years to lay firm foundations. And we need to solve the Friendly AI challenge before Artificial General Intelligence is created, not afterward; I shouldn’t even have to point this out. There will be difficulties for Friendly AI because the field of AI itself is in a state of low consensus and high entropy. But that doesn’t mean we don’t need to worry about Friendly AI. It means there will be difficulties. The two statements, sadly, are not remotely equivalent.

The possibility of sharp jumps in intelligence also implies a higher standard for Friendly AI techniques. The technique cannot assume the programmers’ ability to monitor the AI against its will, rewrite the AI against its will, bring to bear the threat of superior military force; nor may the algorithm assume that the programmers control a «reward button» which a smarter AI could wrest from the programmers; et cetera. Indeed no one should be making these assumptions to begin with. The indispensable protection is an AI that does not want to hurt you. Without the indispensable, no auxiliary defense can be regarded as safe. No system is secure that searches for ways to defeat its own security. If the AI would harm humanity in any context, you must be doing something wrong on a very deep level, laying your foundations awry. You are building a shotgun, pointing the shotgun at your foot, and pulling the trigger. You are deliberately setting into motion a created cognitive dynamic that will seek in some context to hurt you. That is the wrong behavior for the dynamic; write code that does something else instead.

For much the same reason, Friendly AI programmers should assume that the AI has total access to its own source code. If the AI wants to modify itself to be no longer Friendly, then Friendliness has already failed, at the point when the AI forms that intention. Any solution that relies on the AI not being able to modify itself must be broken in some way or other, and will still be broken even if the AI never does modify itself. I do not say it should be the only precaution, but the primary and indispensable precaution is that you choose into existence an AI that does not choose to hurt humanity.

To avoid the Giant Cheesecake Fallacy, we should note that the ability to self-improve does not imply the choice to do so. The successful exercise of Friendly AI technique might create an AI which had the potential to grow more quickly, but chose instead to grow along a slower and more manageable curve. Even so, after the AI passes the criticality threshold of potential recursive self-improvement, you are then operating in a much more dangerous regime. If Friendliness fails, the AI might decide to rush full speed ahead on self-improvement — metaphorically speaking, it would go prompt critical.

I tend to assume arbitrarily large potential jumps for intelligence because (a) this is the conservative assumption; (b) it discourages proposals based on building AI without really understanding it; and (c) large potential jumps strike me as probable-in-the-real-world. If I encountered a domain where it was conservative from a risk-management perspective to assume slow improvement of the AI, then I would demand that a plan not break down catastrophically if an AI lingers at a near-human stage for years or longer. This is not a domain over which I am willing to offer narrow confidence intervals.

8: Hardware

People tend to think of large computers as the enabling factor for Artificial Intelligence. This is, to put it mildly, an extremely questionable assumption. Outside futurists discussing Artificial Intelligence talk about hardware progress because hardware progress is easy to measure — in contrast to understanding of intelligence. It is not that there has been no progress, but that the progress cannot be charted on neat PowerPoint graphs. Improvements in understanding are harder to report on, and therefore less reported.

Rather than thinking in terms of the «minimum» hardware «required» for Artificial Intelligence, think of a minimum level of researcher understanding that decreases as a function of hardware improvements. The better the computing hardware, the less understanding you need to build an AI. The extremal case is natural selection, which used a ridiculous amount of brute computational force to create human intelligence using no understanding, only nonchance retention of chance mutations.

Increased computing power makes it easier to build AI, but there is no obvious reason why increased computing power would help make the AI Friendly. Increased computing power makes it easier to use brute force; easier to combine poorly understood techniques that work. Moore’s Law steadily lowers the barrier that keeps us from building AI without a deep understanding of cognition.

It is acceptable to fail at AI and at Friendly AI. It is acceptable to succeed at AI and at Friendly AI. What is not acceptable is succeeding at AI and failing at Friendly AI. Moore’s Law makes it easier to do exactly that. «Easier», but thankfully not easy. I doubt that AI will be «easy» at the time it is finally built — simply because there are parties who will exert tremendous effort to build AI, and one of them will succeed after AI first becomes possible to build with tremendous effort.

Moore’s Law is an interaction between Friendly AI and other technologies, which adds oft-overlooked existential risk to other technologies. We can imagine that molecular nanotechnology is developed by a benign multinational governmental consortium and that they successfully avert the physical-layer dangers of nanotechnology. They straightforwardly prevent accidental replicator releases, and with much greater difficulty put global defenses in place against malicious replicators; they restrict access to «root level» nanotechnology while distributing configurable nanoblocks, et cetera. (See Phoenix and Treder, this volume.) But nonetheless nanocomputers become widely available, either because attempted restrictions are bypassed, or because no restrictions are attempted. And then someone brute-forces an Artificial Intelligence which is non-Friendly; and so the curtain is rung down. This scenario is especially worrying because incredibly powerful nanocomputers would be among the first, the easiest, and the safest-seeming applications of molecular nanotechnology.

What of regulatory controls on supercomputers? I certainly wouldn’t rely on them to prevent AI from ever being developed; yesterday’s supercomputer is tomorrow’s laptop. The standard reply to a regulatory proposal is that when nanocomputers are outlawed, only outlaws will have nanocomputers. The burden is to argue that the supposed benefits of reduced distribution outweigh the inevitable risks of uneven distribution. For myself I would certainly not argue in favor of regulatory restrictions on the use of supercomputers for Artificial Intelligence research; it is a proposal of dubious benefit which would be fought tooth and nail by the entire AI community. But in the unlikely event that a proposal made it that far through the political process, I would not expend any significant effort on fighting it, because I don’t expect the good guys to need access to the «supercomputers» of their day. Friendly AI is not about brute-forcing the problem.

I can imagine regulations effectively controlling a small set of ultra-expensive computing resources that are presently considered "supercomputers". But computers are everywhere. It is not like the problem of nuclear proliferation, where the main emphasis is on controlling plutonium and enriched uranium. The raw materials for AI are already everywhere. That cat is so far out of the bag that it's in your wristwatch, cellphone, and dishwasher. This too is a special and unusual factor in Artificial Intelligence as an existential risk. We are separated from the risky regime, not by large visible installations like isotope centrifuges or particle accelerators, but only by missing knowledge. To use a perhaps over-dramatic metaphor, imagine if subcritical masses of enriched uranium had powered cars and ships throughout the world, before Leo Szilard first thought of the chain reaction.

9: Threats and promises

It is a risky intellectual endeavor to predict specifically how a benevolent AI would help humanity, or an unfriendly AI harm it. There is the risk of the conjunction fallacy: added detail necessarily reduces the joint probability of the entire story, but subjects often assign higher probabilities to stories which include strictly added details. (See Yudkowsky, this volume, on cognitive biases.) There is the risk — virtually the certainty — of failure of imagination; and the risk of the Giant Cheesecake Fallacy, which leaps from capability to motive. Nonetheless I will try to solidify threats and promises.

The future has a reputation for accomplishing feats which the past thought impossible. Future civilizations have even broken what past civilizations thought (incorrectly, of course) to be the laws of physics. If prophets of 1900 AD — never mind 1000 AD — had tried to bound the powers of human civilization a billion years later, some of those impossibilities would have been accomplished before the century was out; transmuting lead into gold, for example. Because we remember future civilizations surprising past civilizations, it has become cliché that we can't put limits on our great-grandchildren. And yet everyone in the 20th century, in the 19th century, and in the 11th century, was human.

We can distinguish three families of unreliable metaphors for imagining the capability of a smarter-than-human Artificial Intelligence:

  • G-factor metaphors: Inspired by differences of individual intelligence between humans. AIs will patent new technologies, publish groundbreaking research papers, make money on the stock market, or lead political power blocs.
  • History metaphors: Inspired by knowledge differences between past and future human civilizations. AIs will swiftly invent the kind of capabilities that cliché would attribute to human civilization a century or millennium from now: molecular nanotechnology; interstellar travel; computers performing 10^25 operations per second.
  • Species metaphors: Inspired by differences of brain architecture between species. AIs have magic.

G-factor metaphors seem most common in popular futurism: when people think of "intelligence" they think of human geniuses instead of humans. In stories about hostile AI, g-factor metaphors make for a Bostromian "good story": an opponent that is powerful enough to create dramatic tension, but not powerful enough to instantly squash the heroes like bugs, and ultimately weak enough to lose in the final chapters of the book. Goliath against David is a "good story", but Goliath against a fruit fly is not. If we suppose the g-factor metaphor, then global catastrophic risks of this scenario are relatively mild; a hostile AI is not much more of a threat than a hostile human genius. If we suppose a multiplicity of AIs, then we have a metaphor of conflict between nations, between the AI tribe and the human tribe. If the AI tribe wins in military conflict and wipes out the humans, that is an existential catastrophe of the Bang variety (Bostrom 2001). If the AI tribe dominates the world economically and attains effective control of the destiny of Earth-originating intelligent life, but the AI tribe's goals do not seem to us interesting or worthwhile, then that is a Shriek, Whimper, or Crunch.

But how likely is it that Artificial Intelligence will cross all the vast gap from amoeba to village idiot, and then stop at the level of human genius?

The fastest observed neurons fire 1000 times per second; the fastest axon fibers conduct signals at 150 meters/second, a half-millionth the speed of light; each synaptic operation dissipates around 15,000 attojoules, which is more than a million times the thermodynamic minimum for irreversible computations at room temperature (kT·ln(2) at 300 K ≈ 0.003 attojoules per bit). It would be physically possible to build a brain that computed a million times as fast as a human brain, without shrinking the size, or running at lower temperatures, or invoking reversible computing or quantum computing. If a human mind were thus accelerated, a subjective year of thinking would be accomplished for every 31 physical seconds in the outside world, and a millennium would fly by in eight and a half hours. Vinge (1993) referred to such sped-up minds as "weak superintelligence": a mind that thinks like a human but much faster.
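The arithmetic above can be checked directly. The sketch below recomputes the Landauer limit from standard physical constants and the timing claims from the chapter's assumed millionfold speedup; nothing here is new data, only a verification of the figures in the paragraph.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, joules per kelvin
T_ROOM = 300.0              # room temperature, kelvin

# Landauer limit: minimum energy to irreversibly erase one bit at 300 K.
landauer_joules = K_BOLTZMANN * T_ROOM * math.log(2)
landauer_attojoules = landauer_joules * 1e18
print(f"Landauer limit at 300 K: {landauer_attojoules:.4f} aJ/bit")  # ~0.0029

# A synaptic operation (~15,000 aJ) versus the thermodynamic minimum.
print(f"Synapse / minimum: {15000 / landauer_attojoules:.1e}")  # over a millionfold

# A millionfold-accelerated mind: outside time per subjective interval.
SPEEDUP = 1e6
seconds_per_year = 365.25 * 24 * 3600
print(f"Subjective year: {seconds_per_year / SPEEDUP:.1f} outside seconds")
print(f"Subjective millennium: "
      f"{1000 * seconds_per_year / SPEEDUP / 3600:.1f} outside hours")
```

Running this reproduces the chapter's figures: about 31.6 seconds per subjective year, and roughly 8.8 hours per subjective millennium.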

We suppose there comes into existence an extremely fast mind, embedded in the midst of human technological civilization as it exists at that time. The failure of imagination is to say, "No matter how fast it thinks, it can only affect the world at the speed of its manipulators; it can't operate machinery faster than it can order human hands to work; therefore a fast mind is no great threat." It is no law of Nature that physical operations must crawl at the pace of long seconds. Critical times for elementary molecular interactions are measured in femtoseconds, sometimes picoseconds. Drexler (1992) has analyzed controllable molecular manipulators which would complete >10^6 mechanical operations per second — note that this is in keeping with the general theme of "millionfold speedup". (The smallest physically sensible increment of time is generally thought to be the Planck interval, 5×10^-44 seconds, on which scale even the dancing quarks are statues.)
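The consonance the paragraph notes — Drexler's manipulator speed matching the millionfold speedup — can be made explicit. A brief sketch, using only the figures quoted in the chapter:

```python
# Timescale comparison for a mind running 10^6 times faster than human.
SPEEDUP = 1e6
manipulator_ops_per_sec = 1e6  # Drexler (1992): >10^6 mechanical ops/sec

# To the accelerated mind, a manipulator completing 10^6 operations per
# outside second appears to complete one operation per subjective second:
# physical manipulation keeps pace with accelerated thought.
print(f"Manipulator ops per subjective second: "
      f"{manipulator_ops_per_sec / SPEEDUP:.0f}")

# Even femtosecond molecular events are vast on the Planck scale.
planck_s = 5e-44   # Planck interval, seconds
femto = 1e-15      # one femtosecond, seconds
print(f"A femtosecond spans ~{femto / planck_s:.0e} Planck intervals")
```

The point is that there is enormous headroom below human timescales before physics imposes any limit.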

Suppose that a human civilization were locked in a box and allowed to affect the outside world only through the glacially slow movement of alien tentacles, or mechanical arms that moved at microns per second. We would focus all our creativity on finding the shortest possible path to building fast manipulators in the outside world. Pondering fast manipulators, one immediately thinks of molecular nanotechnology — though there may be other ways. What is the shortest path you could take to molecular nanotechnology in the slow outside world, if you had eons to ponder each move? The answer is that I don’t know because I don’t have eons to ponder. Here’s one imaginable fast pathway: