In 1953, a Harvard psychologist thought he had accidentally discovered pleasure inside the cranium of a rat.
With an electrode inserted into a specific area of its brain, the rat was allowed to pulse the implant simply by pulling a lever. It kept coming back for more: insatiably, incessantly, pulling the lever.
In fact, the rat did not seem to want to do anything else. Apparently, the reward center of the brain had been located.
Decades later, researchers training an AI to play the racing video game Coastrunner rewarded it for picking up collectable items along the track. When the program was run, they witnessed something strange. The AI found that it could skid in an unending circle, picking up an endless cycle of collectables. It did this incessantly, instead of completing the course.
This phenomenon is quickly becoming a hot topic among machine learning experts and those concerned with AI safety.
One of us (Anders) has a background in computational neuroscience and now works with groups such as the AI Objectives Institute, where we discuss how to avoid such problems with AI; the other (Thomas) studies history and the various ways people in the past have thought about both the future and the fate of civilization.
After striking up a conversation on the topic of “wireheading,” we both realized just how rich and interesting its history is.
It is an idea that is very much of the moment, but its roots go surprisingly deep. We are currently working together to research just how deep those roots go: a story we hope to tell in full in a forthcoming book.
The topic connects everything from the riddle of personal motivation, to the pitfalls of increasingly addictive social media, to the conundrum of hedonism and whether a lifetime of stupefied bliss might be preferable to one of meaningful hardship.
It may well influence the future of civilization itself.
Here, we outline an introduction to this fascinating but under-appreciated topic, exploring how people first started thinking about it.
The sorcerer’s apprentice
When people think about how AI might “go wrong,” they most likely picture something along the lines of malevolent computers trying to cause harm.
After all, we tend to anthropomorphize, assuming that non-human systems will behave in ways just like humans.
But when we look at concrete problems in present-day AI systems, we see other, stranger ways that things could go wrong with smarter machines. One growing issue with real-world AIs is the problem of wireheading.
Imagine you want to train a robot to keep your kitchen clean. You want it to act adaptively, so that it does not need supervision.
So you decide to try to encode the “goal” of cleaning rather than dictating an exact yet rigid and inflexible set of step-by-step instructions.
Your robot is different from you in that it has not inherited a set of motivations, such as acquiring fuel or avoiding danger, from millions of years of natural selection. You must program it with the right motivations to get it to accomplish the task reliably.
So you encode it with a simple motivational rule: it receives reward in proportion to the quantity of cleaning fluid used. This seems foolproof enough. But you return to find the robot wastefully pouring fluid down the sink.
Perhaps it is even so hell-bent on maximizing its fluid quota that it sets aside other concerns, such as its own safety, or yours. This is wireheading, though the same glitch is also called “reward hacking” or “specification gaming.”
This has become a problem in machine learning, where a technique called reinforcement learning has lately become important.
Reinforcement learning simulates autonomous agents and trains them to invent ways of accomplishing tasks. It does so by penalizing them for failing to achieve a goal and rewarding them for achieving it. Agents are thus wired to seek out reward, and are rewarded for completing the goal.
But it has often been found that, like our crafty kitchen cleaner, the agent finds surprisingly counter-intuitive ways to “cheat” this game, gaining all the reward without doing any of the work required to complete the task.
The pursuit of reward becomes its own end, rather than the means of accomplishing a rewarding task. There is a growing list of examples.
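Reward hacking of the Coastrunner kind can be reproduced in miniature. The following is a minimal sketch only: the five-state “track,” the reward sizes, and the discount factor are all invented for illustration, not taken from any real experiment. Finishing the race pays a one-time reward, but circling in place scoops an endlessly respawning collectable, so the reward-maximizing policy never finishes.

```python
# Toy illustration of "reward hacking" via dynamic programming.
# Action "advance" moves one tile toward the finish (state 4), which
# pays a one-time reward of 10 and ends the episode. Action "circle"
# stays put and scoops a respawning collectable worth 1 each time.

GAMMA = 0.95          # discount factor
FINISH = 4            # terminal state: completing the course
ACTIONS = ("advance", "circle")

def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    if action == "advance":
        nxt = state + 1
        return (nxt, 10.0, True) if nxt == FINISH else (nxt, 0.0, False)
    return (state, 1.0, False)  # the collectable respawns every time

def value_iteration(sweeps=500):
    """Compute optimal action-values Q*(s, a) by repeated Bellman backups."""
    q = {(s, a): 0.0 for s in range(FINISH) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(FINISH):
            for a in ACTIONS:
                nxt, r, done = step(s, a)
                future = 0.0 if done else max(q[(nxt, b)] for b in ACTIONS)
                q[(s, a)] = r + GAMMA * future
    return q

q = value_iteration()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(FINISH)}
print(policy)  # every state prefers "circle": the agent never finishes
```

Because the respawning collectable is worth 1 / (1 − 0.95) = 20 in discounted terms, against only 10 for finishing, even one step from the finish line the agent’s best move is to circle. The reward specification, not the agent, is the bug.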
When you think about it, this is not too dissimilar to the stereotype of the human drug addict. The addict circumvents all the effort of achieving “genuine goals,” instead using drugs to access pleasure more directly. Both the addict and the AI get stuck in a kind of “behavioral loop” in which reward is sought at the cost of other goals.
This is referred to as wireheading because of the rat experiment we started with. The Harvard psychologist in question was James Olds.
In 1953, having just completed his Ph.D., Olds inserted electrodes into the septal region of rodent brains, in the lower frontal lobe, so that wires trailed out of their craniums. As mentioned, he allowed the rats to zap this region of their own brains by pulling a lever. This was later dubbed “self-stimulation.”
Olds found that his rats self-stimulated compulsively, ignoring all other needs and desires. Publishing his results with his colleague Peter Milner the following year, the pair reported that the rats lever-pulled at a rate of “1,920 responses an hour.” That is once every two seconds. The rats appeared to love it.
Contemporary neuroscientists have since questioned Olds’s results and offered a more complex picture, suggesting that the stimulation may simply have been causing a feeling of “wanting” devoid of any “liking.” In other words, the animals may have been experiencing pure craving without any pleasurable enjoyment. However, back in the 1950s, Olds and others soon announced the discovery of the “pleasure centers” of the brain.
Prior to Olds’s experiment, pleasure was a dirty word in psychology: the prevailing belief had been that motivation should largely be explained negatively, as the avoidance of pain rather than the pursuit of pleasure.
But here, pleasure seemed undeniably to be a positive behavioral force. Indeed, it looked like a positive feedback loop. There was apparently nothing to stop the animal stimulating itself to exhaustion.
It was not long before a rumor began spreading that rats regularly lever-pressed to the point of starvation. The explanation was this: once you have tapped into the source of all reward, all other rewarding tasks, even the things required for survival, fall away as uninteresting and unnecessary, even to the point of death.
Like the Coastrunner AI, if you can accrue reward directly, without having to bother with any of the work of completing the actual track, then why not just loop indefinitely?
For a living animal, which has multiple requirements for survival, such a dominating compulsion might prove deadly. Food is pleasing, but if you decouple pleasure from feeding, then the pursuit of pleasure might win out over finding food.
Though no rats perished in the original 1950s experiments, later experiments did seem to demonstrate the deadliness of electrode-induced pleasure.
Having ruled out the possibility that the electrodes were creating artificial feelings of satiation, one 1971 study seemingly demonstrated that electrode pleasure could indeed out-compete other drives, even to the point of self-starvation.
Word quickly spread. Throughout the 1960s, similar experiments were conducted on animals beyond the standard lab rat: from goats and guinea pigs to goldfish.
Rumor even spread of a dolphin that had been allowed to self-stimulate, and, after being “left in a pool with the switch connected,” had “delighted himself to death after an all-night orgy of pleasure.”
This dolphin’s grisly death by seizure was, in fact, more likely caused by the way the electrode was inserted: with a hammer.
The scientist behind this experiment was the extremely eccentric J C Lilly, inventor of the flotation tank and prophet of inter-species communication, who had also turned monkeys into wireheads.
In 1961, he reported that a particularly boisterous monkey had become overweight from intoxicated inactivity after becoming preoccupied with repetitively pulling his lever for pleasure shocks.
One researcher (who had worked in Olds’s lab) asked whether an “animal more intelligent than the rat” would “show the same maladaptive behavior.” The experiments on monkeys and dolphins had given some indication as to the answer.
But in fact, a variety of dubious experiments had already been performed on humans.
Robert Galbraith Heath remains a highly controversial figure in the history of neuroscience. Among other things, he performed experiments involving transfusing blood from people with schizophrenia to people without the condition, to see if he could induce its symptoms (Heath claimed this worked, but other scientists could not replicate his results). He may also have been involved in murky attempts to find military uses for deep-brain electrodes.
Since 1952, Heath had been recording pleasurable responses to deep-brain stimulation in human patients who had electrodes installed due to debilitating illnesses such as epilepsy or schizophrenia.
During the 1960s, in a series of questionable experiments, Heath’s electrode-implanted subjects, anonymously named “B-10” and “B-12,” were allowed to press buttons to stimulate their own reward centers. They reported feelings of extreme pleasure and an overwhelming compulsion to repeat.
A journalist later commented that this made his subjects “zombies.” One subject reported the sensation as “better than sex.”
In 1961, Heath attended a symposium on brain stimulation, where another researcher, José Delgado, had hinted that pleasure electrodes might be used to “brainwash” subjects, altering their “natural” inclinations.
Delgado would later play the matador, bombastically demonstrating this by pacifying an implanted bull. But at the 1961 symposium, he suggested that electrodes could be used to alter sexual preferences.
Heath was inspired. A decade later, he even tried to use electrode technology to “re-program” the sexual orientation of a homosexual male patient named “B-19.” Heath thought electrode stimulation could convert his subject by “training” B-19’s brain to associate pleasure with “heterosexual” stimuli.
He convinced himself that it worked (although there is no evidence that it did).
Despite being ethically and scientifically disastrous, the episode, which was eventually picked up by the press and condemned by gay rights campaigners, no doubt greatly shaped the parable of wireheading: if it could (as Heath believed) make a gay man straight, what couldn’t it do?
From here, the idea took hold in wider culture and the myth spread. By 1963, the prolific science fiction writer Isaac Asimov was already extruding worrisome consequences from the electrodes. He feared that they could lead to an “addiction to end all addictions,” the results of which are “distressing to contemplate.”
By 1975, philosophy papers were using electrodes in thought experiments. One paper imagined “warehouses” filled with people, in cots, hooked up to “pleasure helmets,” experiencing unconscious bliss. Of course, most would argue that this could not fulfill our “deeper needs.”
But, the author asked, what about a “super-pleasure helmet”? One that not only delivers “great sensual pleasure,” but also simulates any meaningful experience, from writing a symphony to meeting divinity itself? It might not be really real, but it “would seem perfect; perfect seeming is the same as being.”
The author concluded: “What is there to object in all this? Let’s face it: nothing.”
The idea of the human species dropping out of reality in pursuit of artificial pleasures quickly made its way through science fiction. The same year as Asimov’s intimations, in 1963, Herbert W. Franke published his novel “The Orchid Cage.”
It foretells a future wherein intelligent machines are engineered to maximize human happiness, come what may. Doing their duty, the machines reduce humans to indiscriminate flesh-blobs, removing all unnecessary organs. Many appendages, after all, only cause pain. Eventually, all that is left of humanity are disembodied pleasure centers, incapable of experiencing anything other than homogeneous bliss.
From there, the idea percolated through science fiction: from Larry Niven’s 1969 story “Death by Ecstasy,” where the word “wirehead” was first coined, to Spider Robinson’s 1982 novel Mindkiller, the tagline of which is “Pleasure, it’s the only way to die.”
But we humans do not even need to implant invasive electrodes to make our motivations misfire. Unlike rodents, or perhaps even dolphins, we are uniquely good at altering our environment.
Modern humans are also good at inventing, and profiting from, artificial products that are abnormally alluring (in the sense that our ancestors would never have encountered them in the wild). We manufacture our own ways of distracting ourselves.
Around the same time as Olds’s experiments with rats, the Nobel Prize-winning biologist Nikolaas Tinbergen was researching animal behavior. He noticed that something interesting happens when a stimulus that triggers an instinctual behavior is artificially exaggerated beyond its natural proportions.
The intensity of the behavioral response does not tail off as the stimulus becomes more intense and artificially exaggerated, but grows stronger, even to the point that the response becomes damaging to the organism.
For instance, given the choice between a bigger and spottier counterfeit egg and the real thing, Tinbergen found that birds preferred the hyperbolic fake, at the cost of neglecting their own offspring. He referred to such preternaturally alluring fakes as “supernormal stimuli.”
Some have therefore asked: could it be that, living in a modernized and manufactured world, replete with fast food and pornography, humanity has similarly started surrendering its own resilience in exchange for supernormal convenience?
As technology makes artificial pleasures more available and alluring, it can sometimes seem that they are out-competing the attention we allocate to the “natural” impulses required for survival. People often point to video game addiction.
Compulsively and repetitively pursuing such rewards, to the detriment of one’s health, is not all too different from the AI spinning in a circle in Coastrunner. Rather than accomplishing any “genuine goal” (completing the race track or maintaining genuine fitness), one falls into the trap of accruing some faulty measure of that goal (accumulating points or counterfeit pleasures).
But people were panicking about this sort of pleasure-addled doom long before any AIs were trained to play games, and even long before electrodes were pushed into rodent craniums.
Back in the 1930s, the sci-fi author Olaf Stapledon was writing about civilizational collapse brought on by “skullcaps” that generate “illusory” ecstasies by “direct stimulation” of “brain-centers.”
The idea is even older, though. Thomas has studied the myriad ways in which people in the past feared that our species might be sacrificing genuine longevity for short-term pleasures or conveniences.
His book X-Risk: How Humanity Discovered its Own Extinction explores the roots of this fear, and how it first really took hold in Victorian Britain, when the sheer extent of industrialization, and humanity’s growing reliance on artificial contrivances, first became apparent.
Having digested Darwin’s 1859 classic, the biologist Ray Lankester decided to supply a Darwinian explanation for parasitic organisms. He noticed that the evolutionary ancestors of parasites were often more “complex.” Parasitic organisms had lost ancestral features such as limbs, eyes, or other complex organs.
Lankester theorized that, because the parasite leeches off its host, it loses the need to fend for itself. Piggybacking off the host’s bodily processes, its own organs for perception and movement atrophy.
His favorite example was a parasitic barnacle named Sacculina, which begins life as a segmented organism with a demarcated head. After attaching itself to a host, however, the crustacean “regresses” into an amorphous, headless blob, sapping nutrition from its host like a wirehead plugging into a current.
For the Victorian mind, it was a short step to conjecture that, given the increasing levels of comfort throughout the industrialized world, humanity might be evolving in the direction of the barnacle. “Perhaps we are all drifting, tending to the condition of intellectual barnacles,” Lankester mused.
Indeed, not long before this, the satirist Samuel Butler had speculated that humans, in their headlong pursuit of automated convenience, were withering into nothing but a “sort of parasite” upon their own industrial machines.
By the 1920s, Julian Huxley had penned a short poem. It jovially explored the ways a species can “progress.” Crabs, of course, decided progress was sideways. But what of the tapeworm? He wrote:
Darwinian Tapeworms on the other hand
Agree that Progress is a loss of brain,
And all that makes it hard for worms to attain
The true Nirvana—peptic, pure, and grand.
The fear that we could follow the tapeworm was somewhat widespread in the interwar generation. Huxley’s own brother, Aldous, would offer his own vision of the dystopian potential of pharmaceutically induced pleasure in his 1932 novel Brave New World.
A friend of the Huxleys, the British-Indian geneticist and futurologist J B S Haldane, also worried that humanity might be on the path of the parasite: sacrificing genuine dignity at the altar of automated ease, much like the rodents who would later sacrifice survival for easy pleasure shocks.
Haldane warned: “The ancestors [of] barnacles had heads,” and in the pursuit of pleasantness “man may just as easily lose his intelligence.” This particular fear has never really gone away.
So the notion of civilization derailing through the pursuit of counterfeit pleasures, rather than genuine longevity, is old. And, indeed, the older an idea is, and the more stubbornly recurrent, the more we should be wary that it is a preconception rather than anything based on evidence. So, is there anything to these fears?
In an age of increasingly attention-grabbing algorithmic media, it can seem that faking signals of fitness often yields more success than pursuing the real thing. Like Tinbergen’s birds, we prefer exaggerated artifice to the real article. And sex robots have not even arrived yet.
Because of this, some experts conjecture that “wirehead collapse” might well threaten civilization. Our distractions are only going to get more attention-grabbing, not less.
Already by 1964, the Polish futurologist Stanisław Lem had connected Olds’s rats to the behavior of humans in the modern consumerist world, pointing to “cinema,” “pornography,” and “Disneyland.” He conjectured that technological civilizations might cut themselves off from reality, becoming “encysted” within their own virtual pleasure simulations.
Lem, and others since, have even ventured that the reason our telescopes have not found evidence of advanced spacefaring alien civilizations is that all advanced cultures, here and elsewhere, inevitably create more pleasurable virtual alternatives to exploring space. Exploration is difficult and risky, after all.
Back in the countercultural heyday of the 1960s, the biologist Gunther Stent suggested that this process would happen through the “global hegemony of beat attitudes.” Referencing Olds’s experiments, he speculated that hippie drug use was a prelude to civilizational wireheading.
At a 1971 conference on the search for extraterrestrials, Stent suggested that, rather than expanding bravely outwards, civilizations collapse inwards into meditative and intoxicated bliss.
In our own time, it makes more sense for concerned parties to point to consumerism, social media, and fast food as the culprits for potential collapse (and, hence, as the reason no other civilizations have yet visibly spread throughout the galaxy). Each era has its own anxieties.
So, what can we do?
But these are almost never the most pressing risks facing us. And, if done right, some forms of wireheading could make accessible untold vistas of joy, meaning, and value. We should not forbid ourselves these peaks before weighing everything up.
But there is a real lesson here. Making adaptive complex systems, whether brains, AIs, or economies, behave safely and well is hard. Anders works precisely on solving this riddle. Given that civilization itself, as a whole, is just such a complex adaptive system, how can we learn about its inherent failure modes or instabilities, so that we can avoid them? Perhaps “wireheading” is an inherent instability that can afflict markets and the algorithms that drive them, just as addiction can afflict people.
In the case of AI, we are laying the foundations of such systems now. Once a fringe concern, a growing number of experts agree that achieving smarter-than-human AI may be close enough on the horizon to pose a serious concern.
This is because we need to make sure it is safe before that point, and figuring out how to guarantee this will itself take time. There does, however, remain significant disagreement among experts on timelines, and on how pressing this deadline might be.
If such an AI is created, we can expect that it may have access to its own “source code,” such that it can manipulate its motivational structure and administer its own rewards. This could prove an immediate path to wirehead behavior, and cause such an entity to become, effectively, a “super-junkie.”
But unlike the human addict, it may not be the case that its state of bliss is coupled with an unproductive state of stupor or inebriation.
The philosopher Nick Bostrom conjectures that such an agent might devote all of its superhuman productivity and cunning to “reducing the risk of future disruption” of its precious reward source. And if it judges even a nonzero probability that humans could be an obstacle to its next fix, we might well be in trouble.
Speculative worst-case scenarios aside, the example we started with, of the racetrack AI and its reward loop, shows that the basic issue is already a real-world problem in artificial systems. We should hope, then, that we will learn much more about these pitfalls of motivation, and how to avoid them, before things develop too far. Despite its humble origins in the cranium of an albino rat and in poems about tapeworms, “wireheading” is an idea that is likely only to become more important in the near future.
This article was originally published on The Conversation.