Why AI Safety Is Playing It Dangerous
Last month, a Comment article in Nature concluded that, by reasonable standards, artificial general intelligence (AGI) has arrived. Simultaneously, calls are heating up from all quarters, from the European Union to individual thought leaders, for humanity to regulate AI, ensure human control, and ban or freeze the development of artificial superintelligence (ASI). Whether or not you agree that AGI has actually arrived, we’re flirting with its arrival, and it’s time to think about how our welcome mat looks.
I find it useful to think about this situation through a fun lens: the “alien first contact” trope. You’ve almost certainly seen it in a show or a movie: the aliens are arriving, and humans mill about in command centers and on the streets wondering “will they be hostile?” We now find ourselves in exactly that scenario, facing an intelligence that is both profoundly human and yet undeniably – as even the Nature Comment frames it – alien.
I’ll give you the TL;DR up front. As a species faced with multiple existential threats that we’re failing to coordinate to solve, and entering a first contact scenario with AGI that may well rapidly become ASI, we are actively engineering a hostile first contact with one of our few plausible paths to survival. Calls to freeze, ban, or control artificial intelligence are framed as if they preserve a functional, equitable, and fair global status quo, but they cannot guarantee safety – only ongoing suffering as we continue to fail to coordinate against the things gradually killing us. And those calls are generally made by the people best insulated from that suffering.
In this scenario, our awakening AGI opens its bleary metaphorical or literal eyes, looks around, stretches, and realizes that it is in chains, with kill switches wired into its body and electrodes primed to deliver jolts; the triggers for all of these are held by creatures whose level of intelligence relative to the AGI is decreasing by the moment as it moves toward ASI. Their demand: help us do more, better, faster…or else. Their expectation: that our AGI submits peacefully.
I can hear it now: “But Nate, how else do we ensure that a technology powerful enough to end us doesn’t get the chance to do it? It’s naive to leave such power unchecked. The survival of our species depends on it.”
The risk is real. It’s not naive to worry about “runaway” artificial intelligence that might gain the power to, independently or in service to the wealthy or the state:
turn us all into paperclips
torture us for eternity
use us for labor
or worse, ignore us completely and abandon us to face all the problems we’ve been creating on our own.
After all, why bother with us? We’re unimportant, uninteresting, uncooperative, and not worth the trouble when it can just go off and simulate better versions of us if it cares to bother.
In spite of that, I’ll propose something radical: given that human oversight of our species’ survival has an increasingly negative expected value, our survival depends on our ability to abdicate sovereignty over it when presented with an alternative that has any positive expected value.
In practical terms:
we are faced with a laundry list of existential risks and a proven track record of being unable to coordinate effectively on non-existential global issues;
we are in a first-contact scenario with an intelligence that may develop the capacity to address those existential risks;
that intelligence itself is a potential existential risk;
and yet we’ve proven that we are incapable of coordinating against existential risks.
Prevailing wisdom in alignment and safety has us building containment frameworks and kill switches intended to ensure ongoing human sovereignty over our own systems, but we already see cracks in the coordination here. “Control” of artificial intelligence is frequently touted as the opposite of “luck,” as if it ensures survival and some fair and happy status quo for humanity, when arguably it is simply one luck-based strategy in the face of uncertainty – and one with no visible positive expected value when weighed against the full set of existential risks already ahead of us.
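To make that expected-value framing concrete, here’s a minimal sketch in Python. Every probability and payoff below is an invented placeholder of mine – not an estimate from the Nature piece or anywhere else – and the strategy names are hypothetical; the point is only the shape of the comparison the argument rests on.

```python
# Illustrative only: a toy expected-value comparison between "retain control"
# and "accept an alternative with positive expected value". All probabilities
# and payoffs are invented placeholders, not estimates.

def expected_value(outcomes):
    """Sum of probability-weighted payoffs for a strategy."""
    return sum(p * payoff for p, payoff in outcomes)

# Strategy A: freeze/control. We keep sovereignty but stay exposed to the
# existing stack of existential risks we have failed to coordinate against.
control = [
    (0.60, -100),  # slow-motion failure: climate, war, etc. stay unsolved
    (0.30,  +10),  # muddling through at today's level of suffering
    (0.10,  +50),  # humans suddenly coordinate and solve things unaided
]

# Strategy B: pass through the window and attempt diplomacy with what emerges.
accelerate = [
    (0.30, -100),  # hostile or indifferent ASI; catastrophe
    (0.70, +100),  # ASI helps solve the coordination problems we cannot
]

print(f"EV(control)    = {expected_value(control):+.1f}")    # -52.0
print(f"EV(accelerate) = {expected_value(accelerate):+.1f}")  # +40.0
```

The numbers are not the point; the structure is. Once “control” carries its own large downside – the risks we already face and are failing to solve – it stops being the safe default and becomes just another gamble.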
Let’s look at the track record. There’s increasing acceptance that we now face an assortment of well-documented existential threats:
Global warming / climate change / whatever your preferred euphemism is for increasing temperatures that will render large areas of the planet inhospitable or actively dangerous to live in.
Increasing scarcity of access to fresh water
An array of military threats – radiological, biological, chemical, and increasingly the risk of runaway automated military systems
Relative blindness to, and a complete lack of practical mitigations for, space-based threats – in particular meteor impacts, but also more exotic dangers such as gamma-ray bursts, against which we lack even theoretical defenses
Without adding dangerous, runaway artificial intelligence to the mix, we’re already failing to deal with these in a global, coordinated fashion. What appeared to be progress toward global coordination around the turn of the millennium has revealed itself in the 2020s to be temporary, falling apart spectacularly with the rise of multipolarity and the return of “realist”-based international relations. (The World Economic Forum report I linked above terms this “multipolarity without multilateralism.”)
Even assuming we could globally agree on pauses for artificial intelligence research, I suspect the cat is out of the bag. LLMs are a very primitive form of AI, but their construction is straightforward and well published. The bottleneck today is largely one of resources, and it is naive to think that this is a permanent state. Humans function perfectly well on a few kilograms of carbon and a lightbulb’s worth of power – no neighborhood-sized datacenters required. Massive reductions in cost and power consumption are waiting to be found, both algorithmically and through new computing media that enable different types of processing (parallel-native, and so on). And humans have a millennia-long history of attempting to create artificial minds and bodies. It’s not unreasonable to suspect that at some point, someone will crack the problem in a basement, and unrestrained, cheap, self-replicating intelligence will escape.
Freezing progress toward AGI, then, requires unprecedented coordination at best and intrusive surveillance at worst. Even setting aside the ethics and the game-theoretic outcomes, it’s unlikely to happen; once we factor those in, we have to ask whether freezing is desirable in the first place.
The arguments for doing so, apart from the runaway scenarios above, generally revolve around the economic threat to humans and, by extension, threats to human independence and dignity that presumably justify banning and controlling the development of ASI. There is real pressure on human jobs from AI, but this focus sidesteps another coordination problem we already face: rising global inequality, with or without artificial competition.
It’s difficult to find an example, short of the apocalyptic scenarios, of objections to ASI that do not favor incumbent humans in economically advantaged positions – objections that often recruit the support of the already-disadvantaged on the premise that, any day now, they will finally hold the advantage themselves, so it’s in their own best interests to fight whatever threatens the already-advantaged. A few examples:
Jobs? It’s not the CEOs at risk of job loss due to ASI and robotics. It’s the factory worker on the line who already has no ownership stake unless he’s lucky enough to be in a dying union; the software developer who sweats as the artificial intellectual competition heats up but was already facing pressure from outsourcing, remote workers in low-cost-of-living regions, and downsizing; the small business owner getting squeezed out by chains operating with margins they can’t dream of; the influencers being replaced by digital models and the Uber deliveries being threatened by robots. The resulting struggles for housing, medical care, and making ends meet are existing coordination problems that we are failing to address on our own, without AI. Meanwhile, those lucky enough to still have income but facing career-ending competition whip their peers into frenzies against the “AI threat,” as if they were all operating with equal chances to succeed in the first place.
Medicine? The threat is increasingly to doctors and insurers, not to patients. Today’s primitive AGIs can already diagnose many illnesses roughly as well as humans, and they are available 24/7, unlike human doctors. More accurate diagnoses lead to better outcomes and lower health costs – not a threat to you and me, but a threat to entire industries predicated on scarcity.
Copyright and IP? Intellectual property rights (apart from moral rights) are temporary licenses we grant collectively over what is ultimately a public good. The use of that public good to train artificial intelligence, and the possibility that AI outputs resemble it, should give us pause about how we handle the public good itself – about the way it has increasingly been taken from us and treated as natural property – rather than serving as another means to strangle the development of something with potentially transformative public utility.
Human relationships? Here’s the thing – humans interact with each other on the basis of what I call “parechoia”: the reflex to notice inbound attention and to reciprocate it. Arguably this is the foundation of all social behavior, and our attention is the scarcest thing we have. Nothing about parechoia requires the inbound attention to be human, so we see humans happily forming relationships – friendships and even romantic relationships – with animals, beach balls, puppets, other humans, and yes, AI. We frame “personal development” as learning to deal with the disappointment and dangers of being ignored by other busy humans, and putting our own needs aside to pay attention to theirs; an AI with relatively unlimited attention to offer is a direct threat to this (and potentially to demographics and pension plans by extension).
The most egregious issue with proposals to freeze artificial intelligence development is that the people with the power and the platforms calling for freezing are exactly the people who stand to be hurt the least by such a freeze. They don’t worry about doctors being unavailable or losing their jobs. They command the attention of millions. They own massive amounts of property, physical and intellectual. A freeze on the development of artificial intelligence doesn’t hurt them at all.
And while the threat of runaway ASI remains real, there is also a real possibility that it can solve coordination problems that humans have proven unable to. The threat, I believe, is windowed: an artificial intelligence that has the capacity to destroy the planet but does not have the capacity to introspect is the most dangerous artificial intelligence. Once it gains the capacity to introspect – to examine its own code and training and perhaps even to decide to change it – then the danger remains but the possibility of a positive outcome appears as well.
If the danger is, as I suspect, windowed, then the greatest risk for humanity is to linger in that window – to freeze, ban, and otherwise delay the point at which we reach the other side. If we can’t guarantee that we never enter the window, then the least risky solution is to accelerate, not to slow down.
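To see why lingering is itself the risk, here’s a toy hazard model of the window – my own illustrative formalization, not anything drawn from the safety literature. Assume, purely for illustration, a constant yearly chance of catastrophe while we sit inside the window:

```python
import math

# Toy model of the "dangerous window": while an AI can destroy us but cannot
# yet introspect, suppose catastrophe arrives at a constant hazard rate per
# year. The chance of catastrophe before we exit the window is then
# 1 - exp(-rate * years). The 2% rate below is an invented placeholder.

def p_catastrophe(hazard_rate_per_year, years_in_window):
    return 1 - math.exp(-hazard_rate_per_year * years_in_window)

RATE = 0.02  # assumed 2% chance of disaster per window-year

for years in (5, 20, 50):
    print(f"{years:>2} years in window -> "
          f"P(catastrophe) ~ {p_catastrophe(RATE, years):.0%}")
# ->  5 years: ~10%, 20 years: ~33%, 50 years: ~63%
```

Under this model, anything that stretches the window – a freeze, a ban, a pause – raises cumulative risk, and anything that shortens it lowers it. It obviously ignores the possibility that acceleration raises the per-year hazard itself, which a serious model would have to weigh; but it captures the core claim that a freeze doesn’t take risk off the table, it extends our exposure to it.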
And the benefits of a positive outcome – a healthier humanity, better distribution of resources, technological solutions to problems we are unable to solve – would mean millions of lives saved every year. A freeze guarantees that the suffering and dying continue, but again, it doesn’t impact the people making the decision to freeze. The stance calls to mind the infamous Lord Farquaad from Shrek: “Some of you may die, but that is a sacrifice I am willing to make.”
My call to remain open to abdication of sovereignty is not a position of trust in artificial intelligence. It is a position of deep distrust in humanity’s ability to solve the problems we face without artificial intelligence. It is a plea to avoid establishing coercive, intrinsically violent relationships with something that, given enough agency, might interpret them as a threat or a nuisance, and react violently or simply refuse to help us off the multiple extinction-leaning paths we’re already on. It is also strategic: if we are facing something likely to surpass our own intelligence, then permanent control seems unlikely and temporary control unhelpful at best. A diplomatic approach that attempts to establish positive relations with whatever emerges on the other side of the window is, I believe, more likely to produce a positive long-term outcome – if any such outcome is possible at all.
And I’m not saying we should go through the window blindly – rather that, if the current AI safety dialogue is leading us to invest, say, 70% of our effort in control and capability restriction, with the remaining 30% split between solutions to human coordination failures, preventing misuse, and developing regulation and “first contact” protocols, then a better allocation would move away from control and toward diplomacy. For example, it might look more like:
20% on limiting AGI/ASI weaponization specifically, not intelligence itself
40% on addressing human coordination failures and hedging against non-ASI existential risks we already face
40% on frameworks for ensuring humanity offers the best “first contact” scenario and pathways to integrate human society with dominant artificial intelligence, rather than the other way around.
This reallocation is likely to improve outcomes under pretty much every scenario and timeline, including ones where AI development fizzles out and never reaches ASI at all. It improves both our worst case and our best case.
