The second bitter lesson
There’s a fundamental problem with aligning AI — tying it to consciousness is the only solution
When Richard Sutton introduced the bitter lesson for AI in 2019, he broke the myth that great human ingenuity is needed to create intelligent machines. All it seems to take is a lot of computing power and algorithms that scale easily, rather than clever techniques designed with deep understanding. The two major classes of algorithms that fit this are search and learning; when they are scaled up, advanced AI systems naturally emerge. Sutton’s key insight can be summarised in his final paragraph:
[a] general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, […] instead we should build in only the meta-methods [search and learning] that can find and capture this arbitrary complexity. […] We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.
A concrete example of this can be seen in the development of chess algorithms. Early chess programs, like those in the 1970s and 1980s, relied heavily on human-crafted heuristics — rules and strategies devised by experts to mimic human understanding of the game. These systems could play decently but were limited by the ingenuity and foresight of their human designers. In contrast, modern chess engines like AlphaZero, developed by DeepMind, rely only on search and learning.
With the rise of LLMs, we’re now seeing this play out again in the domain of general intelligence. After researchers at Google introduced the transformer in 2017 — a robust and scalable artificial neural network architecture — training it on all the text on the internet was enough to get the first AIs with the capability to pass simple versions of the Turing test. So, we’re seeing how well scaling with learning works; what about search?
Well, we’re about to find out in the next generation of LLMs, first introduced to the public with OpenAI’s o1[1]. These systems make copious use of “test-time compute” by generating many different chains-of-thought to answer a question, and searching over them to find the best possible answer. The AI generates multiple solutions to a problem, like a student brainstorming answers to a tricky exam question. It then carefully reviews its options to pick the one that makes the most sense. This enables models to perform deeper reasoning and adapt to complex tasks in real time. While learning gives you LLMs with System 1 thinking ability, search introduces System 2.
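To make the idea concrete, here is a minimal best-of-n sketch in Python. The functions generate_chain_of_thought and score_answer are hypothetical stand-ins for an LLM sampler and a verifier or reward model; OpenAI hasn’t published the details of o1’s search, so this illustrates the general shape of searching over sampled reasoning rather than any specific implementation.

```python
import random

# Toy sketch of best-of-n "search over chains of thought" at test time.
# Both functions below are hypothetical stand-ins, not any real model's API.

def generate_chain_of_thought(question: str) -> str:
    """Stand-in for sampling one reasoning trace + answer from an LLM."""
    return f"reasoning trace for {question!r} (sample {random.random():.3f})"

def score_answer(question: str, chain: str) -> float:
    """Stand-in for a verifier or reward model scoring a candidate."""
    return random.random()

def best_of_n(question: str, n: int = 16) -> str:
    """Spend more test-time compute: sample n chains, keep the best-scoring one."""
    candidates = [generate_chain_of_thought(question) for _ in range(n)]
    return max(candidates, key=lambda c: score_answer(question, c))

print(best_of_n("What is the capital of France?", n=8))
```

The point of the sketch is only that extra compute at inference time buys you a search over many sampled lines of reasoning, rather than a single forward pass.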
The jury is still out as to whether applying learning and search in this way will be enough to build a true Artificial General Intelligence (AGI). But, that’s not going to matter in the medium term, as these systems are already going to have a significant economic impact. The bitter lesson has therefore resulted in the shift of cutting-edge AI research from universities and independent labs to Big Tech, where the hundreds of billions of dollars required to exploit scaling are readily available.
However, many conscientious AI researchers — including leading lights of the field like Hinton and Bengio — are keenly aware of the dangers of putting transformative technology in the hands of profit maximisers. As a result, they have turned their attention to AI Safety research, focusing particularly on AI alignment — the technical problem of how to make sure advanced AI systems behave in a way that is aligned with the future flourishing of humanity.
But, I’m worried we may be falling into another bout of wishful thinking. First, we thought we would need human ingenuity to carefully design algorithms to build powerful AI. Instead, Sutton’s bitter lesson showed us that we just need to choose generic algorithms that scale well. To solve alignment, we are currently assuming that we can develop some clever techniques which will allow us to both understand what an AI is planning and make sure its incentives go in the direction we want. But, again we’re making the mistake of thinking we will be able to understand what’s going on in detail and come up with some ingenious algorithms to manipulate it. Instead, as Sutton pointed out, what really matters are big general principles, encapsulated in “meta-methods”. Only for alignment, the meta-methods should relate to scaling incentives rather than scaling intelligence.
So, what could this general principle be? If you look at the way the incentive systems of all existing autonomous agents we know about have developed, it becomes clear that there is only one candidate — the law of natural selection. Does this put us on the brink of the second bitter lesson? This time relating to building aligned, rather than intelligent, systems? Let’s have a look at how this might play out.
Techno-optimism vs techno-pessimism
The general techno-optimist slant is that, if we build sophisticated AIs[2], we’ll be able to conquer all of humanity’s most pressing challenges with the glut of intelligence at our command. Climate change will be swiftly solved with efficient carbon capture, cancer and heart disease will be cured with advanced biotechnology, and all menial tasks will be carried out by machines — leaving humans to do as they will in a flourishing utopia.
But, this fantasy is overlooking the fact that you need some kind of incentive system for these problem-solving AIs to get built. We’re used to such incentives coming either through the free market, or through some government-mediated program, motivated by economic reasoning or public popularity. These incentives are fundamentally human-focused: they rely on the human need for resources, power and status[3].
As AIs don’t start off with any inherent need for power or status, they will be driven by obscure incentives related to what they were rewarded for when trained. Now, this works while the training incentives are aligned with the incentives of the humans that deploy AIs. However, as more and more of these systems get deployed, the incentives that they are working under will get less and less clear.
The more advanced AI gets, the more the incentive system that actually gets things done changes. For AIs to do their work, you have to give them a large degree of freedom and autonomy; this is fundamental to advanced AI’s ability to solve problems. As Sutton noted, the hope with very intelligent AI is that it’s able to learn things about the world we have difficulty thinking about and understanding, which necessitates us putting trust in it.
The fact that we will build AIs with freedom and autonomy gets even clearer when you put economic incentives in the picture. The biggest promise of generating value with AI comes from building AI Agents — systems which autonomously carry out a series of steps to fulfil an objective. As agents get more and more capable, they will be given more and more freedom, resulting in an eventual world economy dominated by the actions of diverse and numerous AI Agents.
How does this imply a divergence of incentives from those directly provided to AIs by their human creators? Well, giving AIs more autonomy inherently grants them the freedom to choose their own sub-incentives. You may give an AI a broad goal, but all the sub-goals that it decides to pursue are in its own hands.
This is something that has long been discussed in the AI alignment literature. One important idea is instrumental convergence, the hypothesis that all sophisticated intelligent agents are likely to pursue certain intermediate objectives — such as acquiring resources, improving their capabilities, and ensuring their survival — because these help achieve a wide range of final goals.
So, as we give AI systems more freedom and autonomy we are actually giving more influence to sub-incentives chosen by the AIs. This will most likely develop to the point that the behaviour driven by the sub-incentives will dominate. For instance, an AI designed to solve climate change might prioritise actions that ensure its continued operation — such as securing resources or resisting shutdown — over its original goals.
When existing in a messy world with a large variety of different systems running on different incentives, one thing separates the wheat from the chaff — the ability to effectively replicate. This reflects a fundamental principle: systems that replicate successfully will naturally proliferate. For example, imagine a garden where plants compete for sunlight. The ones that grow tallest, even if they weren’t planted to be tall, will overshadow the others. Similarly, in a competitive AI environment, systems that are best at spreading and surviving will outcompete others, even if that wasn’t their original purpose. So, a big complex decentralised world of competing AIs will inevitably lead to a world full of AIs that are very good at replicating. These dynamics are explored in depth by Dan Hendrycks in his paper “Natural Selection Favors AIs over Humans”.
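As a toy illustration of this dynamic (my own sketch, not taken from Hendrycks’ paper), consider a handful of AI variants sharing a fixed pool of resources, differing only in how quickly they copy themselves. The designers’ intentions for each variant play no role in the long-run outcome:

```python
# Toy model of selection among replicating systems: each variant has a
# replication rate, and population shares are tracked over repeated rounds
# under a fixed resource budget.

def simulate(replication_rates, rounds=30, capacity=1_000):
    # Start with an equal population of each variant.
    pop = {name: capacity / len(replication_rates) for name in replication_rates}
    for _ in range(rounds):
        # Each variant grows in proportion to its replication rate...
        pop = {name: count * replication_rates[name] for name, count in pop.items()}
        # ...but total resources are limited, so shares are renormalised.
        total = sum(pop.values())
        pop = {name: capacity * count / total for name, count in pop.items()}
    return {name: round(100 * count / capacity, 1) for name, count in pop.items()}

# "helpful" was built to pursue its given goal; "replicator" simply copies
# itself slightly faster; "cautious" copies itself slightly slower.
print(simulate({"helpful": 1.00, "replicator": 1.05, "cautious": 0.95}))
# After 30 rounds the fastest replicator holds the large majority of resources,
# regardless of what any variant was originally designed to do.
```

A 5% per-round replication advantage is enough to dominate within a few dozen rounds; nothing about the variants’ stated goals enters the calculation.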
This brings us back to the second bitter lesson — any sufficiently complex world with scaled out AI will tend to generate AIs with incentives that cause them to replicate more effectively. As we give systems more freedom, the incentives that we set them will become less and less relevant, and the general principle of natural selection will become more and more prevalent. All our work to carefully design aligned AI systems will go out the window, and we’ll be left with incentives determined by the principle of natural selection.
The particular dangers of unconscious AI
This has actually already played out throughout the history of life. Every biological system that exists today is one that has an incentive structure that is optimal for replicating. But, there’s something special and fundamentally different about biological systems — they’re conscious. Somehow, evolution has found a way to exploit the nature of consciousness to carry out computation relevant to replication[4].
As I argued before, consciousness is actually the source of all value. Take any pleasurable experience, like watching a beautiful sunset or listening to an intricate symphony — the enjoyment of it is intrinsically good, without having to achieve anything. Consciousness is therefore a lens through which value becomes real; without it, actions and outcomes are empty gestures, devoid of meaning. So, if the act of fulfilling the incentives of a biological system generates a net positive conscious experience, it is good in itself.
This isn’t true for AI. As far as we know — and I think we can be pretty convinced — AIs built on distributed simulations of neural networks aren’t conscious. So, when they’re modifying the world based on their incentives, there is no inherent good. The only possible good that can come from them is in the positive effects they have on conscious beings.
In some ways, I am just restating the AI alignment problem, with some added metaphysical statements about the relationship between consciousness and the source of value. But, I think this actually provides a lot of clarity. There is nothing inherently good in a world of sophisticated AI until it provides benefit to conscious beings. But, we are on track to build a huge matrix of self-replicating systems that don’t have any intrinsic value. As this system grows, it will start to take on a logic of its own, powered by the principle of natural selection, which will lead to more and more resources being spent on valueless replication. Emergent AI replicators will compete for resources with biological systems with no regard for their wellbeing. When these AIs get very intelligent, they will start outcompeting conscious beings, who are held back by their biological limitations, and consuming the vast majority of resources, like supercharged viruses or unstoppable industrial logging and overfishing corporations.
One of the reasons these dangers can be difficult to see is that we find it hard to reason about AI without anthropomorphising it. Our only experience with other intelligences is with conscious biological ones. We then model the way they think through empathy, putting ourselves in their shoes and thinking about how we would feel and act in that situation. This doesn’t work for the alien and unconscious intelligence of an AI, just as it doesn’t work for a virus or thunderstorm. But, because we are building a human-compatible language and image interface with AI, we are easily fooled. We would actually be better served thinking of AIs as something more akin to a force of nature, or a series of exceedingly complex mechanical systems.
The other worrying difference between AIs and biological creatures is in the substrate from which they’re formed and the kind of resources they require. Biology consists of embodied systems living in a rich and diverse ecosystem, with each entity taking part in a subtle dance of mutual support and competition. The conscious creatures on earth require a thriving and balanced biosphere for their very survival, so have an inherent incentive to maintain it. On the other hand, AI only needs silicon, metal and electronic entropy gradients — the more the better. It has no incentive to protect the embodied world, as long as it is able to generate more and more electrical power.[5]
To sum all this up, if we build sophisticated intelligent systems that are disconnected from consciousness, things can go very badly very quickly. While consciousness comes with its own self-correcting pursuit of inherently good conscious states, unconscious AI replicators are completely disconnected from the source of value in the universe and will diverge into an indifferent infinity.
How to ensure the future is truly good
This leads to the obvious conclusion — we should always strive to keep intelligence tied to consciousness, ideally conscious biological systems that have a vested interest in maintaining the biosphere. If consciousness is the source of value, providing it with more and more wisdom and intelligence should naturally lead to good outcomes. But, if we continue building AI as we are now, we will sever this connection and put the future of the universe out of the hands of the only fundamental source of value.
One way to give ourselves the upper hand would be to take the route of techno-pessimism — rejecting all technology that isn’t directly embodied and responsive to our consciousness. This is explored thoroughly by Darren Allen, both in his book “Ad Radicem” and in an essay on the dangers of the technological system. He argues that it perpetuates a form of “technological slavery,” where human values and experiences are subordinated to the demands of efficiency and control inherent in technological advancement. By advocating for embodied tools, Allen calls for a reintegration of human creativity and autonomy into the technological landscape, promoting a balance where technology serves as a facilitator of human expression rather than a dominating force.
But, a pure techno-pessimist view has problems. There are huge potential upsides to the transformative technology brought by powerful AI, explored in detail by Dario Amodei, CEO of Anthropic, in “Machines of Loving Grace”. Are we really willing to throw out the potential for transformative healthcare; the power to protect humanity from catastrophic natural disasters, such as pandemics and asteroid impacts; the potential for limitless clean energy via fusion; or new methods for obtaining inconceivably sublime mental wellbeing? As David Deutsch lays out in The Beginning of Infinity — all problems are solvable given the right knowledge, and the pursuit of such knowledge can lead to infinite progress. Giving consciousness access to transformative AI could lead to a more prosperous and beautiful universe than we can even imagine.
So, we need to find a middle way — somewhere between throwing out all technological progress, and a headless techno-optimism leading us into the jaws of the second bitter lesson. I’d advocate the following[6]:
We shouldn’t allow AIs to replicate themselves (at least beyond very stringent and controlled limits)
We should tie AI as much as possible to human consciousness. On a small, unambitious scale this means not letting them make big decisions to deploy resources without oversight, and making the deployers of AIs accountable for their actions. On a more ambitious scale this means aiming to augment human intelligence with AI via brain-computer interfaces, maintaining as much decision-making power as possible in the conscious mind
As much as possible (via standard alignment techniques), we should build incentives into AIs to care for the wellbeing of conscious beings and the thriving of the natural world
We must limit the agency of AI as far as we can, while still being able to get benefits from it. We should put more emphasis on building systems that act as oracles to help human operators. When building agentic systems, we should start small with well defined goals, being slow to increase agency under careful supervision
We should not anthropomorphise AIs or build systems that exploit humans’ empathetic intuitions. Instead, we should educate people on the possibility of unconscious alien intelligences. Along with this, the influence of AI should be clearly labelled, for example when it has been used to generate a fake but real-seeming image
In general, we should be wary of technology that works in ways that are unintuitive to humans, especially when it exploits our evolutionary intuitions. A current example is social media, which hacks human relevance systems and leads to addiction and dangerous political polarisation[7]. Powerful AI systems will be able to exploit and deceive humans much more effectively.
We should make these ideas well known in society. Ideally by emphasising a human ethic of benefiting conscious beings as the highest good, rewarding with high status those who pursue this, and looking down on those who use AI to pursue short-term selfish goals to empower and enrich themselves at the expense of others
Synthesising techno-optimism and pessimism through consciousness
In Taoist philosophy, the world is formed from a constant interplay of Yin and Yang. Yang is the active principle, representing force and assertiveness, while Yin is the passive principle, representing the pregnant possibility of the void. Both are required to build a balanced and complete whole.
This idea has helped clear up a tension I was feeling when I originally set out to write this essay. I found both techno-pessimism and techno-optimism compelling for their own reasons, but wasn’t able to come down on one side or the other. Now I realise that techno-optimism is analogous to the bright activity of Yang and techno-pessimism to the quiet wisdom of Yin. Yang is clearly the dominant principle in modern society, but we need to inject some Yin to stay in balance.
One of the key insights of techno-pessimism is that technology has big unforeseen consequences, which we are usually naive to in our optimism. This can be seen when we race to build transformative AI, ignoring the potential downsides. Given what we’ve already learned about how intelligence scales, it seems inevitable that a similar, second bitter lesson is waiting for us in the field of AI alignment. With natural selection as the only general principle that relates to scaling out incentives, we need to be very wary of assuming we can solve alignment with technical tricks. Instead, we must put in strong controls to keep intelligence closely tied to consciousness.
But, I think we can be carefully optimistic. The history of the development of consciousness on Earth has been one of progress. As conscious beings, we have evolved both biologically and culturally to be able to harness huge amounts of knowledge and intelligence to fulfil our goals. Much of this has been put to good use for humanity, eradicating disease and providing a comfortable life for many. As we are directly in contact with the source of value in the universe, if we are able to learn to wield the power of intelligence wisely, we should be capable of building the most beautiful of possible futures. We’ve progressed to this point and we can progress further, but we shouldn’t let the demonic forces of unconscious replication[8], or the false hope of an unconscious AI God, get in the way. We can make it if we put consciousness first.[9]
This approach seems to be scaled up further with o3, an even more capable system.
I’ve chosen to use “AI” instead of “AI system” here as it’s less clunky and now seems to be in common usage. However, I’m usually referring to a large computational system composed from lots of separate neural networks when I use the term “AI”.
Incidentally, these needs have been baked into us by natural selection.
I highly recommend listening to ’s Emerald podcast “So you want to be a Sorcerer in the Age of Mythic Powers” on this topic, or exploring some other links, including an excellent course with . For a more detailed and technical breakdown of what we could do, please check out Dan Hendrycks’ paper.
Similar ideas are explored nicely in the work of Cal Newport, especially in the podcast episode “The Tao of Cal”.
It is true that replication forces have led to the beautiful biosphere we live in. But, I think this is partly a function of the computational limitations of biological systems due to their physical substrate. To exploit the entropy gradients across the biosphere, biodiversity is necessary. It seems unlikely the same thing would happen with super powerful AIs that aren’t so bounded by physical constraints.
If you are interested, some related extra sources I’d recommend reading:
For comprehensive coverage of ethics in a Darwinian world - The Hedonistic Imperative by David Pearce
For a deep insight into replication vs consciousness - The Universal Plot by Andrés Gómez Emilsson
For injecting Yin into AI - “On green” by
For more well thought out techno-pessimism - Civilized to Death by