Aligning Our Systems to Human Values
We demand values alignment from artificial intelligences. Why not our other systems?
Systems, values, and alignment
A few months ago, I was having a chat with an artificial intelligence safety leader about, among other things, AI alignment. There are far better sources than me to explain what AI alignment is, but in short it is the work of ensuring that artificial intelligence systems behave in a way that is consistent with human values and intended goals. I consider it one of the most important areas of applied AI work given how rapidly the technology is moving. Long before it became a critical emerging field of the 21st century, we were already exploring its core dilemma: how to build powerful systems that do not betray the values of those they serve, even when they obey the rules.
My interests flowed from a combination of professional and personal curiosity. In my formative years I was influenced by characters like Astro Boy and Data, archetypes of artificial beings whose application of values-driven logic gave them superhuman moral judgement alongside their great strength. On the flip side we had Skynet and HAL, cautionary tales of what happens when we hand judgement to misaligned entities whose intelligence surpasses our own. In my work as an intelligence professional I’ve been using LLMs and their precursor technologies for years, particularly for processing and analysing bulk unstructured data to produce insights for decision makers. But I’ve also watched the threats emerge as nefarious actors embrace artificial intelligence to launch cyber campaigns, produce disinformation, and accelerate technologies which threaten our societies.
The conversation ranged over how to embed principles of AI safety into my work and potential future career paths. We also wrestled with the acknowledged challenge of deciding whose values make the cut when doing the aligning. But towards the end of the conversation, I made an offhand remark that I haven’t been able to shake since:
Why are we so focussed on making sure that AI is aligned to human values when we haven’t insisted on the same for all the other systems in our lives?
We’ve demanded that artificial intelligences be aligned with human values because we recognise their potential to shape lives, exert power, and act autonomously within systems. But we already live among powerful entities that meet this description. Governments, corporations, and bureaucracies shape our reality every day. Yet we’ve been content to hold institutions to a far lower standard: compliance with the law. In the absence of formal alignment to human values, they have drifted far enough that people have begun to question their legitimacy. It is a quiet betrayal of the social contract that we all feel but haven’t been able to name.
Before we can talk about alignment, it helps to understand how we even see systems in the first place. Most of us encounter them first at the level of events: an unfair decision, a broken process, a form that doesn’t make sense. When we zoom out, patterns emerge: repeated failures, slow responses, disjointed decision making, an inability to reform, entire sectors stuck in place. Further out still are the structures: the rules, incentives, and institutional logics that produce those patterns. Beyond those are the mental models and worldviews that shape how those structures are even imagined. This is where the deeper misalignments hide.
In this series, I’m going to move between these levels. The aim isn’t just to explain the systems that frustrate us; it’s to show how values get lost at each layer, and how AI might help us stitch them back in.
This realisation cascaded outward, contextualising a lot of what I feel about the world I live in today. My society here in Australia is pretty good, for the most part, and I have a positive view of my fellow citizens. I’m safe and healthy and free to get through my days in peace. And when they work well, the systems and organisations that form a critical part of our society make a massively positive contribution to our wellbeing by performing functions that no individual alone could hope to achieve. Still, something is fundamentally wrong with the fabric of our society, with the systems that exist all around us.
The institutions that we have created are functionally goal-directed intelligences of a type. Their behaviour emerges from internal logic, external pressures, the human agents within them, a survival instinct, and systemic drift over time. We’ve engineered these systems with power and autonomy so that they can fulfil these critical roles in our society. What we haven’t done is explicitly embed our values within them. It means that these organisations are misaligned emergent intelligences, and unlike AIs, we’ve given them no safety layer.
These entities are all around us; they are big and powerful because they need to be. That scale means that when they take an action, or when individuals act on their behalf, they have the potential to generate both enormous good and significant harm. I like to believe that on balance their positive impact on our lives greatly outweighs the negatives. Of course, when these systems damage individuals, it can be personally devastating.
I’m a veteran of futile foreign wars. I’ve spent a career chasing rule breakers on behalf of grateful communities. More recently I’ve transformed into a corporate drone, plying my trade to help protect critical private services that we all depend on. I’ve worked inside massively complex organisations, mostly in roles where my job has been to run functions that look out for threats to their core interests. The best insights I’ve had into organisations, government and corporate, have come from seeing what they worry about during critical incidents. I kind of get these alien intelligences in my own way, from both the inside and the outside, but ultimately, they’re not like us and never will be.
Despite this difference, we should hold them to high standards. So, in this first post, I’m going to introduce a precept that is simultaneously radical and uncontroversial:
We should expect the same adherence to core values from non-human agents as we do from people.
We’ve engineered systems with power and autonomy but never stopped to ask whether they’re aligned with the values of the people they serve. We’re asking questions about alignment now because we’re on the brink of creating artificial intelligences that we recognise as being like us. My concern is that we’ve had other intelligences living alongside us all this time without asking the same of them.
Naming the failures
I’ve been harmed by systems. I suspect most readers have as well, even if the harms are abstract and the offenders diffuse. I’ve also inflicted harm on people on behalf of institutions. Always legally, almost always in line with community values, but harmful all the same. Once again, I suspect that many people have, through the demands of economic competition, by generating environmental damage, through bureaucratic decisions they regret but have no choice but to make, and by taking urgent action to make communities safe and secure. We necessarily compartmentalise these actions as being part of the job or just how things are. But sometimes it gnaws at us.
I’ve seen good, honest people take actions on behalf of organisations that they would never inflict on another person in their personal lives. The paper shield of bureaucracy, made from the pages of law books and stacked layers of process documentation, transforms moral individuals into agents of institutional indifference. It’s so normalised that we barely give it any thought. The problem is that sometimes, when we do, we realise that these entities we belong to, that we sometimes identify with, don’t share our values.
Beneath even these layers of values, systems, and laws lies something deeper: worldview. Among other things, it’s the lens through which we decide what even counts as a value in the first place. We don’t have to cover that now (it’s a conversation for later in this series) but it’s worth noting that every debate about alignment sits inside a larger frame shaped by how we see the world.
Understanding that worldview comes first matters, because the way we see the world shapes which values we choose, how we express them, how we embed them, and how we adapt them over time. I call these features the four As: authorship, articulation, alignment, and adaptation. This alliterative mnemonic is a useful way of highlighting how values should interact with organisations and the way they operate.
Authorship
First, authorship. Authorship refers to who decides the values of a society, who is excluded, and the mechanisms they use to do so. Across history, societies have used many approaches: traditional wisdom handed down by elders, religious revelation codified into doctrine, philosophical debate among learned scholars, and democratic processes that produce constitutions and rights frameworks. The choice of mechanism depends on a community’s shared worldview, and while the specific path may vary, legitimacy rests on whether the community accepts both the process and its outcomes.
In my cultural context, we’ve largely rejected mechanisms such as religious doctrine to codify the core values of society but haven’t established new ways to define them. Instead, we try to embed values inferred from cultural narratives into laws and processes. This kind of works until you need to explain shared values to an alien intelligence like the AI you’ve just created or a corporate wealth accumulation system. Pluralism is essential to preserving freedom, but without a deliberate process for authorship, even our most fundamental principles remain informally held and inconsistently applied.
Articulation
The second is articulation. Articulation refers to the explicit expression of the values that individuals and systems aim to be aligned to. Organisations, of course, have stated values, and the best organisations live by them. But there are unstated values, some assumed, others contested, which are the essential foundations of our cultures, nations, and civilisations. They’re often difficult for even us humans to label and explain, but unlike the systems around us we have instincts and complex fuzzy neurological structures that tend to keep us on the right track.
Systems don’t, and I believe that the absence of value-aware feedback loops is an oversight we need to correct. But the more important absence is that of clearly articulated values agreed upon by a constituency. Without them, alignment to values is impossible even if we can implement systemic mechanisms to produce useful signals. We don’t need these clearly articulated values for ourselves; we humans absorb them instinctively and through cultural transmission. We need them for the other intelligences in our lives, so that they can align themselves with our values.
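To make the idea of a value-aware feedback loop a little more concrete, here’s a minimal sketch of what one could look like in code. Everything in it is illustrative rather than prescriptive: the value names, the rolling window, and the drift threshold are assumptions I’ve made for the sketch, and in a real system the per-value scores would come from audits, human review, or AI-assisted assessment rather than being handed in directly.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean

# Illustrative value names and thresholds; these are assumptions for the
# sketch, not a real standard.
VALUES = ("consent", "transparency", "agency", "empathy")
DRIFT_THRESHOLD = 0.6   # a rolling average below this counts as drift
WINDOW = 50             # number of recent decisions to average over


@dataclass
class Decision:
    """A single action taken by a system, scored 0.0-1.0 against each value.

    In practice the scores might come from audits, human review, or an
    LLM-based assessor; here they are supplied directly for illustration.
    """
    description: str
    scores: dict[str, float]


class ValueAwareMonitor:
    """A minimal value-aware feedback loop: record decisions, keep a rolling
    window of them, and flag any value whose average score drops below the
    drift threshold."""

    def __init__(self) -> None:
        self.history: deque[Decision] = deque(maxlen=WINDOW)

    def record(self, decision: Decision) -> list[str]:
        self.history.append(decision)
        return self.drifting_values()

    def drifting_values(self) -> list[str]:
        drifting = []
        for value in VALUES:
            rolling = mean(d.scores.get(value, 0.0) for d in self.history)
            if rolling < DRIFT_THRESHOLD:
                drifting.append(value)
        return drifting


if __name__ == "__main__":
    monitor = ValueAwareMonitor()
    alerts = monitor.record(
        Decision(
            description="auto-renewed a subscription without a reminder",
            scores={"consent": 0.3, "transparency": 0.4, "agency": 0.5, "empathy": 0.6},
        )
    )
    print("values drifting below threshold:", alerts)
```

The point isn’t the arithmetic; it’s that once values are articulated explicitly, a system can be instrumented to notice when its behaviour drifts away from them.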
Alignment
Next is alignment. We’ve already discussed this in the context of artificial intelligence, and it can just as easily be applied to systems more broadly. It’s making sure that these systems behave in a way that is consistent with human values and intended goals.
A common frustration that illustrates the gap between moral alignment and legal compliance is the terms of service agreement. Users are often required to accept these agreements to access a service, and they are notorious for being complex and full of legalese. It’s become a cliché that nobody ever actually reads them. My intent in this blog series is to look at the mechanisms for embedding values in systems rather than trying to promote any specific values of my own. However, for the sake of this example, we’ll assume that consent, transparency, agency, and empathy are reasonable values to assert.
Terms of service agreements are misaligned when assessed against these values. They create the illusion of consent because they are a compulsory condition of use. Consent requires understanding and alternatives, but there is often no alternative if someone wants to use a type of product. They are also functionally opaque rather than transparent: the information is presented, but often it isn’t easily comprehensible. There are problems with agency because there is no negotiation between agents. Finally, there is a disregard for empathy. They exploit cognitive bandwidth, time, and attention rather than engaging with people as moral agents. They represent a failure of alignment.
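To show what assessing a system against articulated values might look like in practice, here’s a rough sketch that scores a terms of service agreement against the four values asserted above. The checks are crude stand-ins I’ve made up for illustration (word counts, opt-outs, estimated reading time), not a serious legal or ethical test; the point is that once values are articulated, the assessment becomes explicit and repeatable rather than instinctive.

```python
from dataclasses import dataclass


@dataclass
class TermsOfService:
    """A few observable properties of a terms of service agreement.

    The fields are illustrative proxies chosen for this sketch, not drawn
    from any formal standard.
    """
    word_count: int
    has_plain_language_summary: bool
    can_decline_and_still_use_service: bool
    allows_negotiation_or_granular_choices: bool
    estimated_reading_minutes: int


def assess_against_values(tos: TermsOfService) -> dict[str, bool]:
    """Return a pass/fail verdict for each of the four asserted values."""
    return {
        # Consent: agreement means little if declining means losing the service.
        "consent": tos.can_decline_and_still_use_service,
        # Transparency: the information must be comprehensible, not just present.
        "transparency": tos.has_plain_language_summary and tos.word_count < 5_000,
        # Agency: some room to negotiate or make granular choices.
        "agency": tos.allows_negotiation_or_granular_choices,
        # Empathy: respect for the reader's time and attention.
        "empathy": tos.estimated_reading_minutes <= 10,
    }


if __name__ == "__main__":
    typical_tos = TermsOfService(
        word_count=15_000,
        has_plain_language_summary=False,
        can_decline_and_still_use_service=False,
        allows_negotiation_or_granular_choices=False,
        estimated_reading_minutes=60,
    )
    verdicts = assess_against_values(typical_tos)
    failed = [value for value, ok in verdicts.items() if not ok]
    print("misaligned against:", failed)  # expect all four values to fail
```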
Adaptation
Finally, there’s adaptation. For many in my generation, this has become the background hum of despair. We’ve inherited values that are no longer controversial: a responsibility to the future, a custodial approach towards nature, a general trust in science as a method to understand the world. But the systems that we live under can’t, or won’t, adapt to reflect those values. The result is a moral dissonance where we say one thing, feel another, and act through structures which betray both. It’s not a failure of capacity or knowledge or planning; it’s a failure of alignment, legitimacy, and moral courage.
Part of the problem is that we’ve outsourced moral responsibility to economic proxies. Money becomes a stand-in for values. Things like carbon credits instead of emissions cuts, ethical investment scores instead of structural reforms, philanthropy instead of justice. But these instruments don’t help institutions adapt to changes in our values. Instead, they allow institutions to signal alignment without any adaptation to our expectations. The result is a kind of moral laundering, where symbolic compliance replaces real adaptation. It’s efficient, measurable, and deeply misaligned.
These aren’t four separate failures but stages in a chain: who writes the values, how we express them, whether we embed them, and how we adapt when the world changes.
There is hope
I’m an optimist. A walk, a coffee, some time to reflect, that’s all it takes to recontextualise the challenges we’re facing. I believe, at a fundamental level, that we have a unique opportunity opening in front of us, perhaps the most significant we’ve had in generations.
The emergence of artificial intelligence has triggered immediate and serious conversations about values and their relationship with complex systems. Our concerns have moved beyond the motives of the speculative fiction antagonists from my youth to something that needs to be addressed in the next few years. They have become urgent and real. But in looking closely at AI alignment we’ve also stumbled into a broader reckoning: we’re finally asking how our other systems should behave.
That’s the first half of the opportunity. The second half is that artificial intelligence is destined to become our systems by absorbing and transforming the bureaucracies, institutions, and platforms that already structure our lives. It will compress decision making and sense making across these organisations. This in turn will moderate gatekeeping, perverse internal incentives, destructive feedback loops, concentrated self-interest, and internal contradictions. It’s at that point that values can be embedded back into our systems.
AI will amplify whatever moral infrastructure we give it. AI alignment has been recognised as so critical precisely because we all know that this infrastructure is in urgent need of repair. We are in the process of handing control of misaligned systems to intelligences which will likely exceed our own by the end of the decade. And we don’t have the tools to diagnose this broad misalignment, much less to fix it.
That’s my intent for this series: to share ideas about building the moral infrastructure that can help us repair systems which have drifted from their original purpose. My aim is to blend some of the early frameworks that I’ve been building with practical AI tooling, anecdotes about systems and their failures, and broad reflections on culture, values, and worldviews. It’s informed by academia but written from the perspective of lived experience. I hope to navigate this space through narrative, myth, and emotion as well as analysis. We need all of them, because the outcome of this work has to be suitable for the whole of our lived human experience.
It’s inevitable that at least some of what I’m going to write is going to be wrong. I’m putting imperfect material out into a complex world, knowing that my own view is incomplete and that much of what I’ve done has been produced in isolation. That means this is also an invitation to start a conversation about these big picture ideas from an angle that might be new.
But it’s also likely that some of it is going to be useful. And at this time, when tools are changing faster than our ethics can follow, a good enough map now might be more important than a perfect one drawn too late.


