The quiet failure of AI in government
The promise and the reality of algorithmic government
Across the world, governments are adopting artificial intelligence systems to manage welfare, policing, taxation, and other core administrative tasks. The promise is familiar: faster processing, lower costs, and decisions that are more consistent and less biased than human judgement. An algorithm that can process thousands of claims a day or flag potential fraud cases appears to offer a cleaner, more objective public sector.
What is emerging in practice is more complicated. Rather than solving entrenched social problems, many deployments of AI are entrenching them in new, less visible ways. Systems presented as neutral are amplifying longstanding inequalities; models sold as accurate are built on unreliable data; tools marketed as “smart” are deployed without basic evaluation. The issue is not a handful of technical glitches. It is a pattern of design and governance choices that undermines fairness and accountability.
The sections below outline five recurring problems in how AI is used in public administration and why they matter for anyone interested in digital governance.
Automation scales bias instead of removing it
A central claim about AI in government is that mathematical models can remove prejudices from decision-making. Algorithms are presented as neutral mechanisms that apply the same rules to everyone.
In practice, models trained on historical data learn the patterns and biases embedded in that data. The well-known Amazon recruitment case illustrates the mechanism: a hiring model trained on CVs from a male-dominated workforce learned to penalise terms associated with women and downgraded applicants from women’s colleges. Nothing in the code explicitly instructed it to do so; the pattern was inherited from the past.
Public-sector systems face the same dynamic. When historical records reflect discriminatory policing, unequal access to services, or biased enforcement, models built on that data will reproduce those patterns. The result is not the removal of bias but its codification into rules that appear objective and are applied consistently and at scale. Where individual bias is at least visible and contestable, algorithmic bias can be harder to detect and easier to defend as “just what the data says.”
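The mechanism can be made concrete with a small simulation. The sketch below is illustrative only: the data are synthetic, the variable names (merit, proxy, group) are assumptions rather than features of any real system, and the classifier is a standard off-the-shelf model. The protected attribute is deliberately left out of the training features, yet the model still reproduces the historical penalty, because a correlated proxy carries the same signal.

```python
# Illustrative sketch with synthetic data: a classifier trained on historically
# biased decisions reproduces the disparity even though the protected attribute
# is excluded, because a correlated proxy feature stands in for it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

group = rng.integers(0, 2, n)              # 1 = historically disfavoured group
merit = rng.normal(0.0, 1.0, n)            # true suitability, identical across groups
proxy = group + rng.normal(0.0, 0.5, n)    # e.g. postcode or CV keyword correlated with group

# Historical labels: past decision-makers penalised group 1 regardless of merit.
past_approval = (merit - 1.0 * group + rng.normal(0.0, 0.5, n)) > 0

# Train only on merit and the proxy; the group label itself is never shown to the model.
features = np.column_stack([merit, proxy])
model = LogisticRegression().fit(features, past_approval)
scores = model.predict_proba(features)[:, 1]

print("mean predicted approval, group 0:", round(float(scores[group == 0].mean()), 3))
print("mean predicted approval, group 1:", round(float(scores[group == 1].mean()), 3))
# The gap in historical treatment reappears in the "objective" scores, now applied at scale.
```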
Government models are trained on “dirty data”
AI systems depend on the quality of the data they are trained on. In many public-sector contexts, that data is compromised in ways that go far beyond occasional errors. Researchers describe this as “dirty data”: datasets that embed the effects of unlawful practices, organisational incentives, and systematic recording problems.
Examples include:
Manipulated statistics: crime numbers altered to meet performance targets or political expectations.
Unlawful enforcement practices: records generated by unconstitutional stops, discriminatory checks, or other illegal actions.
Large-scale recording errors: serious offences misclassified as minor ones, missing entries, and inconsistent coding.
If these records form the basis for predictive policing, risk scoring, or resource allocation, the model learns a distorted picture of reality. Instead of forecasting future harm or need, it learns where institutions have chosen to act in the past. The system then presents those inherited distortions as neutral predictions.
For governance, this is a basic epistemic problem: when the input is structured by institutional failure, there is no technical fix that can make the output objective.
Predictive systems can lock institutions into self-fulfilling loops
Dirty data becomes more damaging when it is fed back into the system through repeated use. Predictive tools in areas like policing or welfare risk scoring can create self-reinforcing cycles.
A standard pattern in predictive policing looks like this:
A model is trained on historical arrest data that reflects over-policing in particular neighbourhoods.
The model flags those neighbourhoods as high risk.
Police are deployed there more frequently.
Increased presence produces more recorded offences, especially for minor infractions.
New data confirms the model’s original assessment, justifying further deployments.
Over time, the system becomes very good at predicting where police will make arrests, not where crime is most prevalent. The model’s apparent accuracy is a product of its own influence on institutional behaviour.
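A short simulation with stylised, entirely hypothetical numbers makes the dynamic concrete. Two areas have identical underlying offence rates; the only difference is a historical record skewed by past over-policing. The allocation rule and detection rate below are assumptions chosen for illustration, not a model of any real force.

```python
# Illustrative feedback loop: deployment follows predicted risk, recorded offences
# follow deployment, and the new records feed the next prediction.
import numpy as np

rng = np.random.default_rng(1)
true_rate = np.array([0.05, 0.05])     # identical underlying offence rates in both areas
recorded = np.array([120.0, 80.0])     # history already skewed by past over-policing

for year in range(5):
    predicted_risk = recorded / recorded.sum()           # "risk" learned from the records
    patrols = 100 * predicted_risk                       # resources follow the prediction
    new_records = rng.poisson(patrols * true_rate * 20)  # detections scale with presence
    recorded += new_records
    print(f"year {year}: predicted risk {predicted_risk.round(2)}, records {recorded}")

# The initial 60/40 skew is confirmed every year. The model looks accurate about
# where arrests will happen because it decides where officers are sent, not because
# the first area has more crime.
```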
This matters for AI governance because it shifts the role of models from tools that inform decisions to mechanisms that stabilise particular practices. It becomes harder for agencies to change course, even when communities bear the cost of intensified scrutiny.
People are judged through weak proxies and arbitrary signals
Many algorithmic systems in government assess concepts that are hard to measure directly, such as “risk of fraud,” “likelihood of recidivism,” or “need for care.” Developers therefore reach for proxies: data points that are easier to record and quantify.
When proxies are poorly chosen, they encode arbitrary or unjust distinctions. The example of a system using possession of a trailer, a diesel car, or a wheelie bin as indicators of suspicious behaviour shows how far this can drift from any meaningful concept of risk.
In health, one widely discussed case involved a tool that used healthcare expenditure as a proxy for healthcare need. Because marginalised groups have historically received less care and funding, their costs were lower. The model inferred that they had lower need and recommended fewer resources for them. Underinvestment in care was translated into an apparently neutral signal of reduced risk.
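The effect is easy to reproduce. The sketch below uses hypothetical numbers rather than data from the actual case: the two groups have identical distributions of health need, but one has historically received less care, so its recorded costs are lower, and a programme that enrols patients by cost rather than need quietly passes that history on.

```python
# Illustrative sketch with hypothetical numbers: ranking patients by past spending
# instead of underlying need deprioritises a group that has historically received
# less care for the same level of illness.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

group = rng.integers(0, 2, n)              # 1 = historically under-served group
need = rng.gamma(2.0, 1.0, n)              # true health need, same distribution for both groups
access = np.where(group == 1, 0.6, 1.0)    # the under-served group receives ~40% less care
cost = need * access * rng.lognormal(0.0, 0.2, n)   # recorded spending reflects access, not need

# A "high-risk" care programme enrols the top 10% by predicted need, using cost as the proxy.
enrolled = cost >= np.quantile(cost, 0.90)

for g in (0, 1):
    share = enrolled[group == g].mean()
    avg_need = need[(group == g) & enrolled].mean()
    print(f"group {g}: enrolled {share:.1%}, mean need of those enrolled {avg_need:.2f}")

# Group 1 is enrolled far less often despite identical need, and those who do qualify
# are sicker on average: lower spending has been read as lower risk.
```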
These design choices matter because they determine who receives attention, scrutiny, or support. People are judged not on their circumstances or actions but on correlations that may have little to do with the underlying policy goals. Once embedded in software, these proxies are hard to contest and are often shielded by claims of commercial confidentiality.
Procurement is driving adoption faster than oversight can keep up
The rapid spread of AI in public administration is not primarily driven by internal technical capacity. It is driven by procurement. Vendors offer ready-made solutions to agencies under pressure to modernise and to demonstrate that they are “data-driven.” The result is a market where ambitious claims often outrun evidence.
Public administrations frequently face a skills gap: limited internal expertise in machine learning, data governance, or model evaluation. Vendors, by contrast, specialise in presenting their products as powerful, safe, and innovative. This creates an asymmetry of expertise in which governments end up heavily reliant on vendors' own assurances.
Recent work by the Netherlands Court of Audit gives a concrete sense of how this plays out. It found that many agencies did not systematically assess the risks of the AI systems they were using and often did not know whether those systems worked as intended. In other words, tools that influence who is investigated, who is paid, or who is flagged as a risk can operate for extended periods without basic performance or impact evaluation.
For democratic governance, this raises a direct accountability question: when neither the public nor the administrators can say how a system behaves or whether it is meeting its objectives, on what basis is its continued use justified?
Conclusion: governing systems, not just deploying them
The main risks of AI in government do not lie in speculative superintelligence. They emerge from very ordinary factors: biased and unreliable data, weak proxies, self-reinforcing feedback loops, and procurement processes that prioritise adoption over evaluation.
These are governance problems. They cannot be solved by adding another layer of technical sophistication. They require decisions about where automation is appropriate, which tasks demand inherently interpretable systems, what data is acceptable to use, and how oversight is enforced.
If AI is to have a legitimate role in public administration, it needs to be embedded in institutions that can explain, contest, and, when necessary, switch off the systems they use. Automation for its own sake is not a public good. Automated decisions only serve the public interest when the underlying models, data, and incentives are subject to the same scrutiny we expect of any other exercise of state power.