Why Forensic Science Needs to Protect Human Intelligence in the Age of Artificial Intelligence

Keeping Humans in the Loop is Not Enough

Author: Niki Osborne, PhD, The New Zealand Institute for Public Health and Forensic Science

The temptation to use Artificial Intelligence (AI) to brainstorm and write this article is strong. With so much else on my to-do list, why not enlist the help of something that already produces content far faster than I can and, dare I say, possibly even better? But if I outsource to AI, I miss the opportunity to sit down and really articulate my thoughts about the role these tools may play in shaping the future of forensic science.

I first used ChatGPT in 2022 to write funny poems about my friends and pets. Then I used it more like a Google search on steroids. Now, it is my personal trainer, health and wellness coach, business mentor, time manager, accountant, writing assistant, and work colleague. It helps me work through governance frameworks, presentations, executive summaries, data analysis, research ideas, and almost any other task I need to make progress on.

Honestly, it is hard for me to imagine life and work without an AI assistant.

I am experiencing firsthand, however, the personal trade-off in using this transformative technology. What I am noticing is not just convenience, but cognitive offloading:¹ handing over more of the structuring, drafting, summarizing, and sense-making than I intended. That can be useful, but it also changes what I practice, and what I may slowly stop practicing.

At times, it feels like my human intelligence is paying the price. The more productive I become by using AI, the less I feel that the work is truly mine. Even more unsettling, the easier it becomes to justify the cognitive shortcuts I notice myself taking. This is an uncomfortable truth to sit with as a cognitive bias and human factors researcher.

That does not mean the work is worse. In many cases, it is probably clearer, faster, and more useful. But something important has shifted in how I perceive the relationship between effort, expertise, authorship, judgement, and intelligence itself.

And in forensic science that shift matters far beyond my own thinking and writing process.

When AI enters the forensic workflow

Forensic science is not just another professional setting adopting a new technology. It is a high-consequence environment where decisions and opinions about forensic evidence can influence investigations, charging decisions, plea negotiations, trial strategy, appeals, and ultimately people’s lives. When AI enters this environment, we need to go beyond asking whether the technology works. We need to ask what role it plays in the forensic process, what decisions it may influence, and whether its outputs can be understood, challenged, and explained.

The National Institute of Standards and Technology (NIST) AI Risk Management Framework² emphasizes that trustworthy AI depends on characteristics such as validity, reliability, transparency, explainability, privacy, fairness, and accountability, all considered in context. Understanding these characteristics in the context of how an AI tool is being used is critical because AI risk does not sit in the tool alone.

AI might be used to summarize case information, triage exhibits, assist with image or pattern comparison, interpret complex data, draft reports, search literature, support quality review, or help communicate findings. The context for each of these use cases is different. A tool that summarizes administrative correspondence does not carry the same risk profile as a tool that prioritizes evidence for testing, interprets a complex DNA mixture, or shapes the wording of an expert opinion.

This is why the phrase “AI in forensic science” is too broad to be useful on its own. We need to ask: What exactly is the tool doing? Who is using it? What data does it rely on? What output does it produce? And critically, what decisions might it influence?

These questions move the conversation beyond whether AI can help us work faster and better. Of course it can. The harder question is what role AI is actually playing in the formation of forensic decisions and opinions.

Being “in the loop” is not enough

Much of the conversation about responsible AI comes back to the idea of keeping a “human in the loop.” I agree with that, but I also think we need to be more honest about what it means.

It is not enough for a human to be present somewhere in the process. A scientist might technically approve an AI output without fully understanding it. A reviewer might be influenced by a polished AI-generated summary. A manager might assume that because a tool is sophisticated, its outputs are more reliable than they really are. A laboratory might adopt a system because it appears to work well enough, without fully understanding when it fails.

In those situations, human oversight can become more symbolic than meaningful.

The NIST Expert Working Group Series on Human Factors in Forensic Science (DNA³, Latent Print Analysis⁴, and Handwriting Examination⁵) describes forensic disciplines as systems shaped by interactions between people, tools, information, procedures, training, management, quality management, and working environments. That systems view becomes even more important as AI tools enter forensic workflows. AI is not just a tool sitting outside the human process. It becomes part of the process that shapes what people notice, trust, question, document, and explain.

Robust quality management practices already provide a useful foundation: documentation, validation, technical review, training, competency assessment, audit, corrective action, and ongoing performance monitoring. The challenge with AI is making sure those practices are adapted to the specific risks AI introduces, including model updates, data drift, automation bias, explainability, disclosure, and the possibility that human review becomes less meaningful over time.

The real question is not whether the human is in the loop. It is whether the human retains the expertise to understand the loop and meaningfully review the inputs and outputs. Can they explain what the tool did? Can they recognize when it might be wrong? Can they challenge the output? Can they describe how it influenced their own thinking? Can they take responsibility for the final opinion?

The illusion of excellence

One of the risks I worry about most is not dramatic AI failure. It is AI that appears to work well enough, while quietly eroding our ability to critically review it and weakening the experience that makes us experts in the first place.

The easier something is to read and process, the more likely we may be to experience it as clear, credible, and complete. This phenomenon is called cognitive fluency⁶, and it is a hallmark of AI outputs. They have a polished feeling, even if the reasoning underneath is incomplete, uncertain, or wrong.

Indeed, the more often something appears to be right, the easier it becomes to stop looking closely. That is where complacency can creep in: not because people are careless, but because the tool has trained them to expect competence.

In forensic science, that should make us pause.

Our work is not valuable because it sounds confident. It is valuable because it is transparent, tested, appropriately limited, and grounded in science. Our justice system requires that expert forensic opinions be open to scrutiny. That is, it must be possible to understand what was done and by whom, why it was done, what assumptions were made, and what limitations exist.

AI does not remove that responsibility. If anything, it increases it.

What do we want AI to change?

I am not anti-AI. Quite the opposite. I use multiple different AI tools every day, and I am already seeing enormous potential within our laboratory. AI can help reduce cognitive burden, improve access to information, reduce backlogs, support training, assist with quality processes, help identify patterns across data, and free scientists to spend more time on the work that most needs human expertise.

But we should be deliberate about what we are trying to preserve.

The goal should not be to keep humans involved simply because humans have always been involved. Some tasks will absolutely be improved through the support, simplification, or automation that AI can provide. But other human tasks are foundational: defining the question, understanding the limits of the data, recognizing uncertainty, challenging assumptions, exercising ethical judgement, and explaining reasoning in a way that courts can understand and test.

Those are the forms of human intelligence forensic science cannot afford to lose.

So perhaps the question for every forensic laboratory is not only, “What can AI help us do?” It is also: “What must humans still be able to do well?”

The future of AI in forensic science should not be defined by whether humans are technically still in the loop. It should be defined by whether humans remain capable of doing what justice requires: understanding the evidence, questioning the tools, explaining the reasoning, and taking responsibility for the opinions produced.

That is a much higher standard than simply adopting AI with a human somewhere in the process. It requires laboratories to have practical, explicit conversations about where AI belongs, how it should be governed, and what kind of human expertise must be protected along the way.

Four conversations every forensic laboratory should be having

The first conversation is about language. Right now, one person says “AI-enabled,” another hears “black box,” and a third assumes the system is doing far more than it really is. That confusion helps nobody. Laboratories need a shared way to describe AI tools: what they do, what data they use, who they are designed for, what outputs they produce, how much human oversight they require, whether they have been validated, and why AI is being used in the first place.
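One way a laboratory might capture that shared description is as a structured record with one field per question. The sketch below is illustrative only: the class name, field names, helper function, and example values are all hypothetical and are not drawn from any standard or from the framework cited in this article.

```python
from dataclasses import dataclass, fields


@dataclass
class AIToolDescription:
    """Hypothetical record for describing an AI tool in shared terms.

    Every field name here is an assumption made for illustration.
    """
    name: str
    purpose: str          # what the tool does
    data_used: str        # what data it relies on
    intended_users: str   # who it is designed for
    outputs: str          # what outputs it produces
    human_oversight: str  # how much human oversight it requires
    validated: bool       # whether it has been validated
    rationale: str        # why AI is being used in the first place


def unanswered(description: AIToolDescription) -> list[str]:
    """Return the names of text fields that were left blank."""
    blank = []
    for field in fields(description):
        value = getattr(description, field.name)
        if isinstance(value, str) and not value.strip():
            blank.append(field.name)
    return blank


# Entirely made-up example entry with two questions left unanswered.
example = AIToolDescription(
    name="Example correspondence summarizer",
    purpose="Summarizes administrative correspondence",
    data_used="",
    intended_users="Case managers",
    outputs="Short text summaries",
    human_oversight="",
    validated=False,
    rationale="Reduce time spent reading routine email",
)

print(unanswered(example))
```

A record like this does not settle whether a tool is fit for purpose. It only makes visible which questions have not yet been answered, so the conversation starts from the same vocabulary.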

The second conversation is about meaningful oversight. Human-in-the-loop cannot simply mean that a person signs off at the end. It must mean that people have the capability, authority, time, training, and information needed to challenge the tool. If the human cannot explain how the tool influenced the work, oversight may not be doing what we think it is doing.

The third conversation is about procurement. Forensic organizations are already facing decisions about whether to buy, build, or trial AI tools. Those decisions cannot be a leap of faith. Before adopting a tool, organizations need to ask hard questions about validation, data governance, explainability, legal admissibility, disclosure, operational fit, and what happens when the human disagrees with the system. These questions should not be seen as barriers to innovation. They are what make innovation trustworthy.

The fourth conversation is about how to turn good intentions into practice. Most people agree that AI should be transparent, reliable, accurate, and subject to appropriate human oversight. The hard part is working out what those principles actually mean when a forensic organization is considering a real tool for a real workflow. This is where the Responsible Artificial Intelligence Framework for Forensic Science⁷ is useful. It gives forensic science organizations a practical way to document proposed AI projects, think through risks and benefits, and consider what safeguards are needed before AI becomes part of operational work. In other words, it helps move the conversation from “we support responsible AI” to “this is how we are going to make responsible AI happen.”

Share your thoughts and experience

I am currently interviewing and surveying members of the forensic science community to better understand AI use cases, adoption strategies, perceptions of human impact, and questions of legal admissibility. I want to understand not only what tools people are using, but what those tools are changing: in workflows, in expertise, in confidence, in accountability, and in the relationship between forensic science and the justice system.

The forensic science community has an opportunity to shape AI adoption before the most consequential use cases become normalized. The moment to ask difficult questions is now, not after tools have already become part of routine casework.

Once AI is built into triage, interpretation, reporting, quality review, or testimony, it may start to feel inevitable. The danger is that convenience becomes practice, practice becomes policy, and policy becomes difficult to unwind. We should not waste this opportunity to shape AI adoption before those patterns become embedded.

If you are using, developing, considering, resisting, or simply wondering about AI in forensic science, I would value hearing your perspective. To join the conversation, please email me: niki.osborne@phfscience.nz.

References

1. Risko EF, Gilbert SJ. Cognitive Offloading. Trends in Cognitive Sciences. 2016; 20(9):676-688. doi:10.1016/j.tics.2016.07.002.
2. National Institute of Standards and Technology. Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1. 2023. doi:10.6028/NIST.AI.100-1.
3. Expert Working Group on Human Factors in Forensic DNA Interpretation. Forensic DNA Interpretation and Human Factors: Improving Practice Through a Systems Approach. NIST IR 8503. National Institute of Standards and Technology; 2024. doi:10.6028/NISTIR.8503.
4. Expert Working Group on Human Factors in Latent Print Analysis. Latent Print Examination and Human Factors: Improving the Practice through a Systems Approach. NIST IR 7842. National Institute of Standards and Technology; 2012. doi:10.6028/NIST.IR.7842.
5. Expert Working Group for Human Factors in Handwriting Examination. Forensic Handwriting Examination and Human Factors: Improving the Practice through a Systems Approach. NIST IR 8282r1. National Institute of Standards and Technology; 2021. doi:10.6028/NIST.IR.8282r1.
6. Reber R, Schwarz N. Effects of Perceptual Fluency on Judgments of Truth. Consciousness and Cognition. 1999; 8(3):338-342. doi:10.1006/ccog.1999.0386.
7. Stacey J, Fleming R, Sheppard D, Sheppard J, Dobbie G, Karunakaran D. A Responsible Artificial Intelligence Framework for Forensic Science. Forensic Science International. 2025; 375(112548):1-10. doi:10.1016/j.forsciint.2025.112548.