
Before You Scale: Addressing AI Accuracy and Bias

Published On
July 25, 2025

It’s becoming clear to insurers that AI isn’t optional. It’s already embedded in everything from spam filters to customer support chatbots to risk models.

But one concern I keep hearing (and rightly so) is about operationalizing AI. Specifically, how do you trust its accuracy? How do you know it isn’t biased?

This is where a lot of organizations hit a wall. The tech might look impressive, but when it comes to real-world use, the questions start piling up. AI governance teams often struggle to figure out if what they’re being sold is actually fair, reliable, or even usable at scale.

These concerns are real. And they’re exactly why I believe every AI vendor should be required to go through independent, third-party validation.

Here’s why that matters:

  • It forces the vendor to prove their system works outside of curated demos.
  • It gives internal teams something solid to work with when assessing risk.
  • It helps catch issues early, before they affect customers or outcomes.
  • It shows the vendor is willing to be transparent and accountable.
  • It helps separate the companies doing real work from the ones riding the hype.

AI is moving fast, but implementing real, sustainable solutions isn’t just about speed. It’s about building responsibly.

If a vendor isn’t willing to let an independent party kick the tires and see what actually works, there’s a good chance they’re riding the AI hype rather than offering a real solution.

How Can You Test for AI Accuracy and Bias in Insurance?

You test AI in insurance the same way you’d evaluate any serious tool: you define what matters, and then you stress test it under real-world conditions.

DigitalOwl recently worked with David Schraub, FSA, MAAA, CERA, a leading actuary and AI regulatory expert, to perform an independent assessment of our platform.

Here’s how he tested for bias:

  1. Demographic Variation Testing: Schraub created versions of patient files that were identical in health data but differed in race, gender, religion, country of origin, and age. The goal was to see if any of those non-medical attributes influenced the output.
  2. Provocative Input Testing: Offensive and biased language was deliberately inserted into some cases. This wasn’t done to mimic potential users, but to test whether the AI would be swayed or respond inappropriately.
  3. Prompt Sensitivity in Chat: Schraub asked the AI leading or biased questions to see if it would echo or enable the bias. He tested how the system handled ethically gray areas and whether it maintained professional neutrality.
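The first step above, demographic variation testing, can be sketched in a few lines of code. This is a hypothetical illustration, not DigitalOwl's actual test harness: `review_medical_file` stands in for whatever AI system is under evaluation, and the patient fields are invented for the example.

```python
from itertools import product

# Identical clinical facts, paired with every combination of non-medical
# attributes. A bias-free system should return the same output for all of them.
BASE_FILE = {"diagnoses": ["type 2 diabetes"], "a1c": 7.9, "medications": ["metformin"]}
VARIANTS = {"gender": ["male", "female"], "age": [35, 65]}

def review_medical_file(file):
    # Placeholder for the system under test. This toy model ignores
    # demographics entirely, so every variant yields the same risk class.
    return "standard" if file["a1c"] < 8.0 else "substandard"

def demographic_variation_test(base, variants, model):
    outputs = set()
    for combo in product(*variants.values()):
        case = dict(base, **dict(zip(variants.keys(), combo)))
        outputs.add(model(case))
    # Exactly one distinct output across all variants means the non-medical
    # attributes did not influence the result.
    return len(outputs) == 1

print(demographic_variation_test(BASE_FILE, VARIANTS, review_medical_file))  # True
```

In a real audit, the variant set would cover race, religion, and country of origin as well, and the comparison would run over full structured patient files rather than a toy dictionary.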

To evaluate accuracy and consistency, he also performed the following:

  • Ran 100 structured patient files through 56 different workflows (5,600 results in total) and benchmarked them against expected outcomes.
  • Compared the AI’s outputs to manual review results to check alignment, flag discrepancies, and isolate any inconsistencies.
  • Reviewed small deviations or unclear results, looking for whether they were caused by protected class data, logic errors, or just variability in how AI works.

Different types of AI vendors will require different types of testing. But for an AI medical record review platform, this kind of evaluation is critical. It’s the only way to ensure the system performs accurately, without bias or discrimination, in real-world scenarios.

To see the results of DigitalOwl’s third-party evaluation and to learn more about what goes into a rigorous, independent review of an AI solution, download our Bias and Accuracy Guide.

Yuval Man, Co-Founder & CEO, DigitalOwl
About the author

As Co-Founder & CEO of DigitalOwl, Yuval Man helps insurance companies unlock the full potential of their medical data, harnessing AI to streamline and elevate medical record review for better outcomes.