Deepfake X-Rays Fool Radiologists In New Study—AI Has Turned Medical Fraud Into A Volume Problem

Generative AI has collapsed the skill required to forge a medical record. What has not collapsed is what it takes to prove one real.

A new study in Radiology, the peer-reviewed journal of the Radiological Society of North America, or RSNA, put the problem on paper. Seventeen radiologists from 12 centers in six countries were shown a set of chest X-rays. Half were real. Half were generated by AI. When the radiologists did not know synthetic images were in the mix, they flagged them as suspicious only 41 percent of the time. Told there were fakes to find, their mean accuracy climbed to 75 percent. The best reader hit 92 percent. The worst hit 58 percent. Years of experience made no measurable difference. The study used chest X-rays. The implications extend across the entire medical record.

Every File In A Medical Claim Is Now Forgeable

A chest X-ray is one file type. Medical claims run on many others, from radiology images and reports to discharge summaries, billing itemizations and the injury photos claimants upload through portals. Any of them can be generated or altered by an AI model well enough to pass a visual review.

Before generative AI, committing this kind of fraud at volume took medical coders or someone with a forger's skill and institutional letterhead. Those barriers kept the market limited. Generative AI is removing them. A person with a consumer laptop and widely available AI tools can now produce files that look right to a human reviewer.

With generative AI, every corner of the medical claims pipeline becomes a potential vehicle for fraud, from radiology images to billing documents. The cost moves through the system broadly and quietly.

Probability Is Not Proof

Detection tools, including AI models trained to recognize the fingerprints of other AI models, can tell you a file is likely synthetic. They produce scores and probability bands. That information is useful at the top of the funnel, where the goal is to triage a large flow of incoming files into a reviewable queue. It does not answer whether any specific file is fake.
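In practice, that top-of-funnel triage is simple machinery. Here is a minimal Python sketch of routing files by detector score; the thresholds, field names and scores are invented for illustration, and any real deployment would tune them against labeled samples.

```python
# A minimal sketch of score-based triage. The thresholds and names are
# hypothetical; a real system would tune them against labeled data.
from dataclasses import dataclass

@dataclass
class ClaimFile:
    file_id: str
    score: float  # detector's estimated probability the file is synthetic

def triage(files, review_at=0.5, escalate_at=0.9):
    """Sort a large flow of incoming files into queues by detector score."""
    queues = {"pass": [], "human_review": [], "escalate": []}
    for f in files:
        if f.score >= escalate_at:
            queues["escalate"].append(f)      # likely synthetic: priority review
        elif f.score >= review_at:
            queues["human_review"].append(f)  # ambiguous: an adjuster looks
        else:
            queues["pass"].append(f)          # no flag: normal processing
    return queues

batch = [ClaimFile("cxr-001", 0.12), ClaimFile("cxr-002", 0.97)]
print({q: [f.file_id for f in fs] for q, fs in triage(batch).items()})
```

Note what the output is: a queue assignment, not a verdict. Every file in the "escalate" bucket still needs a human, and possibly a forensic examiner, before anyone can say it is fake.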

Proving one specific medical record is fake is a different process. Proof lives at the device level. A qualified digital forensic expert examines the file’s embedded metadata, compares the details against the scanner and facility the file claims to have come from, tests the image for artifacts left by generative models and analyzes the hospital’s system records to see whether the study was actually acquired by the equipment that supposedly produced it. In litigated or high‑severity claims, access to the hospital’s records and systems usually comes through legal discovery or a subpoena issued by counsel. 
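One narrow slice of that work can be shown in code. The sketch below, in Python with the pydicom library, checks whether a DICOM file's embedded acquisition metadata matches the scanner the facility claims produced it. The equipment values are placeholders, and passing this check proves nothing on its own; headers are trivially editable, and a real examination goes on to pixel-level artifact analysis and the hospital's system records.

```python
# A minimal sketch of one narrow forensic check: does the DICOM header's
# acquisition metadata match the scanner the file claims to come from?
# Real device-level forensics goes far beyond this (pixel-level artifact
# analysis, PACS and modality audit logs obtained in discovery).
import pydicom

# Hypothetical record of the equipment the facility says produced the study.
CLAIMED_EQUIPMENT = {
    "Manufacturer": "Acme Imaging",          # placeholder values
    "ManufacturerModelName": "ChestScan 9",
    "StationName": "XR-ROOM-2",
}

def header_discrepancies(path, claimed=CLAIMED_EQUIPMENT):
    """Return a list of header fields that contradict the claimed scanner.

    An empty list is NOT proof of authenticity; headers are trivially
    editable. Discrepancies only tell an examiner where to dig.
    """
    ds = pydicom.dcmread(path)
    issues = []
    for tag, expected in claimed.items():
        actual = ds.get(tag)  # returns None if the element is absent
        if actual is None:
            issues.append(f"{tag} missing from header")
        elif str(actual) != expected:
            issues.append(f"{tag}: header says {actual!r}, facility claims {expected!r}")
    return issues

print(header_discrepancies("claim_cxr_001.dcm"))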

The digital forensic expert works with what that discovery produces. That chain of analysis is how you know a file is real. Anything short of it is a percentage.

Per-File Digital Forensics Cannot Absorb The Volume

The industry cannot run device-level forensic analysis on every claim. The number of claims is too large, the qualified expert pool is too small and the cost per exam is too high. If forensic proof is the only thing that counts as proof, and the fraud volume is about to climb sharply, a forensic-only defense does not fit the pipeline. The answer has to be layered.

The Three-Layered Defense

At the top of the funnel, automated AI detection models scoring incoming files for probability of synthesis. Imperfect. It will miss clean fakes. It will false-flag honest files. It is still necessary because volume makes human-only review impossible. A University at Buffalo detector caught synthetic radiology reports at Matthews correlation coefficients between 0.92 and 1.00 in a controlled test. That kind of tool scores a file. It does not prove one.
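For readers unfamiliar with the metric, the Matthews correlation coefficient runs from -1 to +1, with +1 meaning perfect prediction, and it stays honest on imbalanced mixes of real and fake files in a way raw accuracy does not. A small Python illustration with made-up labels, not the Buffalo study's data:

```python
# Illustrating the Matthews correlation coefficient (MCC) with invented
# labels. This is not the Buffalo study's data, only what the metric measures.
from sklearn.metrics import matthews_corrcoef

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 1 = synthetic report, 0 = real
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]  # the detector's calls

# One miss and one false flag on ten files yields an MCC of about 0.58;
# a detector scoring between 0.92 and 1.00 is nearly flawless on its test set.
print(round(matthews_corrcoef(y_true, y_pred), 3))  # 0.583
```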

At the middle of the funnel, humans in the loop. Adjusters, fraud-unit staff and reviewing clinicians applying judgment to what the model flagged. Slower than automation, faster and cheaper than forensic work. This is the layer that decides whether a flag deserves a closer look.

At the bottom of the funnel, for the small share of files where the claim is serious enough to stand up to litigation, a digital forensic expert, device-level analysis, admissible testimony. This is the only layer that delivers proof.
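Put together, the funnel is a routing decision, not a verdict. A minimal sketch, with hypothetical names and the same illustrative threshold as the triage example above:

```python
# The three layers as one routing decision. All names and thresholds are
# hypothetical; the point is the shape of the funnel, in which each layer
# sees fewer files and only the last one produces proof.

def route(detector_score: float, reviewer_flags: bool, litigated: bool) -> str:
    # Layer 1: automated detection, cheap enough to run on every file.
    if detector_score < 0.5:
        return "process normally"
    # Layer 2: human judgment on whether the flag has substance.
    if not reviewer_flags:
        return "flag dismissed; process normally"
    # Layer 3: device-level forensics, reserved for claims headed to court.
    if litigated:
        return "engage a digital forensic examiner"
    return "hold payment; refer to the fraud unit"

print(route(detector_score=0.97, reviewer_flags=True, litigated=True))
```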

Remove a layer and the system breaks. Remove AI detection and the volume buries human reviewers. Remove human review and false flags bury honest claims. Remove forensic analysis and contested files collapse under scrutiny in court.

The Real Risk Is Forgetting What Proof Means

The industry will deploy AI detection tools. Vendors will sell them and compliance teams will approve them, and they will run at the top of the funnel in short order. The risk worth naming is that a detection score starts being treated as proof. A file denied on a model's percentage. An honest claim delayed because the model did not recognize a legitimate but unusual combination of documents. A case settled because an automated tool returned a suspicious reading on a real patient's chart.

AI detection tells you where to look. It does not tell you what is true. True proof that a medical record is authentic comes from device-level digital forensic analysis, and it comes from nowhere else.

The deepfake X-rays in the Radiology study are a preview. The same techniques can spread across every file type in the medical claims pipeline. The system has to keep humans in the loop, and it needs real forensic capacity available when a file has to be defended in court. Probability alone will not hold up under cross-examination. The side that forgets that will learn the hard way.
