Exploring Vision-Language AI-Assisted Double Reading for Error Detection in Chest Radiograph Reports Using Synthetic Errors

Purpose To explore the feasibility of using a vision-language model (VLM) as an AI-assisted double reading system to detect discrepancies between chest radiographs and corresponding radiology reports.
Materials and Methods We developed an AI system utilizing a VLM that computes the similarity between chest radiographs and radiology report sentences. The system analyzes input chest X-rays and associated reports through three methods: (1) identifying semantically inconsistent sentences using image-text similarity scoring; (2) flagging false-positive (FP) findings when the image is classified as normal but the report mentions abnormality; and (3) identifying false-negative (FN) cases where findings detected by the model are not described in the report.
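The three checks can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: it assumes image and sentence embeddings have already been produced by some VLM encoder, and the names `double_read` and `DoubleReadResult`, the `sim_threshold` value, and the finding labels are all hypothetical.

```python
import math
from dataclasses import dataclass


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


@dataclass
class DoubleReadResult:
    inconsistent: list      # sentences with low image-text similarity
    false_positive: bool    # image looks normal, report mentions abnormality
    false_negatives: list   # findings detected on the image but not reported

    @property
    def flagged(self):
        return bool(self.inconsistent or self.false_positive or self.false_negatives)


def double_read(image_emb, sentences, sentence_embs, image_is_normal,
                report_findings, model_findings, sim_threshold=0.2):
    """Apply the three checks; sim_threshold is an assumed, untuned value."""
    # (1) Sentences whose similarity to the image falls below the threshold.
    inconsistent = [
        s for s, e in zip(sentences, sentence_embs)
        if cosine(image_emb, e) < sim_threshold
    ]
    # (2) FP: image classified as normal, but the report mentions an abnormality.
    false_positive = image_is_normal and len(report_findings) > 0
    # (3) FN: findings the model detected that the report does not describe.
    false_negatives = sorted(set(model_findings) - set(report_findings))
    return DoubleReadResult(inconsistent, false_positive, false_negatives)
```

A report would count as flagged if any one of the three checks fires, which is one plausible way to arrive at a single per-report error detection rate.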
To assess performance, we conducted experiments on 500 randomly selected cases from the publicly available MIMIC-CXR test split. Synthetic errors were introduced in three ways: (a) reversing horizontal directional terms in reports by swapping "left" and "right" (side-swap); (b) injecting two randomly selected abnormal findings into normal reports (FP-injection); and (c) randomly removing one or two sentences describing abnormal findings from reports containing at least one abnormality (FN-injection).
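The three synthetic error types can be sketched as simple text transformations. This is an illustrative sketch only: the function names, the `is_abnormal` predicate, the candidate finding sentences, and the choice to append injected sentences at the end of the report are assumptions, since the abstract does not specify these details.

```python
import random
import re

_SWAP = {"left": "right", "right": "left"}
_SIDE = re.compile(r"\b(left|right)\b", re.IGNORECASE)


def side_swap(report):
    """(a) Swap every 'left' and 'right', preserving capitalization."""
    def repl(match):
        word = match.group(0)
        swapped = _SWAP[word.lower()]
        if word.isupper():
            return swapped.upper()
        if word[0].isupper():
            return swapped.capitalize()
        return swapped
    return _SIDE.sub(repl, report)


def inject_fp(sentences, candidate_findings, rng, k=2):
    """(b) Add k randomly chosen abnormal-finding sentences to a normal report.

    Appending at the end is an assumption; the insertion position is not stated.
    """
    return list(sentences) + rng.sample(list(candidate_findings), k)


def inject_fn(sentences, is_abnormal, rng):
    """(c) Remove one or two randomly chosen abnormal-finding sentences."""
    abnormal_idx = [i for i, s in enumerate(sentences) if is_abnormal(s)]
    if not abnormal_idx:
        raise ValueError("report must contain at least one abnormal sentence")
    n_remove = min(rng.choice([1, 2]), len(abnormal_idx))
    drop = set(rng.sample(abnormal_idx, n_remove))
    return [s for i, s in enumerate(sentences) if i not in drop]
```

Passing a seeded `random.Random` instance keeps the injected errors reproducible across runs.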
Results The AI system flagged discrepancies in 52.0% of the original, unmodified reports; after side-swap, the error detection rate rose to 65.0%. For FP-injection and FN-injection, the error detection rate increased from 49.1% to 89.1% and from 52.4% to 74.8%, respectively. For FP-injection, the inconsistency detection and FP-detection rates increased from 16.4% to 80.0% and from 7.3% to 49.1%, respectively. For FN-injection, the FN-detection rate increased from 33.0% to 63.6%. An aggregate analysis of 10 targeted abnormalities showed that the system identified 69.1% of the inserted false-positive findings and 28.5% of the removed findings.
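To compare the effect of each error type, the before and after rates reported above can be expressed as absolute gains in percentage points. The snippet below only restates the reported figures; the dictionary keys are informal labels chosen here for illustration.

```python
# (rate on original reports, rate after synthetic error), in percent
rates = {
    "side-swap, any detection": (52.0, 65.0),
    "FP-injection, any detection": (49.1, 89.1),
    "FN-injection, any detection": (52.4, 74.8),
    "FP-injection, inconsistency detection": (16.4, 80.0),
    "FP-injection, FP-detection": (7.3, 49.1),
    "FN-injection, FN-detection": (33.0, 63.6),
}

# Absolute gain in percentage points, rounded to one decimal place.
gains = {name: round(after - before, 1) for name, (before, after) in rates.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain} percentage points")
```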
Conclusion The vision-language model–based system showed potential as an AI-assisted double reading tool by identifying report-image discrepancies introduced through synthetic errors, particularly injected false-positive findings and left-right directional changes. Detection of omitted findings was moderate, indicating room for improvement. Further development and validation on real-world errors are needed to support the safe and effective integration of AI into radiology double reading workflows.
Clinical Relevance / Application AI-assisted double reading with VLMs may enhance report accuracy and safety while offering scalable support for radiologists in routine chest imaging workflows.