Evaluating a Vision-Language Artificial Intelligence Model as a Screening Tool for Preoperative Chest Radiographs

Purpose To evaluate the clinical utility of a vision-language model (VLM)-based artificial intelligence (AI) model as a screening tool for preoperative chest radiographs, focusing on its potential to reliably identify patients who do not require further clinical evaluation.
Methods and Materials This study used 1,689 preoperative chest radiographs from CheXpert Plus, a publicly available dataset of chest radiographs paired with radiology reports. Only posteroanterior (PA) and anteroposterior (AP) views with an indication of preoperative evaluation were selected. The VLM-based AI model was developed to compute a numeric similarity score between a chest radiograph and a textual description, such as a finding or an impression. The model identifies 17 findings that require further preoperative clinical evaluation, including pneumothorax, mediastinal widening, and lung opacity. For each finding, the threshold that maximized the F2 score was determined using separate public and private validation sets; the private set was collected under Institutional Review Board (IRB) approval. Performance metrics, including sensitivity, specificity, accuracy, positive likelihood ratio (PLR), and negative likelihood ratio (NLR), were computed with 95% confidence intervals.
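The per-finding threshold selection described above can be illustrated with a minimal sketch. It assumes each validation image has one similarity score and one binary label per finding, and that a score at or above the threshold counts as a positive prediction. The function names and the toy data are hypothetical and are not taken from the study.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from counts; beta=2 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_f2_threshold(scores, labels):
    """Return (threshold, F2) maximizing F2, predicting positive when score >= threshold.

    Candidate thresholds are the observed scores; ties keep the lowest threshold.
    """
    best_t, best_f = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        f = f_beta(tp, fp, fn)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Toy validation data for a single finding (made up for illustration only).
toy_scores = [0.10, 0.35, 0.40, 0.62, 0.80, 0.91]
toy_labels = [0, 0, 1, 0, 1, 1]
threshold, f2 = best_f2_threshold(toy_scores, toy_labels)
print(threshold, f2)
```

Because F2 penalizes missed positives more than false alarms, the selected threshold tends to sit low, which is consistent with a screening tool that favors sensitivity over specificity.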
Results A total of 1,689 chest radiographs (54% male; mean age, 60.8 ± 17.3 years) were analyzed. Of these, 467 (27.6%) had negative reports, with none of the predefined findings, and 1,222 (72.4%) had positive reports, with at least one relevant finding. The AI model achieved a sensitivity of 90.5% (95% CI: 89.1-91.9%), a specificity of 52.9% (95% CI: 50.5-55.3%), an accuracy of 80.1% (95% CI: 78.2-82.0%), a PLR of 1.92 (95% CI: 1.89-1.95), and an NLR of 0.18 (95% CI: 0.15-0.21).
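As a worked check, the reported likelihood ratios and accuracy can be recovered from the reported sensitivity, specificity, and report counts. The sketch assumes the standard definitions PLR = sensitivity / (1 - specificity) and NLR = (1 - sensitivity) / specificity; the abstract does not state the exact computation used, and the function names here are illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Standard positive and negative likelihood ratios from sensitivity and specificity."""
    plr = sensitivity / (1.0 - specificity)
    nlr = (1.0 - sensitivity) / specificity
    return plr, nlr


def accuracy_from_rates(sensitivity, specificity, n_positive, n_negative):
    """Overall accuracy as the case-weighted average of sensitivity and specificity."""
    correct = sensitivity * n_positive + specificity * n_negative
    return correct / (n_positive + n_negative)


# Values reported in the Results: 90.5% sensitivity, 52.9% specificity,
# 1,222 positive reports and 467 negative reports.
plr, nlr = likelihood_ratios(0.905, 0.529)
acc = accuracy_from_rates(0.905, 0.529, 1222, 467)
print(round(plr, 2), round(nlr, 2), round(acc, 3))
```

Under these assumptions the values round to a PLR of 1.92, an NLR of 0.18, and an accuracy of 80.1%, matching the reported point estimates.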
Conclusions The AI model’s high sensitivity and low NLR indicate that its negative predictions can be trusted for preoperative screening. However, its low specificity and low PLR indicate that positive predictions may not be fully reliable. Further study is needed to demonstrate the clinical utility of the AI model in assisting physician decision-making.
Clinical Relevance Statement This VLM-based AI model can aid in streamlining preoperative assessments by reliably identifying cases that do not require further workup, thereby reducing the burden on radiologists.