Retrospective Comparison between a Report-Generating AI Model and a Segmentation-Based AI Model in Detecting Pleural Effusion on Chest X-rays

Purpose This study evaluated two AI models for detecting pleural effusion on chest X-rays (CXR): DCXR, a label-based model, and M4CXR (multi-modal for chest X-ray), a report-generating model. M4CXR is a vision-language model trained on image-text pairs without segmentation labels, which potentially reduces annotation effort. DCXR, by contrast, is a conventional label-based model [1][2]. The study aimed to assess the accuracy of the two models and to compare their performance.
Methods & Materials A total of 123 anonymized chest X-ray images from Inha University Hospital, South Korea (Jan–Apr 2024) were included. The dataset comprised 71 cases with pleural effusion and 52 normal cases, as determined by radiologist reports. Sensitivity, specificity, and AUC were calculated with 95% confidence intervals (CIs). McNemar's test was applied with a significance level of 0.05. Analyses were conducted for all cases combined (ALL) and separately for posteroanterior (PA) and anteroposterior (AP) views. Additionally, attention maps from M4CXR were analyzed and compared with the regions highlighted by DCXR.
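As an illustration only, the metrics named above could be computed along the following lines. This is a minimal sketch and not the study's analysis code: the abstract does not state which CI method or which variant of McNemar's test was used, so the Wilson score interval and the exact (binomial) McNemar test are assumptions here. The function names and the toy labels are hypothetical and are not study data. AUC is omitted because it requires continuous model scores.

```python
from math import comb, sqrt


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = effusion)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a proportion (assumed CI method; z=1.96 ~ 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value on the cases where the models disagree."""
    # b: model A correct, model B wrong; c: model A wrong, model B correct
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Toy labels for illustration only (NOT data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_a = [1, 1, 1, 0, 0, 0, 0, 1]  # hypothetical output of one model
pred_b = [1, 0, 0, 0, 0, 1, 1, 1]  # hypothetical output of the other model

sens_a, spec_a = sensitivity_specificity(y_true, pred_a)
ci_low, ci_high = wilson_ci(3, 4)  # 3 of 4 positive cases detected by model A
p_value = mcnemar_exact(y_true, pred_a, pred_b)
```

The same functions could be applied separately to the ALL, PA, and AP subsets, comparing the resulting p-value against the 0.05 threshold.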
Results M4CXR demonstrated superior performance compared to DCXR in detecting pleural effusion. In the ALL dataset, M4CXR achieved an AUC of 0.835 (95% CI: 0.772–0.897) compared to DCXR's 0.624 (95% CI: 0.537–0.711). Similar trends were observed in PA and AP views, with M4CXR consistently outperforming DCXR in sensitivity, specificity, and AUC (Table 1).