Dual-Task Learning with Semantically-Enriched Features Enhances Analysis Performance of AP and PA Chest X-rays

Purpose To address performance degradation in automated chest X-ray (CXR) interpretation for anteroposterior (AP) views relative to posteroanterior (PA) views, we developed and evaluated deep learning (DL) models designed to enhance semantic understanding, robustness, and overall performance. Existing approaches often suffer from label noise, ambiguous findings, and domain mismatch arising from pretraining on non-clinical data. We hypothesized that learning deeper semantic structures would improve classification of predefined abnormalities and generalize better across views.
Materials and Methods Model 1 was a standard classification network with an ImageNet-pretrained encoder that classified five chest abnormalities: consolidation (CSN), pneumothorax (PTX), nodule or mass (NDL), fibrosis (FIB), and pleural effusion (PEF). Model 2 added an image reconstruction decoder to Model 1 to promote semantic feature learning while retaining the same classification task. Both models were fully fine-tuned with basic augmentations using 1,600,905 CXR images from three public datasets and four private hospitals (82.09% normal). External evaluation was performed using a separate private dataset (10,836 images; 35.83% normal) and the public ChestX-Det dataset (2,216 images; 24.55% normal).
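A minimal sketch of the kind of combined objective a classification-plus-reconstruction setup could optimize is given here for illustration. The abstract does not report the loss functions or their weighting, so the multi-label binary cross-entropy term, the mean-squared-error reconstruction term, and the `recon_weight` parameter are all assumptions rather than the authors' implementation.

```python
import math

# The five findings named in the abstract; treated here as independent labels.
FINDINGS = ["CSN", "PTX", "NDL", "FIB", "PEF"]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over per-finding logits (assumed loss)."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def mse(reconstruction, image):
    """Mean squared error between two flattened pixel lists (assumed loss)."""
    return sum((r - x) ** 2 for r, x in zip(reconstruction, image)) / len(image)


def dual_task_loss(logits, targets, reconstruction, image, recon_weight=1.0):
    """Classification loss plus weighted reconstruction loss.

    recon_weight is a hypothetical hyperparameter; the abstract does not
    state how the two tasks were balanced.
    """
    return bce_with_logits(logits, targets) + recon_weight * mse(reconstruction, image)
```

In this sketch a Model 1-style objective corresponds to `recon_weight=0.0`, while any positive weight lets the reconstruction error contribute to the total loss.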
Results Model 2 outperformed Model 1 on AP, PA, and combined-view images, improving mean AUROC (mAUROC) by 0.019 (AP: +0.0179, PA: +0.0223) on the public dataset and by 0.0286 (AP: +0.0358, PA: +0.0359) on the private dataset. In the public dataset, the AP–PA AUROC gap was modestly reduced (Model 1: −0.0198; Model 2: −0.0158). In the private dataset, the gap remained stable (Model 1: +0.0496; Model 2: +0.0497). For individual findings, Model 2 improved view consistency for PTX (ΔAUROC: −0.1023 to −0.0211, p=0.0089) and FIB (ΔAUROC: +0.1309 to +0.0995, p=0.0002) in the public dataset. In the private dataset, CSN, FIB, and NDL showed favorable but non-significant trends. No notable differences were observed for PEF.
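To make the view-gap metric concrete, the following sketch computes AUROC separately for AP and PA images and takes their difference. It is illustrative only: the pairwise (Mann-Whitney) AUROC estimator, the function names, and the sign convention (PA minus AP) are assumptions, since the abstract does not state how ΔAUROC was defined or computed.

```python
def auroc(scores, labels):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def view_gap(scores, labels, views):
    """AUROC on PA images minus AUROC on AP images (assumed sign convention)."""
    per_view = {}
    for view in ("AP", "PA"):
        idx = [i for i, v in enumerate(views) if v == view]
        per_view[view] = auroc([scores[i] for i in idx], [labels[i] for i in idx])
    return per_view["PA"] - per_view["AP"]


# Toy data for a single finding: four AP images followed by four PA images.
toy_scores = [0.9, 0.4, 0.6, 0.1, 0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 1, 0, 0, 1, 1, 0, 0]
toy_views = ["AP"] * 4 + ["PA"] * 4
toy_gap = view_gap(toy_scores, toy_labels, toy_views)  # 1.0 - 0.75 = 0.25
```

A gap closer to zero under this convention would indicate more consistent performance across the two views for that finding.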
Conclusion Adding a reconstruction task to abnormality classification improved model performance under full fine-tuning, regardless of view. Model 2 selectively narrowed the AP–PA performance gap for PTX and FIB. Learning deeper semantic features may enhance view-invariant diagnostic consistency.
Clinical Relevance/Application Semantically aware learning strategies can improve CXR classification performance and reduce view-related gaps, enhancing the reliability and fairness of AI-assisted interpretation in clinical practice.