Semantic-Enriched Multi-Task Learning Enhances Chest Radiograph Analysis in Both AP and PA Views

PURPOSE: To address the performance degradation of automated chest X-ray (CXR) interpretation on anteroposterior (AP) views relative to posteroanterior (PA) views, we developed and evaluated deep learning (DL) models designed to enhance semantic understanding, robustness, and general performance. Existing approaches often suffer from label noise, ambiguous findings, and domain mismatch arising from pretraining on non-clinical data. We hypothesized that learning deeper semantic structures could improve classification of predefined abnormalities and generalize better across views.
MATERIALS AND METHODS: Model 1 was a standard classification network with an ImageNet-pretrained encoder that classified five chest abnormalities: consolidation (CSN), pneumothorax (PTX), nodule or mass (NDL), fibrosis (FIB), and pleural effusion (PEF). Model 2 extended Model 1 with an image reconstruction decoder to promote semantic feature learning while retaining the same classification task. Both models were fully fine-tuned with basic augmentations using 1,600,905 CXR images from three public datasets and four private hospitals (82.09% normal). External evaluation was performed using a separate private dataset (10,836 images; 35.83% normal) and the public ChestX-Det dataset (2,216 images; 24.55% normal).
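The dual-task objective of Model 2 can be illustrated with a minimal sketch. The abstract does not state the loss functions or their weighting, so everything here is an assumption: multi-label binary cross-entropy for the five findings, mean squared error for reconstruction, and a hypothetical `recon_weight` balancing the two terms. Images are flattened to plain lists of pixel values for simplicity.

```python
import math

# The five findings named in the abstract.
FINDINGS = ["CSN", "PTX", "NDL", "FIB", "PEF"]


def binary_cross_entropy(logits, labels):
    """Mean multi-label BCE over findings, computed from raw logits."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable form of
        # -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def mean_squared_error(reconstruction, image):
    """Mean squared pixel error between the decoder output and the input."""
    return sum((r - x) ** 2 for r, x in zip(reconstruction, image)) / len(image)


def dual_task_loss(logits, labels, reconstruction, image, recon_weight=1.0):
    """Classification loss plus weighted reconstruction loss.

    ASSUMPTION: the loss choices and recon_weight are illustrative only;
    the abstract does not specify them.
    """
    cls_loss = binary_cross_entropy(logits, labels)
    rec_loss = mean_squared_error(reconstruction, image)
    return cls_loss + recon_weight * rec_loss
```

Setting `recon_weight=0.0` reduces this objective to the classification-only loss of a Model 1-style network, which makes the reconstruction term the only difference between the two training setups in this sketch.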
RESULTS: Model 2 outperformed Model 1 on AP, PA, and combined-view images, improving mean AUROC (mAUROC) by 0.019 (AP: +0.0179, PA: +0.0223) on the public dataset and by 0.0286 (AP: +0.0358, PA: +0.0359) on the private dataset. In the public dataset, the AP-PA AUROC gap was modestly reduced (Model 1: -0.0198; Model 2: -0.0158). In the private dataset, the gap remained stable (Model 1: +0.0496; Model 2: +0.0497). For individual findings, Model 2 improved view consistency for PTX (ΔAUROC: -0.1023 to -0.0211, p=0.0089) and FIB (ΔAUROC: +0.1309 to +0.0995, p=0.0002) in the public dataset. In the private dataset, CSN, FIB, and NDL showed favorable but non-significant trends. No notable differences were observed for PEF.
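The view-gap metric reported above can be sketched as follows. This is an illustrative computation, not the authors' evaluation code: AUROC is obtained from the pairwise (Mann-Whitney) definition, mAUROC is the unweighted mean over findings, and the gap is the difference in mAUROC between two views. The abstract does not state which view is subtracted from which, so the argument order of the hypothetical `view_gap` helper is an assumption, as is the dictionary layout of the inputs.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of cases,
    which is acceptable for a sketch but not for large test sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auroc(per_finding):
    """Unweighted mean AUROC over findings.

    per_finding maps a finding name (e.g. "PTX") to a (scores, labels) pair.
    """
    values = [auroc(scores, labels) for scores, labels in per_finding.values()]
    return sum(values) / len(values)


def view_gap(first_view, second_view):
    """mAUROC of the first view minus mAUROC of the second view.

    ASSUMPTION: the sign convention is not given in the abstract.
    """
    return mean_auroc(first_view) - mean_auroc(second_view)
```

Under this definition, a gap that moves toward zero between Model 1 and Model 2 indicates more consistent performance across views, regardless of which view is listed first.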
CONCLUSION: Adding a reconstruction task to abnormality classification improved model performance under full fine-tuning, irrespective of view. The dual-task model selectively enhanced robustness for PTX and FIB, reducing AP-PA gaps. Semantic-aware learning strategies improved view-invariant consistency and may enhance the reliability and fairness of AI-assisted chest X-ray interpretation in clinical practice.