Fine-tuned Segment Anything Model for Automatic Intracranial Aneurysm Segmentation in TOF-MRA: Comparison with Non-Radiologist Physicians
Authors: Juyeon Yi, Dahye Lee, Honggeun Cho, Kwanseok Oh, Bumcheol Hwang, Joon Young Kim, Minjin Song, Doo Young Lee, Jin Wook Choi, Hyun S. Choi
Journal: RSNA 2025
- Motivation Spatial Transcriptomics (ST) links gene expression profiles to spatial locations within a tissue section, typically visualized through a co-registered Hematoxylin & Eosin (H&E) Whole Slide Image (WSI). Although many recent studies have highlighted ST's potential in precision medicine, especially for characterizing the tumor microenvironment and uncovering previously hidden intratumoral heterogeneity, its high cost, lack of standardized protocols, and limited clinically validated benefit currently hinder clinical deployment. Recent work has therefore leveraged state-of-the-art (SOTA) pathology Vision Foundation Models (VFMs), using their embeddings to train downstream biomarker (ST) prediction models that predict gene expression from an input H&E WSI.
- Background Current VFMs: Pretraining data: up to millions of WSIs from 100+ tissue types. Self-supervised learning (SSL) method: DINOv2 (SOTA at time of training). Backbone architecture: Vision Transformer (ViT). Question: can we change the backbone architecture to improve biomarker prediction? Assumption: biomarkers hidden in H&E WSIs are mostly low frequency. Negative, real eigenvalues have an even higher bias toward low frequencies. Diverse eigenvalues (cutoff frequencies) allow the model to learn diverse frequencies of the input image.
- Pretraining Method & Architecture Identical dataset: 756k CRC patches. Identical SSL method: DINOv2. MV Hybrid architecture components: Patch Embed, MambaVision Block, EinFFT Block, Attention Block, MLP Block. Layer types: Sequence Mixing Layer, Channel Mixing Layer.
- Conclusion Compared to ViT, MV Hybrid, designed from an analysis of frequency bias, enhances the quality of VFM embeddings and improves CRC biomarker prediction performance and robustness. Many clinically applicable tasks, such as survival prediction, subtype classification, and even multimodal models, use VFM embeddings and could therefore benefit from MV Hybrid. However, further ablation studies on each backbone component and its effect on frequency filtering, as well as on scaling up model size and pretraining data, are needed.
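
The frequency-bias argument in the Background can be illustrated with a toy example. The sketch below is not the MV Hybrid or MambaVision implementation; it assumes a one-dimensional, first-order linear recurrence h[t] = a·h[t-1] + (1-a)·x[t] with a = exp(λ·Δ), which is one common way to discretize a state with a negative, real eigenvalue λ. The names `gain` and `cutoff`, the step size `dt`, and the eigenvalue values are hypothetical choices made only for illustration. Under these assumptions, every such recurrence is a low-pass filter, and different eigenvalues give different cutoff frequencies.

```python
import numpy as np

def gain(lam, omega, dt=1.0):
    """Magnitude response of h[t] = a*h[t-1] + (1-a)*x[t], with a = exp(lam*dt)."""
    a = np.exp(lam * dt)  # real pole in (0, 1) whenever lam < 0
    return np.abs((1.0 - a) / (1.0 - a * np.exp(-1j * omega)))

def cutoff(lam, omega, dt=1.0):
    """First grid frequency where the gain falls below 1/sqrt(2) (the -3 dB point)."""
    g = gain(lam, omega, dt)
    below = np.nonzero(g < 1.0 / np.sqrt(2.0))[0]
    # If the gain never drops that far on the grid, report the Nyquist frequency.
    return omega[below[0]] if below.size else np.pi

# Frequencies from 0 (constant signal) to pi (fastest alternation), in rad/sample.
omega = np.linspace(0.0, np.pi, 512)

# Hypothetical negative, real eigenvalues (assumed values, not from the abstract).
eigenvalues = [-0.05, -0.2, -1.0]
cutoffs = [cutoff(lam, omega) for lam in eigenvalues]

for lam, wc in zip(eigenvalues, cutoffs):
    print(f"lambda = {lam:5.2f}  cutoff ~ {wc:.3f} rad/sample")
```

In this toy setting the gain is 1 at zero frequency and decreases as frequency grows, and eigenvalues closer to zero yield lower cutoffs. A bank of such states with diverse eigenvalues therefore covers a range of frequency bands, which is the intuition behind the "diverse eigenvalues" statement.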