Scientific program

Orals

Monday

Spectral embeddings

  • 175 - ToothForge: Automatic Dental Shape Generation using Synchronized Spectral Embeddings Tibor Kubík, François Guibault, Michal Španěl, Hervé Lombaert We introduce ToothForge, a spectral approach for automatically generating novel 3D teeth, effectively addressing the sparsity of dental shape datasets. By operating in the spectral domain, our method enables compact machine learning modeling, allowing the generation of high-resolution tooth meshes in milliseconds. However, generating shape spectra comes with the instability of the decomposed harmonics. To address this, we propose modeling the latent manifold on synchronized frequential embeddings. Spectra of all data samples are aligned to a common basis prior to the training procedure, effectively eliminating biases introduced by the decomposition instability. Furthermore, synchronized modeling removes the limiting factor imposed by previous methods, which require all shapes to share a common fixed connectivity. Using a private dataset of real dental crowns, we observe a greater reconstruction quality of the synthesized shapes, exceeding those of models trained on unaligned embeddings. We also explore additional applications of spectral analysis in digital dentistry, such as shape compression and interpolation. ToothForge facilitates a range of approaches at the intersection of spectral analysis and machine learning, with fewer restrictions on mesh structure. This makes it applicable for shape analysis not only in dentistry, but also in broader medical applications, where guaranteeing consistent connectivity across shapes from various clinics is unrealistic. The code is available at https://github.com/tiborkubik/toothForge.
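
For readers unfamiliar with spectral shape representations, the sketch below shows the basic construction this abstract builds on: decompose the mesh graph Laplacian, keep the lowest-frequency eigenvectors, and represent the geometry by a few spectral coefficients. This is a generic illustration, not the ToothForge pipeline; the function name and the choice of a combinatorial (rather than cotangent) Laplacian are ours.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import eigsh

def spectral_embedding(verts, faces, k=64):
    """Project mesh vertex coordinates (n, 3) onto the first k eigenvectors
    of the combinatorial graph Laplacian; returns the spectral coefficients
    and the low-pass reconstruction of the shape."""
    n = len(verts)
    edges = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    adj = csr_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])), shape=(n, n))
    adj = ((adj + adj.T) > 0).astype(float)
    lap = diags(np.asarray(adj.sum(axis=1)).ravel()) - adj
    # Shift-invert around zero to obtain the lowest-frequency harmonics.
    _, basis = eigsh(lap, k=k, sigma=-1e-6, which='LM')
    coeffs = basis.T @ verts        # (k, 3): a compact spectral shape code
    return coeffs, basis @ coeffs
```

The instability the abstract refers to is visible here: eigenvectors are defined only up to sign (and up to rotation within repeated eigenvalues), so coefficients of two similar teeth need not be comparable, which is what synchronizing all spectra to a common basis addresses.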

Self-supervised learning

  • 190 - Resolving quantitative MRI model degeneracy in self-supervised machine learning Giulio Minore, Louis Dwyer Hemmings, Timothy Bray, Hui Zhang Quantitative MRI (qMRI) estimates tissue properties of interest from measured MRI signals. This process is conventionally achieved by model fitting, whose computational expense limits qMRI’s clinical use, motivating recent development of machine learning-based methods. Self-supervised approaches are particularly popular as they avoid the pitfall of distributional shift that affects supervised methods. However, it is unknown how such methods behave if similar signals can result from multiple tissue properties, a common challenge known as model degeneracy. Understanding this is crucial for ascertaining the scope within which self-supervised approaches may be applied. To this end, this work makes two contributions. First, we demonstrate that model degeneracy compromises self-supervised approaches, motivating the development of mitigation strategies. Second, we propose a mitigation strategy based on applying appropriate constraining transforms on the output of the bottleneck layer of the autoencoder network typically employed in self-supervised approaches. We illustrate both contributions using the estimation of proton density fat fraction and R2* from chemical shift-encoded MRI, an ideal exemplar due to its exhibition of degeneracy across the full parameter space. The results from both simulation and in vivo experiments demonstrate that the proposed strategy helps resolve model degeneracy.
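
A minimal sketch of the mitigation idea described above, under simplifying assumptions: a single-peak fat signal model at 3 T with echo times in milliseconds, and sigmoid/softplus as the constraining transforms on the bottleneck outputs (the paper's exact transforms and signal model may differ).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConstrainedSelfSupervisedFit(nn.Module):
    """Toy self-supervised qMRI fit: constraining transforms on the bottleneck
    keep each estimated parameter inside its physically plausible range."""
    def __init__(self, n_echoes):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_echoes, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 2))                 # bottleneck: (fat fraction, R2*)

    def forward(self, signal, te):            # te: echo times in ms
        z = self.encoder(signal)
        ff = torch.sigmoid(z[:, :1])          # fat fraction constrained to [0, 1]
        r2s = F.softplus(z[:, 1:])            # R2* constrained to be positive
        w, f = 1.0 - ff, ff
        phase = 2 * torch.pi * 0.44 * te      # ~440 Hz fat-water shift at 3 T
        mag = torch.sqrt((w + f * torch.cos(phase)) ** 2 + (f * torch.sin(phase)) ** 2)
        return mag * torch.exp(-r2s * te), ff, r2s

# Training minimizes ||predicted - measured||^2, so no ground-truth parameter
# maps are needed; the water/fat swap (ff vs. 1 - ff) is the degeneracy at stake.
```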

Foundation Models

  • 105 - Cascaded Diffusion Model and Segment Anything Model for Medical Image Synthesis via Uncertainty-Guided Prompt Generation Haowen Pang, Xiaoming Hong, Peng Zhang, Chuyang Ye Multi-modal medical images provide diverse diagnostic information, but limitations such as scan time, costs, and radiation dose can hinder modality acquisition. Synthesizing missing images offers a promising solution. Deep learning methods have shown success in medical image synthesis. However, these methods can still struggle with the presence of notable image anomalies caused by pathologies. In this work, to improve the synthesis robustness to anomalies, we propose a model cascading the Diffusion Model (DM) and the Segment Anything Model (SAM) with uncertainty-guided prompt generation for medical image synthesis. We hypothesize that SAM (the medical variant) is beneficial to the synthesis tasks because 1) the SAM encoder trained on large, diverse datasets allows the model to grasp a deep understanding of complex anomaly patterns of pathologies and 2) its ability to take prompt inputs naturally allows the synthesis to pay special attention to abnormal regions that are hard to synthesize. To effectively integrate the DM and SAM, we propose the uncertainty-guided prompt generation framework, where DM synthesis results with higher uncertainty are considered regions potentially with worse synthesis quality and prompts are generated for SAM accordingly to improve the result. First, we propose to estimate the uncertainty of DM synthesis output by repeated noise sampling. Then, prompts are generated based on the uncertainty and given to SAM, together with the DM input image and output. For effective interaction between the prompt, DM input, and DM output, we propose an Uncertainty Guided Cross Attention (UGCA) module, where the prompt serves as a query to guide the model to focus on relevant regions of the DM input and output. Finally, a synthesis decoder replaces the SAM decoder, and it is trained together with UGCA. Results on two public datasets show that our method outperforms existing methods when images manifest notable anomalies.
  • 117 - SpectMamba: Integrating Frequency and State Space Models for Enhanced Medical Image Detection Wang Yao, Yang Dong, Qiao Zhi, Huang Wenjian, Qian Zhen Abnormality detection in medical imaging is a critical task requiring both high efficiency and accuracy to support effective diagnosis. While convolutional neural networks (CNNs) and Transformer-based models are widely used, both face intrinsic challenges: CNNs have limited receptive fields, restricting their ability to capture broad contextual information, and Transformers encounter prohibitive computational costs when processing high-resolution medical images. Mamba, a recent innovation in natural language processing, has gained attention for its ability to process long sequences with linear complexity, offering a promising alternative. Building on this foundation, we present SpectMamba, the first Mamba-based architecture designed for medical image detection. A key component of SpectMamba is the Hybrid Spatial-Frequency Attention (HSFA) block, which separately learns high- and low-frequency features. This approach effectively mitigates the loss of high-frequency information caused by frequency bias and correlates frequency-domain features with spatial features, thereby enhancing the model's ability to capture global context. To further improve long-range dependencies, we propose the Visual State-Space Module (VSSM) and introduce a novel Hilbert Curve Scanning technique to strengthen spatial correlations and local dependencies, further optimizing the Mamba framework. Comprehensive experiments show that SpectMamba achieves state-of-the-art performance while being both effective and efficient across various medical image detection tasks.
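
To make the high/low-frequency separation in the HSFA block concrete, here is a minimal frequency split of a feature map so that each band can be processed by its own branch. The centered square mask, the 0.25 cutoff, and the residual definition of the high band are our assumptions, not the paper's design.

```python
import torch

def frequency_split(x, cutoff=0.25):
    """Split a feature map (B, C, H, W) into low- and high-frequency parts
    using a centered square mask in the 2D Fourier domain."""
    freq = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    _, _, H, W = x.shape
    ry, rx = int(H * cutoff), int(W * cutoff)
    mask = torch.zeros(H, W, device=x.device)
    mask[H // 2 - ry:H // 2 + ry, W // 2 - rx:W // 2 + rx] = 1.0
    low = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1))).real
    return low, x - low    # the residual keeps edges and fine texture
```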

Image registration

  • 51 - GSSD: A Self-Distillation Paradigm with Gradient Surgery for End-to-End Deformable Image Registration Yuxi Zheng, Yansong Bai, Yuchuan Qiao Deformable Image Registration (DIR) is crucial for various medical image analysis tasks. However, deep learning methods struggle to balance registration accuracy and computational complexity. Incorporating knowledge distillation into registration emerges as a promising approach; however, these methods often rely on pre-designed or heuristically chosen teacher networks. Their efficiency is not optimal, primarily because they fail to account for gradient conflicts between student and teacher networks. In this paper, we propose a novel self-distillation paradigm with gradient surgery for end-to-end deformable image registration, named GSSD. Specifically, to design a universally applicable knowledge distillation paradigm, the teacher network is directly cloned from the student network and is removed after training, reducing hardware requirements upon deployment. To resolve potential gradient conflicts between the student and teacher networks, we introduce a two-stage gradient surgery optimization strategy that projects the conflicting gradient onto the normal plane of the dominant gradient (see the projection sketch after this list), ensuring the distillation efficacy. Extensive experiments conducted on three publicly available datasets demonstrate consistent improvements over various methods, with no increase in inference time or parameter count, notably a more than 3% increase in Dice score for liver CT.
  • 204 - Unsupervised Deformable Image Registration with Structural Nonparametric Smoothing Hang Zhang, Xiang Chen, Renjiu Hu, Rongguang Wang, Jinwei Zhang, Min Liu, Yaonan Wang, Gaolei Li, Xinxing Cheng, Jinming Duan Learning-based deformable image registration (DIR) accelerates alignment by amortizing traditional optimization via neural networks. Label supervision further enhances accuracy, enabling efficient and precise nonlinear alignment of unseen scans. However, images with sparse features amid large smooth regions, such as retinal vessels, introduce aperture and large-displacement challenges that unsupervised DIR methods struggle to address. This limitation occurs because neural networks predict deformation fields in a single forward pass, leaving fields unconstrained post-training and shifting the regularization burden entirely to network weights. To address these issues, we introduce SmoothProper, a plug-and-play neural module enforcing smoothness and promoting message passing within the network's forward pass. By integrating a duality-based optimization layer with tailored interaction terms, SmoothProper efficiently propagates flow signals across spatial locations, enforces smoothness, and preserves structural consistency. It is model-agnostic, seamlessly integrates into existing registration frameworks with minimal parameter overhead, and eliminates regularizer hyperparameter tuning. Preliminary results on a retinal vessel dataset exhibiting aperture and large-displacement challenges demonstrate that our method reduces registration error to 1.88 pixels on 2912×2912 images, marking the first unsupervised DIR approach to effectively address both challenges. The source code will be available at https://github.com/tinymilky/SmoothProper.
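
The gradient-projection step referenced in the GSSD abstract above is a standard operation (popularized as PCGrad); a minimal sketch over flattened gradients follows. The two-stage scheduling and the choice of which gradient dominates are specific to the paper and not shown here.

```python
import torch

def project_conflicting(dominant, conflicting):
    """If the two gradients conflict (negative inner product), remove from
    `conflicting` its component along `dominant`, i.e., project it onto the
    normal plane of the dominant gradient; otherwise leave it unchanged."""
    dot = torch.dot(dominant, conflicting)
    if dot < 0:
        conflicting = conflicting - (dot / dominant.norm().pow(2)) * dominant
    return conflicting

# e.g., with flattened gradients of the shared weights:
# g_teacher = project_conflicting(g_student, g_teacher)
# combined = g_student + g_teacher
```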

Tuesday

CAI

  • 89 - BioSonix: Can Physics-based Sonification Perceptualize Tissue Deformations From Tool Interactions? Veronica Ruozzi, Sasan Matinfar, Laura Schütz, Benedikt Wiestler, Alberto Cesare Luigi Redaelli, Emiliano Votta, Nassir Navab Perceptualizing tool interactions with deformable structures in surgical procedures remains challenging, as unimodal visualization techniques often fail to capture the complexity of these interactions due to constraints such as occlusion and limited depth perception. This paper presents a novel approach to augment tool navigation in mixed reality environments by providing auditory representations of tool-tissue dynamics, particularly for interactions with soft tissue. BioSonix, a physics-informed design framework, utilizes tissue displacements in 3D space to compute excitation forces for a sound model encoding tissue properties such as stiffness and density. Biomechanical simulations were employed to model particle displacements resulting from tool-tissue interactions, establishing a robust foundation for the method. An optimization approach was used to define configurations for capturing diverse interaction scenarios with varying tool trajectories. Experiments were conducted to validate the accuracy of the sound-displacement mappings. Additionally, two user studies were performed: the first involved two clinical professionals (a neuroradiologist and a cardiologist), who confirmed the method’s impact and achieved high task accuracy; the second included 22 biomedical experts, who demonstrated high discrimination accuracy in tissue differentiation and targeting tasks. The results revealed a strong correlation between tool-tissue dynamics and their corresponding auditory profiles, highlighting the potential of these sound representations to enhance the intuitive understanding of complex interactions.
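
As a rough illustration of physics-based sonification, the toy sketch below drives a damped oscillator whose pitch follows a stiffness/density ratio (f proportional to sqrt(k/m)) and whose loudness follows displacement magnitude. The constants, decay envelope, and units are placeholders; BioSonix's actual excitation-force sound model is considerably richer.

```python
import numpy as np

def sonify(displacements, stiffness=2e9, density=1e3, sr=44100, dur=0.1):
    """Map per-frame displacement magnitudes to a sequence of tone bursts:
    stiffer/denser tissue -> higher modal frequency, larger displacement ->
    stronger excitation (illustrative units only)."""
    f0 = np.sqrt(stiffness / density) / (2 * np.pi)   # ~225 Hz with these defaults
    t = np.linspace(0, dur, int(sr * dur), endpoint=False)
    env = np.exp(-t / 0.03)                           # short percussive decay
    return np.concatenate([d * env * np.sin(2 * np.pi * f0 * t)
                           for d in displacements])
```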

Wednesday

MR Reconstruction & modeling

  • 161 - Optimization of acquisition schemes towards a better estimation of microstructure parameters in multidimensional diffusion MRI Constance Bocquillon, Isabelle Corouge, Emmanuel Caruyer Unlike traditional measurements by diffusion tensor imaging, multidimensional diffusion MRI allows the estimation of additional microstructural parameters such as anisotropy, kurtosis and orientation dispersion. To properly take advantage of this imaging modality and capture microstructure parameters accurately and efficiently, it is crucial to use a dedicated acquisition scheme. Several models and acquisition representations can be used towards this goal. In this paper, we focus on q-space trajectory imaging, using b-tensor acquisition encoding and diffusion tensor distribution (DTD) modeling. More specifically, our goal is to develop a framework for the optimization of acquisition schemes based on their ability to properly estimate microstructural parameters of interest. We generated an extensive collection of b-tensor shapes with a fixed number of directions each, from which we efficiently selected an optimized acquisition scheme. In the spirit of fingerprinting, we proposed a dictionary-based approach. The dictionary columns were carefully adapted to the achievable resolution in the parameter space, the parameters of interest being the microscopic anisotropy, the tensor size variance and the orientation parameter. To solve the combinatorial optimization problem of selecting the best subset of b-tensor shapes, we implemented two approximation algorithms: a greedy approach and a permutation strategy. To assess the performance of our optimization procedure, we computed the estimation error for each parameter. The signal generated from our scheme yielded lower or comparable errors to those of a reference scheme proposed in the literature and designed for this purpose.
  • 220 - Bayesian Learning with Stochastic Perturbations and Langevin Expectation Maximization for Unsupervised DNN Image Quality Enhancement Vatsala Sharma, Suyash P. Awate Unsupervised learning of deep-neural-networks (DNNs) for image quality enhancement can overcome the real-world challenge of the lack of high-quality training images. Typical DNNs for weakly/un-supervised image restoration make strong assumptions that are often infeasible or undesirable in clinical scenarios, e.g., they (a) demand multiple acquired degraded instances per scene, (b) simulate degraded instances assuming independent identically-distributed noise per pixel, (c) demand pre-training large diffusion models on large sets of high(er)-quality images, or (d) ignore uncertainty estimation in their outputs. We propose a novel Bayesian DNN framework for unsupervised image quality enhancement incorporating (i) stochastic perturbations at multiple stages within the DNN architecture, for regularization and data-driven automatic generation of realistic degraded instances, (ii) variational/distribution modeling in latent space, (iii) novel Monte-Carlo expectation maximization of DNN parameters using Langevin diffusion in latent space, and (iv) novel low-density sampling for perturbations using normalized Langevin diffusion. Results on publicly available datasets demonstrate the benefits of our DNN framework over existing methods in CT, MRI, and PET.
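
The Langevin machinery named in the second abstract has a compact core; below is a generic unadjusted Langevin sampler for a latent variable, assuming access to a differentiable log-density. The paper's normalized variant, low-density sampling, and its coupling to expectation maximization are beyond this sketch.

```python
import torch

def langevin_sample(z0, log_prob, steps=100, eps=1e-3):
    """Unadjusted Langevin dynamics:
    z <- z + (eps / 2) * grad log p(z) + sqrt(eps) * noise."""
    z = z0.clone().requires_grad_(True)
    for _ in range(steps):
        grad, = torch.autograd.grad(log_prob(z).sum(), z)
        with torch.no_grad():
            z = z + 0.5 * eps * grad + eps ** 0.5 * torch.randn_like(z)
        z.requires_grad_(True)
    return z.detach()

# In Monte-Carlo EM, the E-step draws latent samples this way and the M-step
# updates the DNN parameters against those samples.
```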

Vision-language models

  • 57 - Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography Yuexi Du, John Onofrey, Nicha C. Dvornek Contrastive Language-Image Pre-training (CLIP) demonstrates strong potential in medical image analysis but requires substantial data and computational resources. Due to these restrictions, existing CLIP applications in medical imaging focus mainly on modalities like chest X-rays that have abundant image-report data available, leaving many other important modalities under-explored. Here, we propose one of the first adaptations of the full CLIP model to mammography, which presents significant challenges due to labeled data scarcity, high-resolution images with small regions of interest, and class-wise imbalance. We first develop a specialized supervision framework for mammography that leverages its multi-view nature. Furthermore, we design a symmetric local alignment module to better focus on detailed features in high-resolution images. Lastly, we incorporate a parameter-efficient fine-tuning approach for large language models pre-trained with medical knowledge to address data limitations. Our multi-view and multi-scale alignment (MaMA) method outperforms state-of-the-art baselines for three different tasks on two large real-world mammography datasets, EMBED and RSNA-Mammo, with only 52% of the model size of the largest baseline. The code is available at https://github.com/XYPB/MaMA.
  • 172 - Full Conformal Adaptation of Medical Vision-Language Models Julio Silva-Rodríguez, Leo Fillioux, Paul-Henry Cournède, Maria Vakalopoulou, Stergios Christodoulidis, Ismail Ben Ayed, Jose Dolz Vision-language models (VLMs) pre-trained at large scale have shown unprecedented transferability capabilities and are being progressively integrated into medical image analysis. Although their discriminative potential has been widely explored, their reliability remains overlooked. This work investigates their behavior under the increasingly popular split conformal prediction (SCP) framework, which theoretically guarantees a given error level on output sets by leveraging a labeled calibration set. However, the zero-shot performance of VLMs is inherently limited, and common practice involves few-shot transfer learning pipelines, which break the rigid exchangeability assumptions of SCP. To alleviate this issue, we propose full conformal adaptation, a novel setting for jointly adapting and conformalizing pre-trained foundation models, which operates transductively over each test data point using a few-shot adaptation set. Moreover, we complement this framework with SS-Text, a novel training-free linear probe solver for VLMs that alleviates the computational cost of such a transductive approach. We provide comprehensive experiments using 3 different modality-specialized medical VLMs and 9 adaptation tasks. Our framework requires exactly the same data as SCP, and provides consistent relative improvements of up to 27% on set efficiency while maintaining the same coverage guarantees.
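
For readers new to the split conformal prediction baseline this work builds on, a minimal version follows: calibrate a score quantile on labeled data, then include in each prediction set every class that clears the threshold. The 1 - p(true class) score is one common choice among several.

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction with score s = 1 - p(true class).
    Coverage >= 1 - alpha holds when calibration and test points are
    exchangeable -- the assumption that few-shot adaptation threatens."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method='higher')
    return test_probs >= 1.0 - q    # (n_test, n_classes) boolean sets
```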

Thursday

Diffusion models

  • 193 - IGG: Image Generation Informed by Geodesic Dynamics in Deformation Spaces Nian Wu, Nivetha Jayakumar, Jiarui Xing, Miaomiao Zhang Generative models have recently gained increasing attention in image generation and editing tasks. However, they often lack a direct connection to object geometry, which is crucial in sensitive domains such as computational anatomy, biology, and robotics. This paper presents a novel framework for Image Generation informed by Geodesic dynamics (IGG) in deformation spaces. Our IGG model comprises two key components: (i) an efficient autoencoder that explicitly learns the geodesic path of image transformations in the latent space; and (ii) a latent geodesic diffusion model that captures the distribution of latent representations of geodesic deformations conditioned on text instructions. By leveraging geodesic paths, the method ensures smooth, topology-preserving, and interpretable deformations, capturing complex variations in image structures while maintaining geometric consistency. We validate the proposed IGG on plant growth data and brain magnetic resonance imaging (MRI). Experimental results show that IGG outperforms state-of-the-art image generation/editing models, producing realistic, high-quality images with preserved object topology and reduced artifacts. Our code will be publicly available on GitHub.
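
Topology-preserving deformations of the kind IGG generates are typically obtained by integrating a stationary velocity field; a standard scaling-and-squaring sketch is below (this is the common construction, not necessarily IGG's exact parameterization). Displacements are assumed to live in normalized [-1, 1] coordinates with channel order (x, y).

```python
import torch
import torch.nn.functional as F

def integrate_velocity(v, steps=6):
    """Scaling and squaring: integrate a stationary velocity field
    v (B, 2, H, W) into a diffeomorphic displacement field."""
    _, _, H, W = v.shape
    disp = v / (2 ** steps)               # scale down so each step stays small
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing='ij')
    grid = torch.stack([xs, ys]).unsqueeze(0).to(v)   # identity sampling grid
    for _ in range(steps):                # repeatedly compose phi with itself
        sample = (grid + disp).permute(0, 2, 3, 1)    # (B, H, W, 2)
        disp = disp + F.grid_sample(disp, sample, align_corners=True)
    return disp
```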

Neuroimaging

  • 49 - Disentangle disease-relevant patterns from irrelevant patterns in fMRI analysis using equivariant and contrastive learning Xin Shen, Shengjie Zhang, Wenbin Liu, Yuan Zhou Functional magnetic resonance imaging (fMRI) holds great potential for diagnosing and understanding brain disorders. However, the complexity and subtlety of disease-relevant variations in fMRI present significant challenges. To address this issue, we propose a framework that combines equivariant learning and contrastive learning (ECL) to disentangle disease-relevant patterns from irrelevant patterns in fMRI. The framework uses a personalized mask to separate the functional connectivity network from fMRI into a disease-relevant subgraph and an irrelevant subgraph. The disease-relevant subgraph undergoes an equivariant learning pipeline to align the orbit of the encoded features with the orbit of the augmented views of the inputs. The disease-irrelevant subgraph undergoes a contrastive learning pipeline that pulls together the encoded features of augmented views of the same input. By combining these two learning processes, the learned encoder can be invariant to perturbations to disease-irrelevant patterns while equivariant to disease-relevant variations. The proposed approach achieved state-of-the-art classification performance across 3 benchmark datasets: ABIDE I, ABIDE II, and ADHD-200, with significant improvements in accuracy (improved by up to 5%). Interpretability experiments identified disease-related regions of interest (ROIs) of clinical relevance. These results establish our framework as a promising tool for analyzing brain networks in fMRI. The code is available at https://github.com/CXshen468/ecl.
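
The contrastive branch described above follows the usual InfoNCE recipe; a minimal batch version is sketched below (the temperature is an assumption, and the paper's equivariant branch is not shown).

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.2):
    """Row i of z1 and row i of z2 encode two augmented views of the same
    input; every other row in the batch serves as a negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau                       # (B, B) similarities
    targets = torch.arange(len(z1), device=z1.device)
    return F.cross_entropy(logits, targets)
```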

Friday

Collaborative and Unsupervised Learning

  • 28 - Unsupervised Accelerated MRI Reconstruction via Ground-Truth-Free Flow Matching Xinzhe Luo, Yingzhen Li, Chen Qin Accelerated magnetic resonance imaging involves reconstructing fully sampled images from undersampled k-space measurements. Current state-of-the-art approaches have mainly focused on either end-to-end supervised training inspired by compressed sensing formulations, or posterior sampling methods built on modern generative models. However, their efficacy heavily relies on large datasets of fully sampled images, which may not always be available in practice. To address this issue, we propose an unsupervised MRI reconstruction method based on ground-truth-free flow matching (GTF^2M). Particularly, the GTF^2M learns a prior denoising process of fully sampled ground-truth images using only undersampled data. Based on that, an efficient cyclic reconstruction algorithm is further proposed to perform forward and backward integration in the dual space of image-space signal and k-space measurement. We compared our method with state-of-the-art learning-based baselines on the fastMRI database of both single-coil knee and multi-coil brain MRIs. The results show that our proposed unsupervised method can significantly outperform existing unsupervised approaches, and achieve performance comparable to most supervised end-to-end and prior learning baselines trained on fully sampled MRI, while offering greater efficiency than existing generative model-based approaches.
  • 64 - Dynamic Allocation Hypernetwork with Adaptive Model Recalibration for Federated Continual Learning Xiaoming Qi, Jingyang Zhang, Huazhu Fu, Guanyu Yang, Shuo Li, Yueming Jin Federated continual learning (FCL) offers an emerging pattern to facilitate the applicability of federated learning (FL) in real-world scenarios, where tasks evolve dynamically and asynchronously across clients, especially in medical scenarios. Existing server-side FCL methods in the natural image domain construct a continually learnable server model by aggregating clients over all involved tasks. However, they are challenged by: (1) catastrophic forgetting of previously learned tasks, leading to error accumulation in the server model and making it difficult to sustain comprehensive knowledge across all tasks; and (2) biased optimization due to asynchronous tasks handled across different clients, leading to collisions between the optimization targets of different clients at the same time steps. In this work, we take the first step to propose a novel server-side FCL pattern for the medical domain, Dynamic Allocation Hypernetwork with adaptive model recalibration (FedDAH). FedDAH facilitates collaborative learning under distinct and dynamic task streams across clients. To alleviate catastrophic forgetting, we propose a dynamic allocation hypernetwork (DAHyper), in which a continually updated hypernetwork manages the mapping between task identities and their associated model parameters, enabling the dynamic allocation of the model across clients. To address biased optimization, we introduce a novel adaptive model recalibration (AMR) to incorporate the candidate changes of historical models into current server updates, and assign weights to identical tasks across different time steps based on their similarity for continual optimization. Extensive experiments on the AMOS dataset demonstrate the superiority of our FedDAH over other FCL methods on sites with different task streams. Code is available at https://github.com/jinlab-imvr/FedDAH.
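
The hypernetwork mechanism in FedDAH can be pictured with a toy version: a task-identity embedding is mapped to the parameters of a target layer, so parameters for earlier tasks are regenerated on demand rather than overwritten. The layer sizes and the single-linear generator below are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskHypernet(nn.Module):
    """Map a task id to the weights of a linear layer, letting one
    hypernetwork 'store' a separate model per task."""
    def __init__(self, n_tasks, d_in, d_out, d_embed=32):
        super().__init__()
        self.task_embed = nn.Embedding(n_tasks, d_embed)
        self.gen = nn.Linear(d_embed, d_out * d_in + d_out)
        self.d_in, self.d_out = d_in, d_out

    def forward(self, task_id, x):
        p = self.gen(self.task_embed(task_id))     # generated parameters
        w = p[:self.d_out * self.d_in].view(self.d_out, self.d_in)
        return F.linear(x, w, p[self.d_out * self.d_in:])

# net = TaskHypernet(n_tasks=5, d_in=16, d_out=4)
# y = net(torch.tensor(2), torch.randn(8, 16))    # weights generated for task 2
```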

Disease modeling

  • 73 - Explainable Deep Model for Understanding Neuropathological Events Through Neural Symbolic Regression Tingting Dan, Guorong Wu Mounting evidence shows that Alzheimer's disease (AD) is characterized by the propagation of tau aggregates throughout the brain in a prion-like manner. Since current pathology imaging technologies can only provide a spatial brain mapping of tau accumulation, computational modeling becomes indispensable in analyzing the spatiotemporal propagation patterns of widespread tau aggregates. To address this challenge, we present a novel physics-informed neural network for AD (coined PINN4AD) by conceptualizing the intercellular spreading of tau pathology in a reaction-diffusion model, where each node (brain region) is ubiquitously wired with other nodes while interacting with amyloid burdens. In this context, we formulate the biological process of tau spreading in a principled potential energy transport model that describes the mechanistic role of Aβ-tau interaction in the widespread flow of tau aggregates. The physics principle and mathematics insight allow us to develop an explainable neural network to uncover the spatiotemporal dynamics of tau propagation from the unprecedented amount of longitudinal neuroimages. On top of this, we introduce a symbolic regression module into the PINN4AD to further elucidate the analytic expressions underlying Aβ-tau interaction and tau propagation mechanism. We have achieved not only an enhanced prediction accuracy of tau propagation on ADNI and OASIS datasets but also a system-level understanding of the pathophysiological mechanism in AD progression, suggesting great potential for research in AD and AD-related dementias.
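
The reaction-diffusion backbone of such models has a compact form: pathology diffuses along connectome edges via the graph Laplacian while growing locally. The sketch below uses a logistic reaction modulated by amyloid; the specific coupling, coefficients, and Euler integration are our assumptions, whereas PINN4AD learns these dynamics from data.

```python
import numpy as np

def simulate_tau(W, tau0, amyloid, beta=0.1, alpha=0.5, dt=0.01, steps=1000):
    """Euler integration of
    d(tau)/dt = -beta * L @ tau + alpha * amyloid * tau * (1 - tau),
    with L the graph Laplacian of the connectome W (n x n, symmetric)."""
    L = np.diag(W.sum(axis=1)) - W
    tau = tau0.astype(float).copy()
    for _ in range(steps):
        tau += dt * (-beta * L @ tau + alpha * amyloid * tau * (1.0 - tau))
    return tau
```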

Posters

Tuesday

  • 1 - MC-NuSeg: Multi-Contour Aware Nuclei Instance Segmentation Hyun Namgung, Siwoo Nam, Soopil Kim, Sang Hyun Park Accurate nuclei instance segmentation is critical in digital pathology image analysis, facilitating disease diagnosis and advancing medical research. While various methods have been proposed, recent approaches leverage foundation models like the Segment Anything Model (SAM) for their robust representational power. However, existing models face challenges in handling the unique characteristics of histopathology images, particularly dense nuclei clusters, and complex morphological and staining variations. To address these issues, we propose a novel method, the Multi-Contour Aware Nuclei Instance Segmentation (MC-NuSeg) framework, which incorporates the hierarchical boundary structure of nuclei for precise segmentation. MC-NuSeg predicts multiple segmentation maps corresponding to different contour layers, allowing for accurate separation of densely clustered nuclei and those with high morphological variance. Furthermore, we introduce an auxiliary instance counting loss that directly supervises the number of nuclei, significantly enhancing segmentation accuracy by reducing false positives and missed cases. Extensive evaluations on four public pathology datasets demonstrate that MC-NuSeg achieves state-of-the-art performance, effectively addressing the challenges of nuclei instance segmentation.
  • 12 - Knowledge-enhanced Hyperbolic Language-Image Pretraining for Zero-shot Learning Linbin Han, Zhi Qiao, Xiantong Zhen, Jiahong Gao, Zhen Qian In light of the inherent entailment relations between images and text, hyperbolic point vector embeddings, which leverage the hierarchical structure of hyperbolic space, have been used for visual-semantic representation, showing significant advantages in zero-shot learning tasks. However, unlike general image-text alignment tasks, the availability of high-quality paired image-text data in the medical domain is limited. This scarcity presents challenges for visual language models, hindering their ability to effectively comprehend free-text medical reports and the associated images. Moreover, many medical terms are complex, specialized, and abstract, and embeddings derived solely from raw imaging report texts often fail to generalize effectively. To address these challenges, we propose MkCLIPH, a hyperbolic space image-text alignment pre-training method that incorporates medical domain knowledge. MkCLIPH models the visual-semantic hierarchical partial order relationship through hierarchical entailment angle modeling and integrates medical domain knowledge as a prior to enhance the representation of medical image-text data. This improves generalization and interpretability. Experimental results demonstrate that our method outperforms baseline approaches in terms of interpretability and performance across a range of zero-shot tasks and datasets.
  • 20 - GeoT: Geometry-guided Instance-dependent Transition Matrix for Semi-supervised Tooth Point Cloud Segmentation Weihao Yu, Xiaoqing Guo, Chenxin Li, Yifan Liu, Yixuan Yuan Achieving meticulous segmentation of tooth point clouds from intra-oral scans stands as an indispensable prerequisite for various orthodontic applications. Given the labor-intensive nature of dental annotation, a significant amount of data remains unlabeled, driving increasing interest in semi-supervised approaches. One primary challenge of existing semi-supervised medical segmentation methods lies in noisy pseudo labels generated for unlabeled data. To address this challenge, we propose GeoT, the first framework that employs instance-dependent transition matrix (IDTM) to explicitly model noise in pseudo labels for semi-supervised dental segmentation. Specifically, to handle the extensive solution space of IDTM arising from tens of thousands of dental points, we introduce tooth geometric priors through two key components: point-level geometric regularization (PLGR) to enhance consistency between point adjacency relationships in 3D and IDTM spaces, and class-level geometric smoothing (CLGS) to leverage the fixed spatial distribution of tooth categories for optimal IDTM estimation. Extensive experiments performed on the public Teeth3DS dataset and a private dataset demonstrate that our method can make full utilization of unlabeled data to facilitate segmentation, achieving performance comparable to fully supervised methods with only 20% of the labeled data.
  • 31 - RemInD: Remembering Anatomical Variations for Interpretable Domain Adaptive Medical Image Segmentation Xin Wang, Yin Guo, Kaiyu Zhang, Niranjan Balu, Mahmud Mossa-Basha, Linda Shapiro, Chun Yuan This work presents a novel Bayesian framework for unsupervised domain adaptation (UDA) in medical image segmentation. While prior works have explored this clinically significant task using various strategies of domain alignment, they often lack an explicit and explainable mechanism to ensure that target image features capture meaningful structural information. Besides, these methods are prone to the curse of dimensionality, inevitably leading to challenges in interpretability and computational efficiency. To address these limitations, we propose RemInD, a framework inspired by human adaptation. RemInD learns a domain-agnostic latent manifold, characterized by several anchors, to memorize anatomical variations. By mapping images onto this manifold as weighted anchor averages, our approach ensures realistic and reliable predictions. This design mirrors how humans develop representative components to understand images and then retrieve component combinations from memory to guide segmentation. Notably, model prediction is determined by two explainable factors: a low-dimensional anchor weight vector, and a spatial deformation. This design facilitates computationally efficient and geometry-adherent adaptation by aligning weight vectors between domains on a probability simplex. Experiments on two public datasets, encompassing cardiac and abdominal imaging, demonstrate the superiority of RemInD, which achieves state-of-the-art performance using a single alignment approach, outperforming existing methods that often rely on multiple complex alignment strategies.
  • 36 - Hierarchical CLIPs for Fine-grained Anatomical Lesion Localization from Whole-body PET/CT Images Mingyang Yu, Yaozong Gao, Yiran Shu, Yanbo Chen, Jingyu Liu, Caiwen Jiang, Kaicong Sun, Weifang Zhang, Yiqiang Zhan, Xiang Sean Zhou, Shaonan Zhong, Xinlu Wang, Meixin Zhao, Dinggang Shen Accurate identification of lesions, including anatomical lesion localization, is critical for automated radiology report generation. However, this task is particularly challenging in whole-body PET/CT imaging due to the large number of diverse anatomical regions throughout the whole body. Existing studies mainly rely on anatomical detection or segmentation. These methods are generally limited to only a small subset of anatomical regions due to the difficulty of manually segmenting and annotating a large set of anatomical regions in the training stage. To address this issue, we propose a hierarchical CLIP-based 3D model to precisely and efficiently identify 387 anatomical lesion locations within whole-body PET/CT scans. Our model is built on three strategies: (1) hierarchical localization, in which anatomical locations are identified from coarse to fine to improve localization accuracy, robustness, and scalability; (2) semantic location augmentation, which incorporates anatomical knowledge of relative location to adjacent regions to encourage neighborhood preservation of text feature representations; and (3) location ambiguity mitigation, which excludes penalties on the top-K ambiguous localizations in a modified CLIP loss to alleviate the cases with lesions residing at the boundaries of multiple regions. Notably, this work is the first to achieve accurate, robust, and efficient whole-body anatomical lesion localization, with significant performance improvement compared to the SOTA methods on a large whole-body PET/CT dataset comprising 1748 subjects acquired with scanners from multiple makers.
  • 50 - Concepts from Neurons: Building Interpretable Medical Image Diagnostic Models by Dissecting Opaque Neural Networks Shizhan Gong, Huayu Wang, Xiaofan Zhang, Qi Dou Deep learning has achieved remarkable success in medical image analysis, yet its translation to clinical settings is often impeded by the opaque nature of these models. While interpretable models like Concept Bottleneck Models (CBMs) maintain good interpretability, they often require manually designed and annotated concepts, or rely heavily on pre-trained vision-language models. In the medical domain, however, the concepts are often too complicated to be described precisely by pure plain text, and high-performing unified foundation models are still missing. To address these challenges, we propose a novel framework that extracts human-understandable concepts from pre-trained opaque models and then builds surrogate CBMs for interpretable diagnosis. We first employ sparse autoencoders to disentangle learned representations into a limited set of clinically relevant concepts, which are then transformed into plain text with the assistance of domain experts or large language models. Utilizing concept activation vectors, we can project these concepts into a shared representation space and apply submodular optimization to select the most informative concepts for model inference. The interpretable surrogate CBMs are finally constructed through sparsely decomposing the visual representation into concept representations. We validate our framework on three medical diagnostic benchmarks: HAM10000, Harvard-FairVLMed, and MIMIC-CXR. The results indicate that our method achieves performance comparable to opaque models while significantly enhancing interpretability, outperforming previous CBMs.
  • 61 - Interpretable Few-Shot Retinal Disease Diagnosis with Concept-Guided Prompting of Vision-Language Models Deval Mehta, Yiwen Jiang, Catherine Jan, Mingguang He, Kshitij Jadhav, Zongyuan Ge Recent advancements in deep learning have shown significant potential for classifying retinal diseases using color fundus images. However, existing works predominantly rely exclusively on image data, lack interpretability in their diagnostic decisions, and treat medical professionals primarily as annotators for ground truth labeling. To fill this gap, we implement two key strategies: extracting interpretable concepts of retinal diseases using the knowledge base of GPT models and incorporating these concepts as a language component in prompt-learning to train vision-language (VL) models with both fundus images and their associated concepts. Our method not only improves retinal disease classification but also enriches few-shot and zero-shot detection (novel disease detection), while offering the added benefit of concept-based model interpretability. Our extensive evaluation across two diverse retinal fundus image datasets illustrates substantial performance gains in VL-model based few-shot methodologies through our concept integration approach, demonstrating an average improvement of approximately 5.8% and 2.7% mean average precision for 16-shot learning and zero-shot (novel class) detection respectively. Our method marks a pivotal step towards interpretable and efficient retinal disease recognition for real-world clinical applications.
  • 78 - Continuous Diffusion Model for Self-supervised Denoising and Super-resolution on Fluorescence Microscopy Images Colin S. C. Tsang, Albert Chung Recent studies have shown that Joint Denoising and Super-Resolution (JDSR) approaches can produce high-quality medical images. However, obtaining ground truth images for training is often infeasible in fluorescence microscopy (FM). Moreover, current JDSR methods can be impractical as they rely on a fixed resolution scale. Existing methods aim to learn a direct mapping between low-quality input and high-quality output in pixel space. In this paper, we introduce the Continuous Diffusion Model. This novel self-supervised method iteratively retrieves a noise-free image in continuous space, enabling us to generate clean images at arbitrary resolutions. Our quantitative analysis confirms the effectiveness of our proposed method. We outperformed both supervised and self-supervised methods by an average of 30.5%/7.4% in terms of RMSE/SSIM. Our output preserves fine details and structures better than other state-of-the-art methods, as demonstrated by the qualitative analysis. The source code is available at https://github.com/colinsctsang/ContinuousDiffusionModel.
  • 92 - Cycle-consistent zero-shot through-plane super-resolution for anisotropic head MRI Samuel Remedios, Shuwen Wei, Aaron Carass, Blake Dewey, Jerry Prince Magnetic resonance (MR) images are often acquired as anisotropic volumes in clinical settings. Such volumes have a worse through-plane resolution than in-plane resolution, hampering results in many processing pipelines that expect isotropic resolutions. Super-resolution (SR) is a promising methodology to address this problem, but there is concern whether the estimated high-resolution (HR) image suffers from egregious hallucinations, especially with deep learning methods that produce aesthetically pleasing results. One approach to restrict the impact of hallucinations is to guarantee that the estimated HR image is exactly cycle-consistent with the low-resolution observation. The denoising diffusion null space model (DDNM) achieves this through a range null space decomposition, but the specific design of the forward map is left to the application. In this work, we analyze the forward problem in 2D MR acquisition and construct an appropriate linear map A. We train a denoising diffusion probabilistic model on T1-weighted (T1-w) head MR images from multiple datasets and implement DDNM using A for the SR task (see the range-null-space sketch after this list). We show that the approach yields exact cycle-consistent solutions that are also realistic. We evaluated the approach in a wide variety of T1-w MR datasets, including withheld subjects from training sites and two sites outside of the training domain. We achieve excellent qualitative and quantitative results according to both distortion and perceptual metrics.
  • 101 - Diffusion MAE: Paving the Way for Representation Learning of Diffusion MRI Haotian Jiang, Geng Chen Deep learning has achieved significant success in diffusion MRI (dMRI) computing. However, most deep learning-based dMRI methods heavily rely on supervised learning, which requires large-scale labeled datasets that are difficult to obtain. Recently, Masked AutoEncoder (MAE) has gained popularity in computer vision for its ability to leverage unlabeled data, yet its potential has not yet been fully explored in dMRI. To fill this gap, we propose the first MAE-based self-supervised learning framework, called DMAE, for representation learning of dMRI data. In DMAE, we first create dMRI patches using a deliberately designed tube q-space masking strategy, which adapts well to the unique q-space characteristics of dMRI data. We then encode the patches into a latent space for pre-training to learn high-level semantic representations. During the fine-tuning stage, we design a task-specific decoder and incorporate the decoder with the pre-trained encoder to achieve superior performance in downstream tasks. Extensive quantitative and qualitative results demonstrate the superiority of our proposed method over the state-of-the-art self-supervised learning approaches.
  • 124 - A Multi-Layer Neural Transport Model for Characterizing Pathology Propagation in Neurodegenerative Diseases Haifeng Huang, Yi Wang, Tingting Dan, Yang Yang, Guorong Wu There is a growing consensus in neuroscience that pathological proteins accumulate and spread along specific large-scale brain networks, indicating the mechanistic role of connectome architecture in the progression of neurodegenerative diseases. Although mounting evidence shows that pathology spreading is a dynamic biological process shaped by the complex interplay between the wiring mechanism of neuronal fibers and the self-organized synchronization of functional fluctuations, current computational methods model the propagation of pathology burden through either structural connectivity (SC) or functional connectivity (FC). To address this limitation, we present a multi-layer transport model to capture the SC/FC-specific propagation of neuropathological burdens and their interactions from longitudinal imaging data. Furthermore, we propose to parameterize the spreading pathways using a physics-informed neural network, enabling the prediction of the progression of pathological events at the baseline. We have evaluated the prediction accuracy of tau aggregates in Alzheimer's disease (AD), where our method achieves a significantly higher accuracy compared to existing approaches. In addition, the physics principle in our deep model allows us to explore the biological underpinning of how SC-FC interaction contributes to pathology propagation in AD. Taken together, enhanced prediction accuracy and model explainability suggest the great potential of our deep model in uncovering the pathophysiological mechanism in neurodegenerative diseases through data-driven approaches.
  • 137 - DIReCT: Domain-Informed Rectified Flow for Controllable Brain MRI to PET Translation Tuo Liu, Haifeng Wang, Heng Chang, Fan Wang, Chunfeng Lian, Jianhua Ma Recent advancements in generative learning have enabled PET image synthesis from more accessible MRI scans, offering a safer, cost-effective, and scalable alternative to traditional PET imaging, e.g., for Alzheimer’s disease (AD) diagnosis. However, current MRI-to-PET translation methods face limitations in controllability and fidelity, often failing to capture personalized metabolic activations and fine-grained structural details in critical regions. To address these challenges, we propose a novel controllable MRI-to-PET translation framework, termed DIReCT, which leverages rectified flow to generate high-fidelity PET images tailored to downstream diagnostic and analytical needs. By injecting cross-modal guidance from a pretrained vision-language model (BiomedCLIP), DIReCT incorporates both common imaging knowledge and individualized clinical information to enhance the personalization of PET synthesis. Extensive experiments on the ADNI dataset demonstrate that DIReCT significantly outperforms existing methods across various image quality metrics. Notably, the synthesized FDG-PET images by DIReCT achieve performance comparable to real FDG-PET scans, excelling in capturing AD-related pathological features for reliable group comparisons and personalized diagnosis.
  • 163 - Hierarchical Neural Cellular Automata for Lightweight Microscopy Image Classification Michael Deutges, Chen Yang, Ario Sadafi, Nassir Navab, Carsten Marr Classification of cells in microscopy images is an essential step in the diagnostic workflows for various medical conditions. These diagnostic processes benefit from emerging deep learning solutions, which make them more accessible, reliable, and scalable. However, their extensive deployment is hindered by the limited generalizability and high computational demands of such architectures. We address this issue by introducing a lightweight, general-purpose hierarchical classification model based on Neural Cellular Automata (NCA). Our approach utilizes NCA to extract features at multiple resolutions, combining the advantages of NCA-based methods with those of convolutional architectures (see the NCA update sketch after this list). We evaluate our model on six microscopy datasets from different modalities and demonstrate that it consistently outperforms existing NCA-based approaches. With significantly fewer parameters than conventional deep learning methods, our model is suitable for deployment in resource-constrained areas, such as remote clinics with limited computational infrastructure or mobile devices with lower computational capacities. Our results highlight the potential of NCA-based models as an effective, lightweight alternative for image classification, addressing critical barriers to the equitable distribution of automated diagnostic tools.
  • 171 - Bilinear Projector: Mitigating Discretization Artifacts in Model Based Iterative Reconstruction for X-ray CT Ke Chen, Alireza Entezari Pixel and voxel bases are commonly used for designing X-ray CT projectors that underlie the projection and back-projection methods widely used in practice. We introduce a bilinear projector for modeling X-ray optics and provide experimental evidence showing its impact in reducing discretization artifacts in Model-Based Iterative Reconstruction (MBIR). We demonstrate the advantages of using this bilinear model over the conventional pixel-based projectors (piecewise constant) for MBIR in realistic simulations in the presence of the nonlinearity of the Lambert-Beer law. We derive efficient piecewise polynomial forms for footprint and detector blur computations for this new projector and demonstrate its efficient GPU implementation for high-resolution fanbeam geometry.
  • 178 - MAD-AD: Masked Diffusion for Unsupervised Brain Anomaly Detection Farzad Beizaee, Gregory Lodygensky, Christian Desrosiers, Jose Dolz Unsupervised anomaly detection in brain images is crucial for identifying injuries and pathologies without access to labels. However, the accurate localization of anomalies in medical images remains challenging due to the inherent complexity and variability of brain structures and the scarcity of annotated abnormal data. To address this challenge, we propose a novel approach that incorporates masking within diffusion models, leveraging their generative capabilities to learn robust representations of normal brain anatomy. During training, our model processes only normal brain MRI scans and performs a forward diffusion process in the latent space that adds noise to the features of randomly-selected patches. Following a dual objective, the model learns to identify which patches are noisy and recover their original features. This strategy ensures that the model captures intricate patterns of normal brain structures while isolating potential anomalies as noise in the latent space. At inference, the model identifies noisy patches corresponding to anomalies and generates a normal counterpart for these patches by applying a reverse diffusion process. Our method surpasses existing unsupervised anomaly detection techniques, demonstrating superior performance in generating accurate normal counterparts and localizing anomalies. The code is available at https://github.com/farzad-bz/MAD-AD.
  • 181 - LEDA: Log-Euclidean Diffeomorphism Autoencoder for Efficient Statistical Analysis of Diffeomorphisms Krithika Iyer, Shireen Elhabian, Sarang Joshi Image registration is a core task in computational anatomy that establishes correspondences between images. Invertible deformable registration, which computes a deformation field and handles complex, non-linear transformations, is essential for tracking anatomical variations, especially in neuroimaging applications where inter-subject differences and longitudinal changes are key. Analyzing the deformation fields is challenging due to their non-linearity, limiting statistical analysis. Moreover, traditional approaches for analyzing deformation fields are computationally expensive, sensitive to initialization, and prone to numerical errors, especially when the deformation is far from the identity. To address these limitations, we propose the Log-Euclidean Diffeomorphism Autoencoder (LEDA), an innovative framework designed to compute the principal logarithm of deformation fields by efficiently predicting consecutive square roots. LEDA operates within a linearized latent space that adheres to the group action laws of diffeomorphisms, enhancing our model's robustness and applicability. We also introduce a loss function to enforce inverse consistency, ensuring accurate latent representations of deformation fields. Extensive experiments with the OASIS-1 dataset demonstrate the effectiveness of LEDA in accurately modeling and analyzing complex non-linear deformations while maintaining inverse consistency. Additionally, we evaluate its ability to capture and incorporate clinical variables, enhancing its relevance for clinical applications.
  • 186 - CoRLD: Contrastive Representation Learning Of Deformable Shapes In Images Tonmoy Hossain, Miaomiao Zhang Deformable shape representations, parameterized by deformations relative to a given template, have proven effective for improved image analysis tasks. However, their broader applicability is hindered by two major challenges. First, existing methods mainly rely on a known template during testing, which is impractical and limits flexibility. Second, they often struggle to capture fine-grained, voxel-level distinctions between similar shapes (e.g., anatomical variations among healthy individuals, those with mild cognitive impairment, and diseased states). To address these limitations, we propose a novel framework - Contrastive Representation Learning of Deformable shapes (CoRLD) in learned deformation spaces and demonstrate its effectiveness in the context of image classification. Our CoRLD leverages a class-aware contrastive supervised learning objective in latent deformation spaces, promoting proximity among representations of similar classes while ensuring separation of dissimilar groups. In contrast to previous deep learning networks that require a reference image as input to predict deformation changes, our approach eliminates this dependency. Instead, template images are utilized solely as ground truth in the loss function during the training process, making our model more flexible and generalizable to a wide range of medical applications. We validate CoRLD on diverse datasets, including real brain magnetic resonance imaging (MRIs) and adrenal shapes derived from computed tomography (CT) scans. Experimental results show that our model effectively extracts deformable shape features, which can be easily integrated with existing classifiers to substantially boost the classification accuracy. Our code is available on GitHub.
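
Referring back to poster 92: the range-null-space decomposition that guarantees cycle consistency in DDNM is short enough to sketch. Here A is a toy s-fold through-plane averaging whose pseudo-inverse is replication; the paper constructs its own linear map for 2D MR acquisition.

```python
import torch
import torch.nn.functional as F

def null_space_project(x_gen, y, s=4):
    """DDNM-style consistency x = A^+ y + (I - A^+ A) x_gen for
    A = s-fold average pooling along the through-plane axis
    (A^+ = replication). The output satisfies A(x) == y exactly."""
    A = lambda img: F.avg_pool2d(img, kernel_size=(s, 1))    # downsample rows
    Ap = lambda low: low.repeat_interleave(s, dim=-2)        # pseudo-inverse
    return Ap(y) + x_gen - Ap(A(x_gen))
```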
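
And for poster 163, one neural cellular automaton update is small enough to show in full: fixed identity/Sobel perception filters, a shared per-cell update network, and a stochastic fire mask. The channel count, hidden width, and fire rate below are typical defaults, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NCAStep(nn.Module):
    """One NCA update: each cell perceives its 3x3 neighborhood through fixed
    filters, a shared 1x1 network proposes a residual change, and only a
    random subset of cells applies it."""
    def __init__(self, ch=16, fire_rate=0.5):
        super().__init__()
        sx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]) / 8
        ident = torch.zeros(3, 3); ident[1, 1] = 1.0
        self.register_buffer(
            'filters',
            torch.stack([ident, sx, sx.T]).repeat(ch, 1, 1).unsqueeze(1))
        self.update = nn.Sequential(nn.Conv2d(3 * ch, 128, 1), nn.ReLU(),
                                    nn.Conv2d(128, ch, 1))
        self.ch, self.fire_rate = ch, fire_rate

    def forward(self, x):                                    # x: (B, ch, H, W)
        perception = F.conv2d(x, self.filters, padding=1, groups=self.ch)
        dx = self.update(perception)
        fire = (torch.rand_like(x[:, :1]) < self.fire_rate).float()
        return x + dx * fire
```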

Thursday

  • 8 - Taming Masked Image Modeling for Chest X-ray Diagnosis by Incorporating Clinical Visual Priors Zihao Zhao, Mei Wang, Zhiming Cui, Sheng Wang, Qian Zhou, Li Fan, Qian Wang, Dinggang Shen Accurate chest X-ray diagnosis typically requires a nuanced understanding of image semantics. Hence, masked image modeling (MIM) has gained attention as a promising approach for learning local image semantics by reconstructing masked patches from their surrounding context. However, existing MIM methods have two key limitations when applied to chest X-ray images: (1) the use of semantic-unaware masking strategies which may corrupt clinically significant visual features, and (2) insufficient multi-scale supervision which fails to consider abnormalities of varying sizes. To address these challenges, we introduce CXR-MIM, a novel MIM framework that incorporates clinical visual priors to guide the image modeling process. Specifically, we leverage these priors to indicate key aspects of a visual feature, including its semantics, location, size, and shape. For semantic and location awareness, CXR-MIM utilizes radiologists' gaze data to differentiate between clinically significant and insignificant regions in the masking phase, implementing a controlled masking strategy that moderately masks significant diagnostic features. To capture abnormalities of different sizes and shapes, we develop a pyramid adaptive reconstruction module to provide supervision across multiple scales in the reconstruction phase, which is further enhanced by semantic-aware recalibrated gaze heatmaps. Experiments on two publicly available datasets and one private dataset demonstrate the superior performance of CXR-MIM compared to existing MIM methods. Further evaluation involving pre-training on an additional large-scale dataset indicates promising scalability with increasing data size, underscoring its potential in the age of foundation models. Our code will be available at Github.
  • 17 - Pitfalls of topology-aware image segmentation Alexander Berger, Laurin Lux, Alexander Weers, Martin J. Menten, Daniel Rueckert, Johannes Paetzold Topological correctness, i.e., the preservation of structural integrity and specific characteristics of shape, is a fundamental requirement for medical imaging tasks, such as neuron or vessel segmentation. Despite the recent surge in topology-aware methods addressing this challenge, their real-world applicability is hindered by flawed benchmarking practices. In this paper, we identify critical pitfalls in model evaluation that include inadequate connectivity choices, overlooked topological artifacts in ground truth annotations, and inappropriate use of evaluation metrics. Through detailed empirical analysis, we uncover these issues' profound impact on the evaluation and ranking of segmentation methods. Drawing from our findings, we propose a set of actionable recommendations to establish fair and robust evaluation standards for topology-aware medical image segmentation methods.
  • 25 - Bi-invariant Geodesic Regression with Data from the Osteoarthritis Initiative Johannes Schade, Christoph von Tycowicz, Martin Hanik Many phenomena are naturally characterized by measuring continuous transformations such as shape changes in medicine or articulated systems in robotics. Modeling the variability in such datasets requires performing statistics on Lie groups, i.e., manifolds carrying an additional group structure. As the Lie group captures the symmetries in the data, it is essential from a theoretical and practical perspective to ask for statistical methods that respect these symmetries; in this way, they are insensitive to confounding effects, e.g., due to the choice of reference coordinate systems. In this work, we investigate geodesic regression, a generalization of linear regression originally derived for Riemannian manifolds. While Lie groups can be endowed with Riemannian metrics, these are generally incompatible with the group structure. We develop a non-metric estimator in an affine connection setting. It captures geodesic relationships while respecting the symmetries given by left and right translations. For its computation, we propose an efficient fixed-point algorithm requiring simple differential expressions that can be calculated through automatic differentiation. We perform experiments on a synthetic example and evaluate our method on an open-access, clinical dataset studying knee joint configurations under the progression of osteoarthritis.
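For readers unfamiliar with geodesic regression, one hedged way to state the problem on a Lie group G (our notation, not the paper's) is to fit a geodesic through translated one-parameter subgroups and measure residuals in the Lie algebra, assuming a suitable norm there:

```latex
\gamma(t) = p \exp(t v), \qquad p \in G,\; v \in \mathfrak{g},
\qquad
E(p, v) = \sum_{i=1}^{N} \bigl\| \log\bigl(\gamma(t_i)^{-1}\, y_i\bigr) \bigr\|^{2}
```

Here exp and log denote the group exponential and logarithm; left- or right-translating all observations transforms the residuals consistently, which is the bi-invariance the estimator is designed to respect without ever introducing a Riemannian metric.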
  • 32 - Structure Observation Driven Image-Text Contrastive Learning for Computed Tomography Report Generation Hong Liu, Dong Wei, Peng Qiong, Yawen Huang, Xian Wu, Yefeng Zheng, Liansheng Wang Computed Tomography Report Generation (CTRG) aims to automate the clinical radiology reporting process, thereby reducing the workload of report writing and facilitating patient care. While deep learning approaches have achieved remarkable advances in generating reports for X-ray images, their effectiveness may be limited in CTRG due to the larger data volumes of CT images and the more intricate details required to describe them. This work introduces a novel two-stage (structure- and report-learning) framework tailored for CTRG, featuring effective structure-wise image-text contrasting. In the first stage, a set of learnable structure-specific visual queries "observe" corresponding structures in a CT image. The resulting observation tokens are contrasted with structure-specific textual features extracted from the accompanying radiology report with a structure-wise image-text contrastive loss. In addition, text-text similarity-based soft pseudo targets are proposed to mitigate the impact of false negatives, i.e., semantically identical image structures and texts from non-paired images and reports. Thus, the model learns structure-level semantic correspondences between CT images and reports. Further, a dynamic, diversity-enhanced negative queue is proposed to guide the network in learning to discriminate various abnormalities. Subsequently, in the second stage, the visual structure queries are frozen and used to select the critical image patch embeddings depicting each anatomical structure, minimizing distractions from irrelevant areas while reducing memory consumption. Also, a text decoder is added and trained for report generation. Our extensive experiments on two public datasets demonstrate that our framework establishes new state-of-the-art performance for CTRG in terms of clinical efficacy, and that its components are effective.
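The soft pseudo-target idea, i.e., replacing hard one-hot contrastive targets with text-text similarities so that semantically identical but non-paired reports are not punished as false negatives, can be sketched as follows; shapes, temperatures, and names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def soft_target_contrastive(img_feats, txt_feats, tau=0.07, tau_t=0.07):
    """Image-text contrast with text-similarity soft pseudo targets.
    Semantically similar non-paired reports receive non-zero target mass
    instead of being treated as hard negatives. Shapes: (N, D).
    """
    img = F.normalize(img_feats, dim=1)
    txt = F.normalize(txt_feats, dim=1)

    logits = img @ txt.t() / tau                            # image-to-text logits
    with torch.no_grad():
        targets = F.softmax(txt @ txt.t() / tau_t, dim=1)   # soft pseudo targets

    return -(targets * F.log_softmax(logits, dim=1)).sum(1).mean()
```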
  • 35 - 4DRGS: 4D Radiative Gaussian Splatting for Efficient 3D Vessel Reconstruction from Sparse-View Dynamic DSA Images Zhentao Liu, Ruyi Zha, Huangxuan Zhao, Hongdong Li, Zhiming Cui Reconstructing 3D vessel structures from sparse-view dynamic digital subtraction angiography (DSA) images enables accurate medical assessment while reducing radiation exposure. Existing methods often produce suboptimal results or require excessive computation time. In this work, we propose 4D radiative Gaussian splatting (4DRGS) to achieve high-quality reconstruction efficiently. In detail, we represent the vessels with 4D radiative Gaussian kernels. Each kernel has time-invariant geometry parameters, including position, rotation, and scale, to model static vessel structures. The time-dependent central attenuation of each kernel is predicted by a compact neural network to capture the temporally varying response of contrast agent flow. We splat these Gaussian kernels to synthesize DSA images via X-ray rasterization and optimize the model against the real captured images. The final 3D vessel volume is voxelized from the well-trained kernels. Moreover, we introduce accumulated attenuation pruning and bounded scaling activation to improve reconstruction quality. Extensive experiments on real-world patient data demonstrate that 4DRGS achieves impressive results in 5 minutes of training, which is 32x faster than the state-of-the-art method. This underscores the potential of 4DRGS for real-world clinical use.
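A minimal sketch of the kernel parameterization described above: geometry parameters are static, while a compact MLP predicts each kernel's central attenuation as a function of time. The feature dimensions, activations, and absence of a positional encoding are illustrative assumptions, not the 4DRGS architecture.

```python
import torch
import torch.nn as nn

class RadiativeGaussians(nn.Module):
    """Sketch of 4D radiative Gaussian kernels: time-invariant geometry per
    kernel plus a compact MLP for time-dependent central attenuation."""

    def __init__(self, num_kernels, feat_dim=16, hidden=64):
        super().__init__()
        self.positions = nn.Parameter(torch.randn(num_kernels, 3))   # static
        self.log_scales = nn.Parameter(torch.zeros(num_kernels, 3))  # static
        self.rotations = nn.Parameter(torch.randn(num_kernels, 4))   # quaternions
        self.kernel_feats = nn.Parameter(torch.randn(num_kernels, feat_dim))
        self.attenuation_mlp = nn.Sequential(
            nn.Linear(feat_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # attenuation must be >= 0
        )

    def attenuation_at(self, t):
        """Time-dependent central attenuation of every kernel at time t."""
        t_col = torch.full((self.kernel_feats.size(0), 1), float(t),
                           device=self.kernel_feats.device)
        return self.attenuation_mlp(torch.cat([self.kernel_feats, t_col], dim=1))
```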
  • 42 - Brightness-Invariant Tracking Estimation in Tagged MRI Zhangxing Bian, Shuwen Wei, Xiao Liang, Yuan-Chiao Lu, Samuel Remedios, Fangxu Xing, Jonghye Woo, Dzung Pham, Aaron Carass, Philip Bayly, Jiachen Zhuo, Ahmed Alshareef, Jerry L. Prince Magnetic resonance (MR) tagging is an imaging technique for noninvasively tracking tissue motion in vivo by creating a visible pattern of magnetization saturation (tags) that deforms with the tissue. Due to longitudinal relaxation and progression to steady state, the tag and tissue brightnesses change over time, which makes tracking with optical flow methods error-prone. Although Fourier methods can alleviate some of these problems, they remain sensitive to brightness changes as well as to spectral spreading caused by motion. To address these problems, we introduce the brightness-invariant tracking estimation (BRITE) technique for tagged MRI. BRITE disentangles the anatomy from the tag pattern in the observed tagged image sequence and simultaneously estimates the Lagrangian motion. The inherent ill-posedness of this problem is addressed by leveraging the expressive power of denoising diffusion probabilistic models to represent the probabilistic distribution of the underlying anatomy and the flexibility of physics-informed neural networks to estimate biologically plausible motion. A set of tagged MR images of a gel phantom was acquired with various tag periods and imaging flip angles to demonstrate the impact of brightness variations and to validate our method. The results show that BRITE achieves more accurate motion and strain estimates compared to other state-of-the-art methods, while also being resistant to tag fading.
  • 44 - SafeTriage: Facial Video De-identification for Privacy-Preserving Stroke Triage Tongan Cai, Haomiao Ni, Wenchao Ma, Yuan Xue, Qian Ma, Rachel Leicht, Kelvin Wong, John Volpi, Stephen Wong, James Wang, Sharon Huang Effective stroke triage in emergency settings often relies on clinicians' ability to identify subtle abnormalities in facial muscle coordination. While recent AI models have shown promise in detecting such patterns from patient facial videos, their reliance on real patient data raises significant ethical and privacy challenges, especially when training robust and generalizable models across institutions. To address these concerns, we propose SafeTriage, a novel method designed to de-identify patient facial videos while preserving essential motion cues crucial for stroke diagnosis. SafeTriage leverages a pretrained video motion transfer (VMT) model to map the motion characteristics of real patient faces onto synthetic identities. This approach retains diagnostically relevant facial dynamics without revealing the patients' identities. To mitigate the distribution shift between normal-population pre-training videos and patient-population test videos, we introduce a conditional generative model for visual prompt tuning, which adapts the input space of the VMT model to ensure accurate motion transfer without needing to fine-tune the VMT model backbone. Comprehensive evaluation, including quantitative metrics and clinical expert assessments, demonstrates that SafeTriage-produced synthetic videos effectively preserve stroke-relevant facial patterns, enabling reliable AI-based triage. Our evaluations also show that SafeTriage provides robust privacy protection while maintaining diagnostic accuracy, offering a secure and ethically sound foundation for data sharing and AI-driven clinical analysis in neurological disorders.
  • 67 - 3D Shape-to-Image Brownian Bridge Diffusion for Brain MRI Synthesis from Cortical Surfaces Fabian Bongratz, Yitong Li, Sama Elbaroudy, Christian Wachinger Despite recent advances in medical image generation, existing methods struggle to produce anatomically plausible 3D structures. In synthetic brain magnetic resonance images (MRIs), characteristic fissures are often missing, and reconstructed cortical surfaces appear scattered rather than densely convoluted. To address this issue, we introduce Cor2Vox, the first diffusion model-based method that translates continuous cortical shape priors to synthetic brain MRIs. To achieve this, we leverage a Brownian bridge process, which allows for direct structured mapping between shape contours and medical images. Specifically, we adapt the concept of the Brownian bridge diffusion model to 3D and extend it to embrace various complementary shape representations. Our experiments demonstrate significant improvements in the geometric accuracy of reconstructed structures compared to previous voxel-based approaches. Moreover, Cor2Vox excels in image quality and diversity, yielding high variation in non-target structures like the skull. Finally, we highlight the capability of our approach to simulate cortical atrophy at the sub-voxel level. Our code is available at https://github.com/ai-med/Cor2Vox.
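For reference, the Brownian bridge forward process that Cor2Vox builds on (as in the 2D Brownian bridge diffusion model it adapts) can be written, up to a variance scale, as

```latex
x_t = (1 - m_t)\, x_0 + m_t\, y + \sqrt{\delta_t}\,\epsilon,
\qquad m_t = \tfrac{t}{T}, \quad
\delta_t = 2\left(m_t - m_t^{2}\right), \quad \epsilon \sim \mathcal{N}(0, I)
```

Since δ₀ = δ_T = 0, the process is pinned to the MRI x₀ at t = 0 and to the shape-prior image y at t = T, which is the "direct structured mapping" between the two domains mentioned in the abstract.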
  • 91 - Skelite: Compact Neural Networks for Efficient Iterative Skeletonization Luis David Reyes Vargas, Martin Menten, Johannes Paetzold, Nassir Navab, Mohammad F. Azampour Skeletonization extracts thin representations from images that compactly encode their geometry and topology. These representations have become an important topological prior for preserving connectivity in curvilinear structures, aiding medical tasks like vessel segmentation. Existing compatible skeletonization algorithms face significant trade-offs: morphology-based approaches are computationally efficient but prone to frequent breakages, while topology-preserving methods require substantial computational resources. We propose a novel framework for training iterative skeletonization algorithms with a learnable component. The framework leverages synthetic data, task-specific augmentation, and a model distillation strategy to learn compact neural networks that produce thin, connected skeletons with a fully differentiable iterative algorithm. Our method demonstrates a 100x speedup over topology-constrained algorithms while maintaining high accuracy and generalizing effectively to new domains without fine-tuning. Code and data are available at https://github.com/luisdavid64/Skelite.
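The iterative, fully differentiable scheme can be sketched as a small network applied repeatedly inside its input mask until the output stops changing; the architecture, step count, and stopping rule below are illustrative assumptions, not the released Skelite model.

```python
import torch
import torch.nn as nn

class IterativeSkeletonizer(nn.Module):
    """Sketch of learned iterative skeletonization: a compact CNN is applied
    repeatedly, thinning the mask a little per step, until a fixed point."""

    def __init__(self, channels=16):
        super().__init__()
        self.step = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, mask, max_iters=25, tol=1e-4):
        x = mask
        for _ in range(max_iters):           # fully differentiable iteration
            x_next = self.step(x) * mask     # skeleton stays inside the mask
            if (x_next - x).abs().max() < tol:
                return x_next                # converged to a fixed point
            x = x_next
        return x
```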
  • 94 - VerSe: Integrating Multiple Queries as Prompts for Versatile Cardiac MRI Segmentation Bangwei Guo, Meng Ye, Yunhe Gao, Bingyu Xin, Leon Axel, Dimitris Metaxas Despite the advances in learning-based image segmentation approaches, the accurate segmentation of cardiac structures from magnetic resonance imaging (MRI) remains a critical challenge. While existing automatic segmentation methods have shown promise, they still require extensive manual correction of the segmentation results by human experts, particularly in complex regions such as the basal and apical parts of the heart. Recent efforts have been made to develop interactive image segmentation methods that enable human-in-the-loop learning. However, they are semi-automatic and inefficient, due to their reliance on click-based prompts, especially for 3D cardiac MRI volumes. To address these limitations, we propose VerSe, a Versatile Segmentation framework that unifies automatic and interactive segmentation through multiple queries. Our key innovation lies in the joint learning of object and click queries as prompts for a shared segmentation backbone. VerSe supports both fully automatic segmentation, through object queries, and interactive mask refinement, by providing click queries when needed. With the proposed integrated prompting scheme, VerSe demonstrates significant improvements in performance and efficiency over existing methods, on both cardiac MRI and out-of-distribution medical imaging datasets.
  • 115 - Self-Supervised Denoising of Diffusion MRI Data with Efficient Collaborative Diffusion Model Xiaoyu Bai, Haotian Jiang, Geng Chen Diffusion MRI (dMRI) suffers from heavy noise, which undermines the accuracy and reliability of subsequent quantitative analysis. Traditional deep learning denoising methods typically depend on training with paired noisy and clean data, which are not available in practice. Self-supervised techniques, such as DDM2, overcome this limitation with diffusion models. However, DDM2 is plagued by high computational cost and unsatisfactory performance when dealing with heavy noise. To tackle these challenges, we propose a novel self-supervised dMRI denoising model, called the Efficient Collaborative Diffusion Model (ECDM). Specifically, we first employ a Noise2Noise-like method to obtain coarse denoised dMRI data. Subsequently, we use a latent encoder to compress the coarse data into a highly compact latent space. A diffusion model is then trained within this latent space to generate prior features. These features are passed to the denoising network through a hierarchical architecture and a cross-attention component for collaborative fine noise reduction. Our method not only achieves effective noise reduction with a collaborative coarse-to-fine framework, but also enhances the efficiency of the diffusion model by utilizing the compact latent representation. Extensive experiments on both simulated and real datasets demonstrate that ECDM clearly surpasses existing dMRI denoising methods.
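The coarse stage rests on the Noise2Noise principle: a network trained to map one noisy observation to another independently noisy observation of the same anatomy converges toward the clean signal, since the noise is zero-mean. A hedged sketch of one training step; the pairing strategy is our assumption, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def noise2noise_step(model, optimizer, vol_a, vol_b):
    """One Noise2Noise-style training step for coarse dMRI denoising.
    vol_a and vol_b are two independently noisy observations of the same
    anatomy (e.g., neighboring q-space samples or repeated acquisitions);
    the network never sees a clean target.
    """
    optimizer.zero_grad()
    pred = model(vol_a)
    loss = F.mse_loss(pred, vol_b)  # the noisy target's expectation is the clean signal
    loss.backward()
    optimizer.step()
    return loss.item()
```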
  • 131 - Medical Image Registration Meets Vision Foundation Model: Prototype Learning and Contour Awareness Hao Xu, Tengfei Xue, Jianan Fan, Dongnan Liu, Yuqian Chen, Fan Zhang, Carl-Fredrik Westin, Ron Kikinis, Lauren J. O’Donnell, Weidong Cai Medical image registration is a fundamental task in medical image analysis, aiming to establish spatial correspondences between paired images. However, existing unsupervised deformable registration methods rely solely on intensity-based similarity metrics, lacking explicit anatomical knowledge, which limits their accuracy and robustness. Vision foundation models, such as the Segment Anything Model (SAM), can generate high-quality segmentation masks that provide explicit anatomical structure knowledge, addressing the limitations of traditional methods that depend only on intensity similarity. Based on this, we propose a novel SAM-assisted registration framework incorporating prototype learning and contour awareness. The framework includes: (1) Explicit anatomical information injection, where SAM-generated segmentation masks are used as auxiliary inputs throughout training and testing to ensure the consistency of anatomical information; (2) Prototype learning, which leverages segmentation masks to extract prototype features and aligns prototypes to optimize semantic correspondences between images; and (3) Contour-aware loss, which leverages the edges of segmentation masks to improve the model's performance on fine-grained deformation fields. Extensive experiments demonstrate that the proposed framework significantly outperforms existing methods across multiple datasets, particularly in challenging scenarios with complex anatomical structures and ambiguous boundaries. Our code is available at https://github.com/HaoXu0507/IPMI25-SAM-Assisted-Registration.
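Prototype learning from SAM masks is commonly realized by masked average pooling; a sketch under that assumption (not necessarily the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def mask_prototypes(features, masks, eps=1e-6):
    """Extract one prototype per anatomical structure via masked average
    pooling. features: (C, H, W) feature map; masks: (K, H, W) SAM-derived
    binary masks, one per structure. Returns (K, C) prototypes.
    """
    flat_feats = features.flatten(1)                 # (C, HW)
    flat_masks = masks.flatten(1).float()            # (K, HW)
    protos = flat_masks @ flat_feats.t()             # (K, C) summed features
    return protos / (flat_masks.sum(1, keepdim=True) + eps)

def prototype_alignment_loss(protos_fixed, protos_moving):
    """Pull corresponding structure prototypes of the two images together."""
    return 1 - F.cosine_similarity(protos_fixed, protos_moving, dim=1).mean()
```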
  • 138 - Vascular-topology-aware Deep Structure Matching for 2D DSA and 3D CTA Rigid Registration Xiong Xiaosong, Jiang Caiwen, Wu Peng, Zhang Xiao, Song Yanli, Zhang Xinyi, Tao Ze, Wu Dijia, Shen Dinggang Accurate 2D/3D integration of Digital Subtraction Angiography (DSA) and Computed Tomography Angiography (CTA) images holds significant potential for reducing risk during navigation in percutaneous coronary intervention (PCI). Rigid registration is a crucial step in achieving this integration. However, existing rigid registration methods based on manually designed features and subsequent rule-based graph matching exhibit limited generalization and often fail in complex surgical scenarios, such as vessel overlap from 3D-to-2D projection and missing branches due to Chronic Total Occlusion (CTO). To address these challenges, we propose a vascular-topology-aware deep structure matching framework for 3D CTA and 2D DSA rigid registration. Our framework includes key-point extraction, where 2D/3D topological priors are used to extract key points and their descriptors, and a matching stage that employs a 3D spatial-aware hybrid attention mechanism to capture vessel structures while mitigating the impact of vessel overlap and missing branches on feature-based matching. We also designed a data simulation strategy to generate a large set of paired data for network training, using various rigid transformations and random branch trimming to simulate complex and variable real-world scenarios, especially vessel overlap and missing branches. Extensive evaluations conducted on the simulated dataset and 1,016 pairs of real CTA and DSA samples demonstrate the effectiveness and robustness of our method, highlighting its strong performance and potential for real-world surgical applications. The code is available at https://github.com/xxsxxsxxs666/2D-3DCoronary.
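The simulation strategy can be sketched as random rigid transforms plus random branch trimming on a centerline tree; all parameter choices below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def simulate_pair(branches, max_trans=20.0, drop_prob=0.2, rng=None):
    """Sketch of training-pair simulation: apply a random rigid transform to
    a 3D vessel tree and randomly trim branches to mimic CTO-like missing
    structures. branches: list of (N_i, 3) centerline point arrays.
    """
    rng = rng or np.random.default_rng()
    R = Rotation.random(random_state=rng).as_matrix()   # random 3D rotation
    t = rng.uniform(-max_trans, max_trans, size=3)      # random translation

    transformed = [pts @ R.T + t for pts in branches]
    kept = [b for b in transformed if rng.random() > drop_prob]
    return kept if kept else transformed[:1]            # keep at least one branch
```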
  • 168 - PathTTT: Test-Time Training with Meta-Auxiliary Learning for Pathology Image Classification Haoyu He, Mahdi Hosseini, Yang Wang Detecting cancer through pathology imaging is crucial for early diagnosis and effective treatment, yet domain shifts between training and test data often compromise the reliability of deep learning (DL) models. These domain shifts are common in real-world environments due to variations in imaging devices, protocols, and patient populations. To address this challenge, we propose PathTTT, a novel framework that combines Test-Time Training (TTT) with Model-Agnostic Meta-Learning (MAML) to enhance model robustness under domain shift conditions. Our experimental results demonstrate significant improvements over state-of-the-art methods on benchmark datasets, highlighting PathTTT’s potential for robust cancer detection in real-world settings.
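A generic test-time training loop of the kind PathTTT builds on looks as follows; the encoder/classifier split, the auxiliary task, and all hyperparameters are illustrative assumptions (PathTTT additionally meta-learns the initialization MAML-style, which is omitted here).

```python
import torch

def test_time_adapt(model, aux_head, x, aux_loss_fn, steps=5, lr=1e-4):
    """Adapt the shared feature extractor on a single test batch via a
    self-supervised auxiliary loss before predicting. Assumes the model
    exposes `encoder` and `classifier` submodules (hypothetical names).
    """
    params = list(model.encoder.parameters()) + list(aux_head.parameters())
    optimizer = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        feats = model.encoder(x)
        loss = aux_loss_fn(aux_head(feats), x)  # e.g., rotation prediction
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        return model.classifier(model.encoder(x))
```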
  • 174 - A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text? Julio Silva-Rodríguez, Jose Dolz, Ismail Ben Ayed Vision language pre-training has recently gained popularity as it allows learning rich feature representations using large-scale data sources. This paradigm has quickly made its way into the medical image analysis community. In particular, there is an impressive amount of recent literature developing vision-language models for radiology. However, the available medical datasets with image-text supervision are scarce, and medical concepts are fine-grained, involving expert knowledge that existing vision-language models struggle to encode. In this paper, we propose to take a prudent step back from the literature and revisit supervised, unimodal pre-training, using fine-grained labels instead. We conduct an extensive comparison demonstrating that unimodal pre-training is highly competitive and better suited to integrating heterogeneous data sources. Our results also question the potential of recent vision-language models for open-vocabulary generalization, which have been evaluated using optimistic experimental settings. Finally, we study novel alternatives to better integrate fine-grained labels and noisy text supervision.
  • 180 - Enhancing Alzheimer's Diagnosis: Leveraging Anatomical Landmarks in Graph Convolutional Neural Networks on Hippocampal Tetrahedral Meshes Yanxi Chen, Mohammad Farazi, Zhangsihao Yang, Yonghui Fan, Yi Su, Eric Reiman, Nicholas Ashton, Yalin Wang Alzheimer's disease (AD) is a major neurodegenerative condition that affects millions around the world. As one of the main biomarkers in the AD diagnosis procedure, brain amyloid positivity is typically identified by positron emission tomography (PET), which is costly and invasive. Brain structural magnetic resonance imaging (sMRI) may provide a safer and more convenient solution for AD diagnosis. Recent advances in geometric deep learning have facilitated sMRI analysis and early diagnosis of AD. However, determining AD pathology, such as brain amyloid deposition, in the preclinical stage remains challenging, as morphological changes are far less pronounced at that point. As a result, few AD classification models are generalizable to the brain amyloid positivity classification task. Blood-based biomarkers (BBBMs), on the other hand, have recently achieved remarkable success in predicting brain amyloid positivity and identifying individuals at high risk of being brain amyloid positive. However, individuals in the medium-risk group still require gold-standard tests such as amyloid PET for further evaluation. Inspired by the recent success of transformer architectures, we propose a geometric deep learning model based on the transformer that is both scalable and robust to variations in input volumetric mesh size. Our work introduces a novel tokenization scheme for tetrahedral meshes, incorporating anatomical landmarks generated by a pre-trained Gaussian process model. Our model achieves superior performance on the AD classification task. In addition, we show that the model also generalizes to brain amyloid positivity classification for preclinical AD analysis, especially for individuals in the medium-risk class, where BBBMs alone cannot achieve a clear classification. Our work may enrich geometric deep learning research and improve AD diagnosis accuracy without using expensive and invasive PET scans.
  • 182 - Hierarchical Variable Importance with Statistical Control for Medical Data-Based Prediction Joseph Paillard, Antoine Collas, Denis Engemann, Bertrand Thirion Recent advances in machine learning have greatly expanded the repertoire of predictive methods for medical imaging. However, the interpretability of complex models remains a challenge, which limits their utility in medical applications. Recently, model-agnostic methods have been proposed to measure conditional variable importance and accommodate complex non-linear models. However, they often lack power when dealing with highly correlated data, a common problem in medical imaging. We introduce Hierarchical-CPI, a model-agnostic variable importance measure that frames the inference problem as the discovery of groups of variables that are jointly predictive of the outcome. By exploring subgroups along a hierarchical tree, it remains computationally tractable, yet also enjoys explicit family-wise error rate control. Moreover, we address the issue of vanishing conditional importance under high correlation with a tree-based importance allocation mechanism. We benchmark Hierarchical-CPI against state-of-the-art variable importance methods. Its effectiveness is demonstrated on two neuroimaging datasets: classifying dementia diagnoses from MRI data (ADNI dataset) and analyzing the Berger effect on EEG data (TDBRAIN dataset), identifying biologically plausible variables.
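The single-variable building block of conditional permutation importance (CPI) replaces a feature by its conditional expectation given the remaining features plus shuffled residuals, so the permuted column keeps its correlation structure with the other features; Hierarchical-CPI applies this idea to groups of variables along a tree. A sketch with a linear conditional model as a simplifying assumption:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def conditional_permutation_importance(model, X, y, j, loss, rng=None):
    """CPI for variable j: permute only the part of X[:, j] that is not
    explained by the other variables, then measure the loss increase.
    `loss` is a callable loss(y_true, y_pred); `model` is already fitted.
    """
    rng = rng or np.random.default_rng()
    rest = np.delete(X, j, axis=1)
    cond = LinearRegression().fit(rest, X[:, j])     # conditional model of X_j
    residuals = X[:, j] - cond.predict(rest)

    X_perm = X.copy()
    X_perm[:, j] = cond.predict(rest) + rng.permutation(residuals)
    return loss(y, model.predict(X_perm)) - loss(y, model.predict(X))
```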
  • 221 - Subspace Implicit Neural Representations for Real-Time Cardiac Cine MR Imaging Wenqi Huang, Veronika Spieker, Siying Xu, Gastao Cruz, Claudia Prieto, Julia Schnabel, Kerstin Hammernik, Daniel Rueckert Conventional cardiac cine MRI methods rely on retrospective gating, which limits temporal resolution and the ability to capture continuous cardiac dynamics, particularly in patients with arrhythmias and beat-to-beat variations. To address these challenges, we propose a reconstruction framework based on subspace implicit neural representations (INRs) for real-time cardiac cine MRI from continuously sampled radial data. This approach employs two multilayer perceptrons (MLPs) to learn spatial and temporal subspace bases, leveraging the low-rank properties of cardiac cine MRI. Initialized with low-resolution reconstructions, the networks are fine-tuned using spoke-specific loss functions to recover spatial details and temporal fidelity. Our method directly utilizes the continuously sampled radial k-space spokes during training, thereby eliminating the need for binning and non-uniform FFT. This approach achieves superior spatial and temporal image quality compared to conventional binned methods at acceleration rates of 10 and 20, demonstrating its potential for high-resolution imaging of dynamic cardiac events and for enhanced diagnostic capability.
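The low-rank subspace model at the heart of the method can be written (in our notation, not the paper's) as

```latex
x(\mathbf{r}, t) \;\approx\; \sum_{k=1}^{K} U_k(\mathbf{r})\, V_k(t)
\;=\; U(\mathbf{r})^{\top} V(t)
```

where U and V are the two MLPs producing K spatial and temporal basis coefficients, respectively. Each radial spoke acquired at time t then contributes a data-consistency loss directly in k-space, which is what removes the need for temporal binning and a non-uniform FFT.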