Volume 12 - Year 2025 - Pages 112-120
DOI: 10.11159/jbeb.2025.014
CellSynth: Bridging Data Scarcity in Biomedical Imaging with Diffusion-Synthesized Cell Data
Henry Hou
The Bishop’s School
7607 La Jolla Blvd, La Jolla, CA 92037, USA
henry.hou.27@bishops.com
Abstract - Recent advancements in deep learning have significantly improved cell segmentation, with foundation models (FMs) emerging as a promising approach. Among these, CellViT, a model based on the Vision Transformer (ViT) architecture, has demonstrated strong generalization capabilities across diverse cell imaging modalities and datasets. However, despite its potential, CellViT's performance remains suboptimal on challenging segmentation tasks. In this study, the accuracy and robustness of CellViT are evaluated on the publicly available MoNuSeg dataset, a benchmark for cell segmentation. To enhance CellViT’s performance, a fine-tuning framework is proposed that integrates both manually labeled cell images with precise annotations and synthetic images generated via a hierarchical diffusion model (DiffInfinite). The pipeline incorporates three key components: foundation model fine-tuning, human-in-the-loop feedback for synthetic image selection, and training on a combined dataset of real annotated and pseudo-labeled synthetic images. Experimental results demonstrate that the proposed method leads to significant improvements in segmentation accuracy and generalization, particularly in domains where labeled data is scarce. This study underscores the potential of fine-tuning foundation models with synthetic data augmentation, providing a scalable approach for enhancing biomedical image analysis. The findings pave the way for the development of more robust and precise segmentation models, with critical applications in disease diagnostics and biomedical research.
Keywords: Cell segmentation, Foundation models, Diffusion models, Synthetic data, Biomedical imaging
© Copyright 2025 Authors. This is an Open Access article published under the Creative Commons Attribution License terms. Unrestricted use, distribution, and reproduction in any medium are permitted, provided the original work is properly cited.
Date Received: 2025-01-24
Date Revised: 2025-08-18
Date Accepted: 2025-09-10
Date Published: 2025-12-05
1. Introduction
Machine learning, particularly deep learning, has rapidly expanded into the medical field, significantly impacting medical image analysis [1]. The accurate segmentation of medical images is crucial for various clinical applications, including disease diagnosis and treatment monitoring [2]. However, developing robust segmentation models faces several challenges [3]. Manual segmentation, the gold standard for creating training data, is highly labor-intensive and requires domain expertise, making it expensive. Furthermore, the availability of high-quality annotated medical images is often limited due to patient privacy regulations and the inherent complexity of medical data. Consequently, existing segmentation models frequently struggle with accuracy and generalization due to data scarcity.
Foundation models (FMs) represent a promising direction in addressing these challenges [4]. Recent advancements such as Segment Anything [5] and its medical adaptation, SAM-Med [6], demonstrate the growing potential of universal segmentation models across domains, highlighting the importance of developing robust and generalizable solutions for clinical applications. CellViT, a Vision Transformer (ViT)-based FM trained on manually segmented images from the Cancer Genome Atlas (TCGA) [3], [7], has shown capabilities in segmenting cell images, identifying cell states, and recognizing tissue types [3]. Despite its strengths, the performance of CellViT and similar models still requires improvement to meet the demands of clinical adoption.
This study aims to enhance the performance of CellViT through a novel fine-tuning strategy. Specifically, it was investigated how integrating synthetic data generated by a diffusion model (DiffInfinite) [8] can improve CellViT’s segmentation accuracy on the MoNuSeg dataset [9]. A human-in-the-loop approach was explored for selecting high-quality synthetic images, and the impact of augmenting the training data with these pseudo-labeled synthetic images on model accuracy and potential clinical applicability was assessed. The central research question is how to effectively leverage synthetic data to overcome the limitations imposed by scarce labeled medical imaging data and enhance CellViT’s segmentation capabilities. This work is novel in its introduction of a hybrid fine-tuning pipeline that bridges diffusion-based synthetic data generation with foundation model adaptation, representing one of the first demonstrations of combining CellViT with a hierarchical diffusion model for cell segmentation enhancement.
Recent studies further contextualize and motivate this work. Large-scale medical foundation-model efforts and benchmarks (e.g., MedSegBench) have highlighted the need for standardized evaluation across modalities and the value of transfer learning for segmentation tasks. Advances in diffusion-based synthesis tailored for histopathology, such as PathLDM (text-conditioned latent diffusion for pathology), optimizations for diffusion models specific to histopathology [10], and comprehensive synthetic-data evaluation pipelines, demonstrate the improving realism and utility of generated tissue images. Concurrently, new foundation-model architectures and multimodal medical models (e.g., M4oE), together with fairness and benchmarking efforts (e.g., FairMedFM), emphasize robust, generalizable performance across domains and expose the critical role of diverse, high-quality training data.
By combining foundation-model adaptation (CellViT) with histopathology-specific diffusion synthesis (DiffInfinite) and curated human-in-the-loop selection, this work shows how synthetic data can be used to improve the performance of segmentation models in settings where labeled data are scarce.
2. Related Work
2.1 Background
Medical image segmentation plays a pivotal role in biomedical research and clinical diagnostics by enabling the accurate delineation of cellular structures and pathological regions. It supports disease detection, histopathological evaluation, and treatment planning [3], [11]. Traditional approaches, including thresholding and watershed algorithms, offered acceptable results in controlled settings but struggled with complex, real-world histological data due to overlapping structures and staining variability.
The advent of deep learning revolutionized segmentation. Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) learn hierarchical and spatial features, providing superior generalization across imaging modalities [4], [12], [13]. Yet, these models remain constrained by limited annotated datasets, which are costly to produce and often restricted by privacy regulations.
2.2 Synthetic Data Generation for Medical Imaging
To mitigate data scarcity, synthetic image generation has become an active area of research. Generative Adversarial Networks (GANs) and Diffusion Models (DMs) are particularly promising. While GANs synthesize plausible images, Diffusion Models outperform them in fidelity and stability [14], [15]. DiffInfinite, a hierarchical diffusion model, is designed to generate high-resolution histopathological images with corresponding masks, reducing reliance on manual annotation [8]. These synthetic images augment real datasets and support training robust segmentation models, though concerns about data memorization and generalizability persist. Recent studies apply information-theoretic metrics to evaluate the authenticity and diversity of generated images [11], [13].
2.3 CellViT for Histopathological Image Segmentation
CellViT, a transformer-based segmentation model, leverages self-attention to capture long-range dependencies in cellular structures [7]. While it performs well across multiple datasets, its accuracy is hindered by insufficiently labeled data. Integrating synthetic data from models like DiffInfinite can improve training diversity and generalization, addressing one of the key limitations in histological segmentation.
Recent studies further advance foundation models, diffusion synthesis, and benchmarking for medical imaging. Comprehensive reviews and position pieces on foundation models in medical imaging and radiology [16] outline best practices for clinical adaptation and cross-institutional validation, while reviews of vision-language foundation models such as Ryu et al. [17] emphasize cross-modality generalization and clinical robustness. Diffusion-based generation for histopathology has progressed through topology-aware generators such as TopoCellGen, which introduce topological constraints for realistic cell-level synthesis, pathology-informed latent diffusion approaches, and the SuperDiff framework of Xu et al. [18], which enhances image fidelity and resolution for digital pathology. Diffusion-driven super-resolution for myocardial perfusion MRI by Sun et al. [19] and temporal-consistency frameworks such as Zhou et al. [20] extend generative modeling to spatiotemporal domains, and emerging work on synthetic-data curation for self-supervised learning demonstrates concrete pipelines for improving downstream segmentation with generated data. Together, these contributions underscore the timeliness of integrating foundation-model fine-tuning (CellViT) with histopathology-specific diffusion synthesis and human-in-the-loop selection to improve segmentation where labeled data remain scarce.
3. Methodology
This approach involves fine-tuning the pre-trained CellViT foundation model using different strategies, including the integration of synthetic data.
3.1. Baseline Fine-Tuning (Basic Method)
The basic fine-tuning method utilizes a labeled dataset
D = {(I_i, M_i)}, i = 1, …, N,
where I_i is a raw H&E-stained cell image and M_i is its corresponding instance-level cell mask.
The fine-tuning procedure is as follows:
- Initialize CellViT with its pre-trained weights.
- Optimize the model parameters using a combined Dice loss (L_Dice) for segmentation accuracy and cross-entropy loss (L_CE) for classification consistency. The loss function is
L = λ_1 · L_Dice + λ_2 · L_CE,
where λ_1 and λ_2 are weighting coefficients (a code sketch of this loss follows the list below).
- Update the model weights using backpropagation until the weights converge. This process yields a CellViT model fine-tuned specifically for dataset D.
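The combined loss can be sketched as follows; this is a minimal illustration assuming PyTorch tensors and a binary (foreground/background) nuclei mask, not the exact CellViT training code.

import torch
import torch.nn.functional as F

def dice_loss(logits, targets, eps=1e-6):
    # Soft Dice loss over sigmoid probabilities; assumes 4D (N, C, H, W) tensors
    # and binary foreground masks.
    probs = torch.sigmoid(logits)
    inter = (probs * targets).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + targets.sum(dim=(1, 2, 3))
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

def combined_loss(logits, targets, lambda1=1.0, lambda2=1.0):
    # L = lambda1 * L_Dice + lambda2 * L_CE, as defined above.
    l_dice = dice_loss(logits, targets)
    l_ce = F.binary_cross_entropy_with_logits(logits, targets)
    return lambda1 * l_dice + lambda2 * l_ce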
3.2. Fine-Tuning with Synthetic Data (Advanced Method)
This method extends the basic fine-tuning by incorporating synthetic data generated by the DiffInfinite model.
- Synthetic Data Generation: Use DiffInfinite to generate synthetic H&E images:
I′_j ~ DiffInfinite(z), z ~ N(0, I),
where I′_j represents a newly generated synthetic image and z is a standard Gaussian noise sample.
- Pseudo-Labeling: Use the baseline CellViT model to generate initial segmentation masks M′_j for the synthetic images: M′_j = CellViT(I′_j).
- Human-in-the-Loop Verification: Human experts review the generated pairs (Ij′, Mj′) and select high-quality samples for augmentation.
- Augmented Dataset Construction: Combine the original labeled dataset D with the verified synthetic pairs to create an expanded dataset D′ = D ∪ {(I′_j, M′_j)} (see the construction sketch following this list).
- Fine-Tuning on Augmented Data: Fine-tune CellViT on the augmented dataset D′ using the same loss function and optimization process as the basic method.
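A minimal sketch of the pseudo-labeling, human-in-the-loop filtering, and dataset augmentation steps is shown below; diffinfinite_sample, cellvit_baseline, and expert_approves are hypothetical stand-ins for the actual generation, inference, and review interfaces.

import torch

def build_augmented_dataset(real_pairs, cellvit_baseline, diffinfinite_sample,
                            expert_approves, num_synthetic=400):
    # Returns D' = D ∪ {(I'_j, M'_j)} after human verification of synthetic pairs.
    synthetic_pairs = []
    for _ in range(num_synthetic):
        image = diffinfinite_sample()          # I'_j ~ DiffInfinite(z), z ~ N(0, I)
        with torch.no_grad():
            mask = cellvit_baseline(image)     # pseudo-label M'_j from the baseline model
        if expert_approves(image, mask):       # human-in-the-loop verification
            synthetic_pairs.append((image, mask))
    return list(real_pairs) + synthetic_pairs  # augmented dataset D'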
Key technical considerations for the advanced method include constructing training batches with a specific probability ratio for sampling real versus synthetic images to prevent overfitting to synthetic data, using a relatively small learning rate to ensure stable convergence, and maximizing the batch size within GPU memory limits. This approach aims to leverage the increased diversity and size of the training data to further improve segmentation performance and reliability. The overall pipeline comparing no fine-tuning, basic fine-tuning, and advanced fine-tuning is visualized below in Figure 1.
4. Experiments and Results
This experimental setup utilized the MoNuSeg dataset for both training and testing, which included 30 labeled training images and 14 high-resolution test images. To augment the training data, 28 synthetic images generated by DiffInfinite were incorporated, with pseudo-segmentation labels provided by the baseline CellViT model. Performance was evaluated using binary Panoptic Quality (bPQ), which combines detection and segmentation quality, and Recall, calculated as the ratio of detected true positives to all positive samples, emphasizing the importance of identifying all relevant cells. The core models employed were CellViT as the foundation model for segmentation and DiffInfinite as the generative model for synthetic data creation. Fine-tuning was conducted over 30 epochs with a learning rate of 1e-6 and a batch size of 8, using a random sampling probability of 0.6 to balance the contribution of real MoNuSeg data and synthetic DiffInfinite data during training. In addition to segmentation accuracy, the realism and variability of synthetic data generated by DiffInfinite were quantitatively assessed using the Fréchet Inception Distance (FID) and Diversity Index (DI). Lower FID scores (15.7) indicate high fidelity of generated images relative to real samples, while a DI value of 2.7 reflects strong diversity across generated nuclei structures. These metrics objectively validate the image quality and reinforce the reliability of the synthetic data used for training augmentation.
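As an illustration of the two evaluation metrics, the sketch below computes bPQ and Recall from a list of matched-instance IoU values, assuming the standard one-to-one matching rule (IoU > 0.5); it is not the official MoNuSeg evaluation script.

def binary_panoptic_quality(matched_ious, num_pred, num_gt):
    # matched_ious: IoU values (> 0.5) for one-to-one matched predicted/GT instances.
    # bPQ = (sum of matched IoUs / TP) * (TP / (TP + 0.5*FP + 0.5*FN)).
    tp = len(matched_ious)
    fp = num_pred - tp
    fn = num_gt - tp
    if tp == 0:
        return 0.0
    sq = sum(matched_ious) / tp                  # segmentation quality
    dq = tp / (tp + 0.5 * fp + 0.5 * fn)         # detection quality
    return sq * dq

def recall(matched_ious, num_gt):
    # Recall = detected true positives over all ground-truth instances.
    return len(matched_ious) / num_gt if num_gt else 0.0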
4.1 Technical Considerations and Setups
Loss Function Adjustment for Fine-Tuning
During the fine-tuning process, the loss function was adjusted to accommodate the differences between the PanNuke and MoNuSeg datasets. The PanNuke dataset includes both tissue type and cell type annotations, which the original CellViT training pipeline is designed to handle. However, the MoNuSeg dataset lacks these categorical distinctions, necessitating modifications to the training pipeline to eliminate the dependency on these classifications. The overall loss for the model can be written as
L_total = λ_seg · L_seg + λ_NT · L_NT + λ_TC · L_TC,
where L_seg denotes the segmentation terms (Dice and cross-entropy), L_NT is the nuclei-type (NT) classification loss, and L_TC is the tissue-class (TC) loss. After identifying the corresponding lambda values, λ_NT and λ_TC were set to zero, effectively removing the influence of tissue and cell-type classification and ensuring the model focused on the segmentation task relevant to MoNuSeg.
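A minimal sketch of this adjustment as a loss-weight configuration is given below; the dictionary keys and the weighted-sum helper are illustrative assumptions, not the exact identifiers used in the CellViT codebase.

# Illustrative loss-weight configuration; only the branches relevant to this
# adjustment are shown, and the key names are assumptions.
loss_weights = {
    "segmentation": 1.0,   # Dice + cross-entropy terms kept active
    "nuclei_type": 0.0,    # lambda_NT set to zero (MoNuSeg has no cell-type labels)
    "tissue_class": 0.0,   # lambda_TC set to zero (MoNuSeg has no tissue-type labels)
}

def total_loss(branch_losses, weights=loss_weights):
    # Weighted sum over branches; zero weights remove the NT/TC influence.
    return sum(weights[name] * value for name, value in branch_losses.items())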
Baseline Establishment
The MoNuSeg dataset was chosen as the benchmark for evaluating segmentation performance. Initial performance was assessed using the MoNuSeg test set, providing a baseline against which improvements could be measured. Since CellViT was originally trained on the PanNuke dataset, this baseline allowed for a comparison of performance across different fine-tuning strategies.
Fine-Tuning with MoNuSeg Training Set
All MoNuSeg images were resized, and their segmentation masks were reformatted before fine-tuning. The CellViT model was fine-tuned using the MoNuSeg training set, and performance was re-evaluated on the MoNuSeg test set.
Fine-Tuning with MoNuSeg + DiffInfinite Generated Dataset
The DiffInfinite model was used to generate synthetic images by sampling from its learned reverse diffusion process (a reference form of the reverse-diffusion update is given below).
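For context, a single reverse-diffusion update of the standard DDPM sampler [14], on which diffusion generators such as DiffInfinite build (DiffInfinite applies such updates patch-wise within a hierarchical scheme, whose details are omitted here), can be written as
x_{t-1} = (1/√α_t) · ( x_t − ((1 − α_t)/√(1 − ᾱ_t)) · ε_θ(x_t, t) ) + σ_t z,   z ~ N(0, I),
where ε_θ is the learned noise-prediction network, α_t and ᾱ_t are noise-schedule coefficients, and σ_t controls the stochasticity of each step.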
Image Generation and Augmentation
To overcome the limitations of available training data, DiffInfinite was utilized to generate 400 synthetic histopathological images at a resolution of 512x512 pixels with 20x magnification. The MoNuSeg images were resized to match this resolution, ensuring consistency between synthetic and real datasets for model training.
Human-in-the-Loop Process
The DiffInfinite-generated images that most closely resembled MoNuSeg images were manually reviewed and selected, prioritizing those where the model captured a higher proportion of nuclei. These selected images were then segmented by CellViT, generating high-quality, pseudo-segmented images that augmented the training dataset.
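A pre-ranking step of this kind could be sketched as below; in this study the final selection was performed manually, and count_nuclei over instance-labeled masks is an illustrative helper rather than part of the released pipeline.

import numpy as np

def count_nuclei(instance_mask):
    # Number of distinct non-background instance labels (background assumed to be 0).
    labels = np.unique(instance_mask)
    return int((labels != 0).sum())

def rank_candidates(pairs):
    # Order synthetic (image, pseudo-mask) pairs by detected nucleus count so that
    # reviewers see the most densely segmented candidates first.
    return sorted(pairs, key=lambda p: count_nuclei(p[1]), reverse=True)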
Fine-Tuning with Augmented Dataset
The best pseudo-segmented images were integrated with the MoNuSeg dataset to create a hybrid MoNuSeg + DiffInfinite dataset. This augmented dataset was used for fine-tuning CellViT, improving segmentation accuracy by exposing the model to both real and synthetic data.
Random Sampling (Bootstrapped Augmentation)
To prevent overfitting to synthetic data, bootstrapped augmentation was implemented via random sampling. This ensured a balanced ratio of real and synthetic data during training, preserving generalizability while leveraging synthetic diversity.
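A minimal sketch of this sampling scheme is given below, assuming two in-memory pools of training pairs; the probability of drawing from the real pool (0.6) and the batch size (8) follow the settings reported in Section 4.

import random

def sample_mixed_batch(real_data, synthetic_data, batch_size=8, p_real=0.6):
    # Each slot in the batch is drawn from the real MoNuSeg pool with probability
    # p_real and from the synthetic DiffInfinite pool otherwise, so that training
    # does not overfit to pseudo-labeled synthetic images.
    batch = []
    for _ in range(batch_size):
        pool = real_data if random.random() < p_real else synthetic_data
        batch.append(random.choice(pool))
    return batch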
Resizing and Preprocessing
All images in the MoNuSeg dataset were resized to 512x512 pixels to match the DiffInfinite-generated images. Additionally, MoNuSeg’s segmentation masks were reformatted for compatibility with CellViT.
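A minimal preprocessing sketch is shown below, assuming images and instance masks stored as NumPy arrays; nearest-neighbour interpolation is used for the masks so that integer instance labels are not blended at boundaries. The exact mask format expected by CellViT is not reproduced here.

import numpy as np
from PIL import Image

TARGET_SIZE = (512, 512)  # match the resolution of the DiffInfinite-generated images

def resize_pair(image, instance_mask):
    # Bilinear resampling for the H&E image, nearest-neighbour for the instance mask.
    img = Image.fromarray(image).resize(TARGET_SIZE, Image.BILINEAR)
    msk = Image.fromarray(instance_mask).resize(TARGET_SIZE, Image.NEAREST)
    return np.asarray(img), np.asarray(msk)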
Loss Function Adjustment for Fine-Tuning
Further adjustments were made to the loss function to accommodate MoNuSeg's lack of tissue and cell-type annotations, ensuring the model focused on segmentation tasks without relying on these classifications.
4.2 Results & Analysis
As illustrated in Figures 2 and 3, a noticeable improvement was observed in both binary Panoptic Quality (bPQ) and recall when the foundation model was fine-tuned using MoNuSeg samples. These metrics were chosen because they capture complementary aspects of performance: recall rewards the model for identifying more of the true cells, even at the cost of some additional false positives, so an increase indicates that the model now detects cells it previously missed. This enhancement demonstrates the benefit of domain-specific adaptation. Furthermore, an additional boost in performance was observed after incorporating synthetic samples generated by DiffInfinite, which likely provide additional structural diversity and improved generalization. Overall, the results indicate a total improvement of 4% in bPQ and a significant 6.9% increase in recall, highlighting the effectiveness of fine-tuning with both real and synthetic data.
Figure 4 presents qualitative segmentation results from the test set, showcasing the model’s performance in real-world scenarios. Upon visual inspection, a high degree of overlap was observed between the predicted segmentation and the corresponding ground truth. This suggests that the model can accurately delineate object boundaries, maintaining fidelity to the ground truth labels. The strong qualitative agreement further supports the quantitative gains observed in segmentation performance.
As depicted in Figure 5, the effect of altering the uniform sampling probability between synthetic and real data was examined. The findings reveal that as the probability of selecting real data decreases, the model becomes increasingly reliant on synthetic samples, which negatively impacts overall performance. Conversely, when the probability of selecting real data is increased, fewer synthetic samples contribute to the training process, thereby diminishing their beneficial effect.
Through systematic experimentation, it was identified that a sampling probability of 0.6 strikes the best balance between leveraging synthetic diversity and preserving real-data fidelity. This behavior is largely explained by the self-distillation setup, in which the same model acts as both teacher and student: because the model learns from its own pseudo-labels on synthetic images, excessive reliance on synthetic data allows minor prediction biases to compound, while insufficient synthetic sampling limits the benefits of augmented diversity. Such recursive learning can propagate small errors over successive iterations and reduce generalization on downstream clinical datasets. Future work could mitigate this limitation through ensemble-based or external teacher models, uncertainty-aware sampling, or periodic external validation.
Figure 6 illustrates the impact of varying batch sizes on model performance. Several batch sizes were tested, and a batch size of 8 yielded the best results. Smaller batch sizes resulted in noisier gradient updates, leading to unstable training, whereas larger batch sizes appeared to reduce the model’s ability to generalize. The batch size of 8 provided the best trade-off between stability and generalization, ensuring efficient learning.
As shown in Figure 7, the effect of adjusting the learning rate on model performance was explored. The analysis indicates that different learning rates lead to varying degrees of convergence efficiency and final performance. A learning rate that is too high results in unstable updates, while a rate that is too low leads to slow convergence. Through systematic evaluation, it was determined that a learning rate of 5e-6 produces the best results, facilitating stable and efficient learning while preventing overfitting or underfitting.
5. Conclusion
This study confirms that synthetic data generated by DiffInfinite significantly enhances the performance of the CellViT segmentation model. The integration of synthetic data addresses the challenge of limited medical imaging datasets and demonstrates the potential of generative models in improving deep learning performance for medical image analysis. The findings suggest that incorporating synthetic data into training pipelines offers a scalable solution to data scarcity, paving the way for more robust AI-driven diagnostic tools. Beyond these technical improvements, the framework holds clear translational potential for diagnostic pathology, cancer screening, and digital histopathology systems: more consistent and scalable segmentation of nuclei and tissue structures can support tumor grading, tumor boundary delineation, cellular morphology analysis, biomarker quantification, and disease progression monitoring. Future integration with multimodal clinical imaging and federated learning could further extend the impact of this research across institutions, aligning the framework with the ongoing transition toward AI-assisted digital pathology and precision diagnostics.
Acknowledgement
I would like to acknowledge the invaluable guidance and support of Mr. Darren Cameron, Prof. Zhuowen Tu, Dr. Shubhankar Bores, and Prof. Yizhe Zhang for their insightful advice, encouragement, and inspiration.
References
[1] G. Campanella, M. G. Hanna, L. Geneslaw, A. M. Miraflor, V. W. Krauss Silva, J. J. Busam, T. J. Brogi, D. L. Reuter, V. S. Klimstra, and T. J. Fuchs, "Clinical-grade computational pathology using weakly supervised deep learning on whole slide images," Nature Medicine, vol. 25, no. 8, pp. 1301-1309, 2019.
[2] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Medical Image Computing and Computer-Assisted Intervention (MICCAI), vol. 9351, pp. 234-241, 2015.
[3] G. Litjens, T. Kooi, B. Ehteshami Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. W. M. van der Laak, B. van Ginneken, and C. I. Sánchez, "A survey on deep learning in medical image analysis," Medical Image Analysis, vol. 42, pp. 60-88, 2017.
[4] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, "Training data-efficient image transformers & distillation through attention," in Proc. Int. Conf. Machine Learning (ICML), vol. 139, pp. 10347-10357, 2021.
[5] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W. Y. Lo, and R. Girshick, "Segment anything," arXiv preprint, arXiv:2304.02643, 2023.
[6] J. Ma, S. Y. Kim, F. Li, M. Baharoon, R. Asakereh, H. Lyu, and B. Wang, "Segment anything in medical images and videos: Benchmark and deployment," arXiv preprint, arXiv:2408.03322, 2024.
[7] F. Hörst, M. Rempe, L. Heine, M. D. S. R. Oliveira, A. S. M. Santos, and T. M. Deserno, "CellViT: Vision transformers for precise cell segmentation and classification," Medical Image Analysis, vol. 94, p. 103143, 2024.
[8] M. Aversa, G. Nobis, M. Hägele, K. Standvoss, M. Chirica, R. Murray-Smith, A. M. Alaa, L. Ruff, D. Ivanova, W. Samek, F. Klauschen, B. Sanguinetti, and L. Oala, "DiffInfinite: Large mask-image synthesis via parallel random patch diffusion in histopathology," in Advances in Neural Information Processing Systems (NeurIPS), vol. 36, pp. 78126-78141, 2023.
[9] N. Kumar, R. Verma, S. Sharma, S. Bhargava, A. Vahadane, and A. Sethi, "A multi-organ nucleus segmentation challenge," IEEE Transactions on Medical Imaging, vol. 39, no. 5, pp. 1380-1391, 2020.
[10] V. Porter, J. Smith, L. Brown, K. Patel, and R. Andrews, "Optimising diffusion models for histopathology images," in Proc. British Machine Vision Conference (BMVC), 2024.
[11] L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, W. Zhang, B. Cui, and M.-H. Yang, "Diffusion models: A comprehensive survey of methods and applications," ACM Computing Surveys, vol. 56, no. 4, pp. 1-39, 2023.
[12] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An image is worth 16×16 words: Transformers for image recognition at scale," in Proc. Int. Conf. Learning Representations (ICLR), 2021.
[13] A. B. Abdusalomov, R. Nasimov, N. Nasimova, B. Muminov, and T. K. Whangbo, "Evaluating synthetic medical images using artificial intelligence with the GAN algorithm," Sensors, vol. 23, no. 7, p. 3440, 2023.
[14] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," in Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 6840-6851, 2020.
[15] Genomic Data Commons Data Portal. [Online].
[16] V. van Veldhuizen, V. Botha, C. Lu, M. E. Cesur, K. G. Lipman, E. D. de Jong, H. Horlings, C. I. Sanchez, C. G. M. Snoek, L. Wessels, R. Mann, E. Marcus and J. Teuwen, "Foundation Models in Medical Imaging -- a Review and Roadmap," arXiv preprint, 2025.
[17] J. S. Ryu, H. Kang, Y. Chu, S. Yang, D. Lee, and K. Kim, "Vision-language foundation models for medical imaging: a review of current practices and innovations," Biomedical Engineering Letters, vol. 15, no. 5, pp. 809-830, 2025.
[18] X. Xu, S. Kapse, and P. Prasanna, "SuperDiff: A diffusion super-resolution method for digital pathology with comprehensive quality assessment," Computers in Biology and Medicine, 2025.
[19] C. Sun, N. Goyal, Y. Wang, D. L. Tharp, S. Kumar, and T. A. Altes, "Conditional diffusion-generated super-resolution for myocardial perfusion MRI," Frontiers in Cardiovascular Medicine, vol. 12, 2025.
[20] X. Zhou, J. Liu, S. Yu, H. Yang, C. Li, T. Tan, and S. Wang, "A diffusion-driven temporal super-resolution and spatial consistency enhancement framework for 4D MRI imaging," in Proc. MICCAI, LNCS vol. 15969, pp. 3-12, 2025.