About

Multimodal Large Language Models in Clinical Practice

Welcome to the 1st Workshop on MLLMs in Clinical Practice, co-located with MICCAI 2025!

Recent advances in medical multimodal large language models (MLLMs), such as Med-Gemini, have ushered in a transformative era in clinical AI, enabling the integration of diverse data modalities, including 2D/3D medical images, text, and DNA sequences, for more comprehensive diagnostics and personalized care. While these models show promise, challenges such as data scarcity, privacy concerns, and the need for evaluation metrics that go beyond accuracy must be addressed before their potential can be fully realized. MLLMs also offer exciting opportunities for enhanced human-AI collaboration in clinical workflows, improving diagnostic accuracy and decision-making. To facilitate research in this emerging field, this workshop aims to foster discussion and collaboration on MLLM development and to address the challenges of leveraging these models in clinical practice. Workshop themes include, but are not limited to, dataset construction, safety, fairness, human-AI collaboration, and new evaluation metrics for clinical MLLMs.

Topics of interest include, but are not limited to:

  • Multimodal Large Language Models (MLLMs) for Healthcare
  • Large-scale Dataset Construction
  • Evaluation Metrics for MLLMs
  • Safety, Fairness, and Risks in Deployment
  • Human-AI Collaboration in Diagnosis and Treatment

Workshop Schedule

Program

September 23, 2025 - Daejeon, South Korea local time (GMT+9)

  • 08:50 - 09:00 | Opening Remarks
  • 09:00 - 09:30 | Invited Talk I by Professor Marinka Zitnik: “Empowering Biomedical Discovery with ‘AI Scientists’”
  • 09:30 - 10:00 | Invited Talk II by Dr Yun Liu: “Machine Learning and Generative AI for Biomedical Applications”
  • 10:00 - 10:30 | Coffee Break
  • 10:30 - 11:00 | Oral Presentations I (2 papers)
    • RadFig-VQA: A Multi-Imaging-Modality Radiology Benchmark for Evaluating Vision-Language Models in Clinical Practice
      Yosuke Yamagishi, Shouhei Hanaoka, Yuta Nakamura, Tomohiro Kikuchi, Akinobu Shimizu, Takeharu Yoshikawa, Osamu Abe
    • Trustworthy clinical thinking in MLLMs: Hierarchical Energy-based Reasoning for interpretable MEdical Scans (HERMES)
      Florence Xini Doo, Huiwen Han, Bradley A. Maron, Heng Huang
  • 11:00 - 11:30 | Invited Talk III by Dr Hoifung Poon: “Multimodal Generative AI for Precision Health”
  • 11:30 - 12:00 | Oral Presentations II (2 papers)
    • Beyond One Size Fits All: Customization of Radiology Report Generation Methods
      Tom van Sonsbeek, Arnaud Arindra Adiyoso Setio, Jung-Oh Lee, Junwoo Cho, Junha Kim, Hyeonsoo Lee, Gunhee Nam, Laurent Dillard, Tae Soo Kim
    • MedDual: A Practical Dual-Decoding Framework for Mitigating Hallucinations in Medical Vision-Language Models
      Zhe Zhang, Daisong Gan, Zhaochi Wen, Dong Liang
  • 12:00 - 12:30 | 3-Minute Lightning Talks (Poster Boaster)
  • 12:30 - 13:30 | Lunch Break
  • 13:30 - 14:00 | Invited Talk IV by Professor Hao Chen: “Harnessing Large AI Models for Transforming Healthcare”
  • 14:00 - 15:30 | Panel Discussion: Human–AI Collaboration in Clinical Workflows
  • 15:30 - 16:00 | Coffee Break
  • 16:00 - 17:00 | Poster Session
  • 17:00 - 17:30 | Invited Talk V by Dr Valentina Salvatelli: “From Raw Data to Real Impact: Training, Evaluating, and Integrating a Multimodal Radiology Assistant at Medical Center Scale”
  • 17:30 - 17:40 | Closing Remarks

Papers

Accepted Papers

Oral Presentations:

  • RadFig-VQA: A Multi-Imaging-Modality Radiology Benchmark for Evaluating Vision-Language Models in Clinical Practice
    Yosuke Yamagishi, Shouhei Hanaoka, Yuta Nakamura, Tomohiro Kikuchi, Akinobu Shimizu, Takeharu Yoshikawa, Osamu Abe
  • Trustworthy clinical thinking in MLLMs: Hierarchical Energy-based Reasoning for interpretable MEdical Scans (HERMES)
    Florence Xini Doo, Huiwen Han, Bradley A. Maron, Heng Huang
  • Beyond One Size Fits All: Customization of Radiology Report Generation Methods
    Tom van Sonsbeek, Arnaud Arindra Adiyoso Setio, Jung-Oh Lee, Junwoo Cho, Junha Kim, Hyeonsoo Lee, Gunhee Nam, Laurent Dillard, Tae Soo Kim
  • MedDual: A Practical Dual-Decoding Framework for Mitigating Hallucinations in Medical Vision-Language Models
    Zhe Zhang, Daisong Gan, Zhaochi Wen, Dong Liang
  • Note: Each oral presentation will be 15 minutes, including Q&A.
  • Note: All oral presentation papers will also have poster boards during the poster session.
  • Note: Your printed poster must be in PORTRAIT format and not exceed A0 size (33.1 in x 46.8 in or 841 mm x 1189 mm).

Poster Presentations:

  • MedM-VL: What Makes a Good Medical LVLM?
    Yiming Shi, Shaoshuai Yang, Xun Zhu, Haoyu Wang, Xiangling Fu, Miao Li, Ji Wu
  • KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations
    Junyu Liu, Lawrence K.Q. Yan, Tianyang Wang, Qian Niu, Momoko Nagai-Tanima, Tomoki Aoyama
  • Predictive Multimodal Modeling of Diagnoses and Treatments in EHR
    Cindy Shih-Ting Huang, Ng Boon Liang Clarence, Marek Rei
  • Adapting and Evaluating Multimodal Large Language Models for Adolescent Idiopathic Scoliosis Self-Management: A Divide and Conquer Framework
    Zhaolong Wu, Luo Pu, Jason Pui-Yin Cheung, Teng Zhang
  • Aligning Multimodal Large Language Models with Patient-Physician Dialogues for AI-Assisted Clinical Support
    Junyong Lee, Jeihee Cho, Jiwon Ryu, Shiho Kim
  • DuEL-Med: A Dual-path Enhanced Language Model for Clinically-Aware Radiology Report Generation
    Jin Kim, Dan Ruan, Matthew Sherman Brown
  • On the risk of misleading reports: diagnosing textual biases in Multimodal Clinical AI
    David Restrepo, Ira Ktena, Maria Vakalopoulou, Stergios Christodoulidis, Enzo Ferrante
  • Note: All other accepted papers will have poster boards during the poster session.
  • Note: Each poster presentation paper will have a 3-minute lightning talk during the Poster Boaster session.
  • Note: The lightning talk should be limited to a single page, with no Q&A session.
  • Note: Your printed poster must be in PORTRAIT format and not exceed A0 size (33.1 in x 46.8 in or 841 mm x 1189 mm).

Highlights

Invited Speakers

Marinka Zitnik

Harvard University

Yun Liu

Google Research

Hoifung Poon

Microsoft Health Futures

Hao Chen

Hong Kong University of Science and Technology

Valentina Salvatelli

Microsoft Health Futures

Invited Panelists

Jeya Maria Jose

Microsoft Research

Pranav Rajpurkar

Harvard University

Yueming Jin

National University of Singapore

Yun Liu

Google Research

Luping Zhou

University of Sydney

Edward Choi

KAIST

Tanveer Syeda-Mahmood

IBM Research

Organization

Organizing Committee

Yunsoo Kim

University College London

Chaoyi Wu

Shanghai Jiao Tong University

Justin Xu

University of Oxford

Hyewon Jeong

Massachusetts Institute of Technology

Sophie Ostmeier

Stanford University

Michelle Li

Harvard University

Zhihong Chen

Stanford University

Xiaoqing Guo

Hong Kong Baptist University

Yuyin Zhou

University of California, Santa Cruz

Weidi Xie

Shanghai Jiao Tong University

Honghan Wu

University of Glasgow

Curtis Langlotz

Stanford University

Program Committee

  • Hyeryun Park, Seoul National University
  • Xiao Zhou, Shanghai AI Laboratory
  • Yusuf Abdulle, University College London
  • Weike Zhao, Shanghai Jiao Tong University
  • Xiaoman Zhang, Harvard Medical School
  • Haoning Wu, Shanghai Jiao Tong University
  • Jie Liu, City University of Hong Kong
  • Wenting Chen, City University of Hong Kong
  • Yao Zhang, Lenovo
  • Qiushi Yang, City University of Hong Kong
  • Clemence Mottez, Stanford University
  • Pengcheng Qiu, Shanghai Jiao Tong University
  • Abul Hasan, University of Oxford
  • Jinge Wu, University College London
  • Quang N Nguyen, University College London
  • Kevin Yuan, University of Oxford
  • Jianan Chen, University College London
  • Venkat Nilesh Dadi, SIMI Group
  • Teya Bergamaschi, MIT
  • Mansu Kim, GIST

Contact us

Email the organizers at: clinicalmllms [at] gmail [dot] com