About
Multimodal Large Language Models in Clinical Practice
Welcome to the 1st Workshop on MLLMs in Clinical Practice co-located with MICCAI 2025!
Recent advances in medical multimodal large language models (MLLMs), such as MedGemini, have ushered in a transformative era in clinical AI, enabling the integration of diverse data modalities, including 2D/3D medical images, text, and DNA sequences, for more comprehensive diagnostics and personalized care. While these models show promise, challenges such as data scarcity, privacy concerns, and the need for evaluation metrics that go beyond accuracy must be addressed before their potential can be fully realized. MLLMs also offer exciting opportunities for enhanced human-AI collaboration in clinical workflows, improving diagnostic accuracy and decision-making. To facilitate research in this emerging field, this workshop fosters discussion and collaboration on MLLM development and addresses the challenges of leveraging these models in clinical practice. Workshop themes include, but are not limited to, dataset construction, safety, fairness, human-AI collaboration, and new evaluation metrics for clinical MLLMs.
Topics of interest include, but are not limited to:
- Multimodal Large Language Models (MLLMs) for Healthcare
- Large-scale Dataset Construction
- Evaluation Metrics for MLLMs
- Safety, Fairness, and Risks in Deployment
- Human-AI Collaboration in Diagnosis and Treatment
Workshop Schedule
Program
September 23, 2025 - Daejeon, South Korea, local time (GMT+9)
- 08:50 - 09:00 | Opening Remarks
- 09:00 - 09:30 | Invited Talk I by Professor Marinka Zitnik. Title: Empowering Biomedical Discovery with “AI Scientists”
- 09:30 - 10:00 | Invited Talk II by Dr Yun Liu. Title: Machine Learning and Generative AI for Biomedical Applications
- 10:00 - 10:30 | Coffee Break
- 10:30 - 11:00 | Oral Presentations I (2)
  - RadFig-VQA: A Multi-Imaging-Modality Radiology Benchmark for Evaluating Vision-Language Models in Clinical Practice
    Yosuke Yamagishi, Shouhei Hanaoka, Yuta Nakamura, Tomohiro Kikuchi, Akinobu Shimizu, Takeharu Yoshikawa, Osamu Abe
  - Trustworthy clinical thinking in MLLMs: Hierarchical Energy-based Reasoning for interpretable MEdical Scans (HERMES)
    Florence Xini Doo, Huiwen Han, Bradley A. Maron, Heng Huang
- 11:00 - 11:30 | Invited Talk III by Dr Hoifung Poon. Title: Multimodal Generative AI for Precision Health
- 11:30 - 12:00 | Oral Presentations II (2)
  - Beyond One Size Fits All: Customization of Radiology Report Generation Methods
    Tom van Sonsbeek, Arnaud Arindra Adiyoso Setio, Jung-Oh Lee, Junwoo Cho, Junha Kim, Hyeonsoo Lee, Gunhee Nam, Laurent Dillard, Tae Soo Kim
  - MedDual: A Practical Dual-Decoding Framework for Mitigating Hallucinations in Medical Vision-Language Models
    Zhe Zhang, Daisong Gan, Zhaochi Wen, Dong Liang
- 12:00 - 12:30 | 3-Minute Lightning Talks for Poster Boaster
- 12:30 - 13:30 | Lunch Break
- 13:30 - 14:00 | Invited Talk IV by Professor Hao Chen. Title: Harnessing Large AI Models for Transforming Healthcare
- 14:00 - 15:30 | Panel Discussion: Human–AI Collaboration in Clinical Workflows
- 15:30 - 16:00 | Coffee Break
- 16:00 - 17:00 | Poster Session
- 17:00 - 17:30 | Invited Talk V by Dr Valentina Salvatelli. Title: From Raw Data to Real Impact: Training, Evaluating, and Integrating a Multimodal Radiology Assistant at Medical Center Scale
- 17:30 - 17:40 | Closing Remarks
Papers
Accepted Papers
Oral Presentations:
- RadFig-VQA: A Multi-Imaging-Modality Radiology Benchmark for Evaluating Vision-Language Models in Clinical Practice
  Yosuke Yamagishi, Shouhei Hanaoka, Yuta Nakamura, Tomohiro Kikuchi, Akinobu Shimizu, Takeharu Yoshikawa, Osamu Abe
- Trustworthy clinical thinking in MLLMs: Hierarchical Energy-based Reasoning for interpretable MEdical Scans (HERMES)
  Florence Xini Doo, Huiwen Han, Bradley A. Maron, Heng Huang
- Beyond One Size Fits All: Customization of Radiology Report Generation Methods
  Tom van Sonsbeek, Arnaud Arindra Adiyoso Setio, Jung-Oh Lee, Junwoo Cho, Junha Kim, Hyeonsoo Lee, Gunhee Nam, Laurent Dillard, Tae Soo Kim
- MedDual: A Practical Dual-Decoding Framework for Mitigating Hallucinations in Medical Vision-Language Models
  Zhe Zhang, Daisong Gan, Zhaochi Wen, Dong Liang
- Note: Each oral presentation will be 15 minutes, including Q&A.
- Note: All oral presentation papers will also have poster boards during the poster session.
- Note: Your printed poster must be in PORTRAIT format and not exceed A0 size (33.1 in x 46.8 in or 841 mm x 1189 mm).
Poster Presentations:
- MedM-VL: What Makes a Good Medical LVLM?
  Yiming Shi, Shaoshuai Yang, Xun Zhu, Haoyu Wang, Xiangling Fu, Miao Li, Ji Wu
- KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations
  Junyu Liu, Lawrence K.Q. Yan, Tianyang Wang, Qian Niu, Momoko Nagai-Tanima, Tomoki Aoyama
- Predictive Multimodal Modeling of Diagnoses and Treatments in EHR
  Cindy Shih-Ting Huang, Ng Boon Liang Clarence, Marek Rei
- Adapting and Evaluating Multimodal Large Language Models for Adolescent Idiopathic Scoliosis Self-Management: A Divide and Conquer Framework
  Zhaolong Wu, Luo Pu, Jason Pui-Yin Cheung, Teng Zhang
- Aligning Multimodal Large Language Models with Patient-Physician Dialogues for AI-Assisted Clinical Support
  Junyong Lee, Jeihee Cho, Jiwon Ryu, Shiho Kim
- DuEL-Med: A Dual-path Enhanced Language Model for Clinically-Aware Radiology Report Generation
  Jin Kim, Dan Ruan, Matthew Sherman Brown
- On the risk of misleading reports: diagnosing textual biases in Multimodal Clinical AI
  David Restrepo, Ira Ktena, Maria Vakalopoulou, Stergios Christodoulidis, Enzo Ferrante
- Note: All other accepted papers will have poster boards during the poster session.
- Note: All poster presentation papers will have 3-minute lightning talks for the poster boaster session.
- Note: The lightning talk should be only 1 page, without a Q&A session.
- Note: Your printed poster must be in PORTRAIT format and not exceed A0 size (33.1 in x 46.8 in or 841 mm x 1189 mm).
Highlights
Invited Speakers

Marinka Zitnik
Harvard University
Yun Liu
Google Research
Hoifung Poon
Microsoft Health Futures
Hao Chen
Hong Kong University of Science and Technology
Valentina Salvatelli
Microsoft Health Futures
Highlights
Invited Panelists

Jeya Maria Jose
Microsoft Research
Pranav Rajpurkar
Harvard University
Yueming Jin
National University of Singapore
Yun Liu
Google Research
Luping Zhou
University of Sydney
Edward Choi
KAIST
Tanveer Syeda-Mahmood
IBM Research
Organization
Organizing Committee

Yunsoo Kim
University College London
Chaoyi Wu
Shanghai Jiao Tong University
Justin Xu
University of Oxford
Hyewon Jeong
Massachusetts Institute of Technology
Sophie Ostmeier
Stanford University
Michelle Li
Harvard University
Zhihong Chen
Stanford University
Xiaoqing Guo
Hong Kong Baptist University
Yuyin Zhou
University of California, Santa Cruz
Weidi Xie
Shanghai Jiao Tong University
Honghan Wu
University of Glasgow
Curtis Langlotz
Stanford University
Program Committee
- Hyeryun Park, Seoul National University
- Xiao Zhou, Shanghai AI Laboratory
- Yusuf Abdulle, University College London
- Weike Zhao, Shanghai Jiao Tong University
- Xiaoman Zhang, Harvard Medical School
- Haoning Wu, Shanghai Jiao Tong University
- Jie Liu, City University of Hong Kong
- Wenting Chen, City University of Hong Kong
- Yao Zhang, Lenovo
- Qiushi Yang, City University of Hong Kong
- Clemence Mottez, Stanford University
- Pengcheng Qiu, Shanghai Jiao Tong University
- Abul Hasan, University of Oxford
- Jinge Wu, University College London
- Quang N Nguyen, University College London
- Kevin Yuan, University of Oxford
- Jianan Chen, University College London
- Venkat Nilesh Dadi, SIMI Group
- Teya Bergamaschi, MIT
- Mansu Kim, GIST