About

Multimodal Large Language Models in Clinical Practice

Welcome to the 1st Workshop on MLLMs in Clinical Practice, co-located with MICCAI 2025!

Recent advances in medical multimodal large language models (MLLMs), such as Med-Gemini, have ushered in a transformative era in clinical AI, enabling the integration of diverse data modalities, including 2D/3D medical images, text, and DNA sequences, for more comprehensive diagnostics and personalized care. While these models show promise, challenges such as data scarcity, privacy concerns, and the need for evaluation metrics that go beyond accuracy must be addressed before their potential can be fully realized. MLLMs also offer exciting opportunities for enhanced human-AI collaboration in clinical workflows, improving diagnostic accuracy and decision-making. To facilitate research in this emerging field, this workshop aims to foster discussion and collaboration on MLLM development and to address the challenges of leveraging these models in clinical practice. Workshop themes include, but are not limited to, dataset construction, safety, fairness, human-AI collaboration, and new evaluation metrics for clinical MLLMs.

Topics of interest include, but are not limited to:

  • Multimodal Large Language Models (MLLMs) for Healthcare
  • Large-scale Dataset Construction
  • Evaluation Metrics for MLLMs
  • Safety, Fairness, and Risks in Deployment
  • Human-AI Collaboration in Diagnosis and Treatment

Workshop Schedule

Program

September 23, 2025 - Daejeon, South Korea local time (GMT+9)

  • 08:50 - 09:00 | Opening Remarks
  • 09:00 - 09:30 | Invited Talk I by Professor Marinka Zitnik: “Empowering Biomedical Discovery with ‘AI Scientists’”
  • 09:30 - 10:00 | Invited Talk II by Dr Yun Liu: “Machine Learning and Generative AI for Biomedical Applications”
  • 10:00 - 10:30 | Coffee Break
  • 10:30 - 11:00 | Oral Presentations I (2 papers)
    • RadFig-VQA: A Multi-Imaging-Modality Radiology Benchmark for Evaluating Vision-Language Models in Clinical Practice
      Yosuke Yamagishi, Shouhei Hanaoka, Yuta Nakamura, Tomohiro Kikuchi, Akinobu Shimizu, Takeharu Yoshikawa, Osamu Abe
    • Trustworthy clinical thinking in MLLMs: Hierarchical Energy-based Reasoning for interpretable MEdical Scans (HERMES)
      Florence Xini Doo, Huiwen Han, Bradley A. Maron, Heng Huang
  • 11:00 - 11:30 | Invited Talk III by Dr Hoifung Poon: “Multimodal Generative AI for Precision Health”
  • 11:30 - 12:00 | Oral Presentations II (2 papers)
    • Beyond One Size Fits All: Customization of Radiology Report Generation Methods
      Tom van Sonsbeek, Arnaud Arindra Adiyoso Setio, Jung-Oh Lee, Junwoo Cho, Junha Kim, Hyeonsoo Lee, Gunhee Nam, Laurent Dillard, Tae Soo Kim
    • MedDual: A Practical Dual-Decoding Framework for Mitigating Hallucinations in Medical Vision-Language Models
      Zhe Zhang, Daisong Gan, Zhaochi Wen, Dong Liang
  • 12:00 - 12:30 | 3-Minute Lightning Talks (Poster Boaster)
  • 12:30 - 13:30 | Lunch Break
  • 13:30 - 14:00 | Invited Talk IV by Professor Hao Chen: “Harnessing Large AI Models for Transforming Healthcare”
  • 14:00 - 15:30 | Panel Discussion: Human–AI Collaboration in Clinical Workflows
  • 15:30 - 16:00 | Coffee Break
  • 16:00 - 17:00 | Poster Session
  • 17:00 - 17:30 | Invited Talk V by Dr Valentina Salvatelli: “From Raw Data to Real Impact: Training, Evaluating, and Integrating a Multimodal Radiology Assistant at Medical Center Scale”
  • 17:30 - 17:40 | Closing Remarks

Papers

Accepted Papers

Oral Presentations:

  • RadFig-VQA: A Multi-Imaging-Modality Radiology Benchmark for Evaluating Vision-Language Models in Clinical Practice
    Yosuke Yamagishi, Shouhei Hanaoka, Yuta Nakamura, Tomohiro Kikuchi, Akinobu Shimizu, Takeharu Yoshikawa, Osamu Abe
  • Trustworthy clinical thinking in MLLMs: Hierarchical Energy-based Reasoning for interpretable MEdical Scans (HERMES)
    Florence Xini Doo, Huiwen Han, Bradley A. Maron, Heng Huang
  • Beyond One Size Fits All: Customization of Radiology Report Generation Methods
    Tom van Sonsbeek, Arnaud Arindra Adiyoso Setio, Jung-Oh Lee, Junwoo Cho, Junha Kim, Hyeonsoo Lee, Gunhee Nam, Laurent Dillard, Tae Soo Kim
  • MedDual: A Practical Dual-Decoding Framework for Mitigating Hallucinations in Medical Vision-Language Models
    Zhe Zhang, Daisong Gan, Zhaochi Wen, Dong Liang
  • Note: Each oral presentation will be 15 minutes, including Q&A.
  • Note: All oral presentation papers will also have poster boards during the poster session.
  • Note: Your printed poster must be in PORTRAIT format and not exceed A0 size (33.1 in x 46.8 in or 841 mm x 1189 mm).

Poster Presentations:

  • MedM-VL: What Makes a Good Medical LVLM?
    Yiming Shi, Shaoshuai Yang, Xun Zhu, Haoyu Wang, Xiangling Fu, Miao Li, Ji Wu
  • KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations
    Junyu Liu, Lawrence K.Q. Yan, Tianyang Wang, Qian Niu, Momoko Nagai-Tanima, Tomoki Aoyama
  • Predictive Multimodal Modeling of Diagnoses and Treatments in EHR
    Cindy Shih-Ting Huang, Ng Boon Liang Clarence, Marek Rei
  • Adapting and Evaluating Multimodal Large Language Models for Adolescent Idiopathic Scoliosis Self-Management: A Divide and Conquer Framework
    Zhaolong Wu, Luo Pu, Jason Pui-Yin Cheung, Teng Zhang
  • Aligning Multimodal Large Language Models with Patient-Physician Dialogues for AI-Assisted Clinical Support
    Junyong Lee, Jeihee Cho, Jiwon Ryu, Shiho Kim
  • DuEL-Med: A Dual-path Enhanced Language Model for Clinically-Aware Radiology Report Generation
    Jin Kim, Dan Ruan, Matthew Sherman Brown
  • On the risk of misleading reports: diagnosing textual biases in Multimodal Clinical AI
    David Restrepo, Ira Ktena, Maria Vakalopoulou, Stergios Christodoulidis, Enzo Ferrante
  • Note: All other accepted papers will have poster boards during the poster session.
  • Note: Each poster presentation paper will have a 3-minute lightning talk during the Poster Boaster session.
  • Note: The lightning talk should be limited to a single page, with no Q&A session.
  • Note: Your printed poster must be in PORTRAIT format and not exceed A0 size (33.1 in x 46.8 in or 841 mm x 1189 mm).

Highlights

Invited Speakers

Marinka Zitnik

Harvard University

Yun Liu

Google Research

Hoifung Poon

Microsoft Health Futures

Hao Chen

Hong Kong University of Science and Technology

Valentina Salvatelli

Microsoft Health Futures

Invited Panelists

Jeya Maria Jose

Microsoft Research

Pranav Rajpurkar

Harvard University

Yueming Jin

National University of Singapore

Yun Liu

Google Research

Luping Zhou

University of Sydney

Edward Choi

KAIST

Tanveer Syeda-Mahmood

IBM Research

Organization

Organizing Committee

Yunsoo Kim

University College London

Chaoyi Wu

Shanghai Jiao Tong University

Justin Xu

University of Oxford

Hyewon Jeong

Massachusetts Institute of Technology

Sophie Ostmeier

Stanford University

Michelle Li

Harvard University

Zhihong Chen

Stanford University

Xiaoqing Guo

Hong Kong Baptist University

Yuyin Zhou

University of California, Santa Cruz

Weidi Xie

Shanghai Jiao Tong University

Honghan Wu

University of Glasgow

Curtis Langlotz

Stanford University

Program Committee

  • Hyeryun Park, Seoul National University
  • Xiao Zhou, Shanghai AI Laboratory
  • Yusuf Abdulle, University College London
  • Weike Zhao, Shanghai Jiao Tong University
  • Xiaoman Zhang, Harvard Medical School
  • Haoning Wu, Shanghai Jiao Tong University
  • Jie Liu, City University of Hong Kong
  • Wenting Chen, City University of Hong Kong
  • Yao Zhang, Lenovo
  • Qiushi Yang, City University of Hong Kong
  • Clemence Mottez, Stanford University
  • Pengcheng Qiu, Shanghai Jiao Tong University
  • Abul Hasan, University of Oxford
  • Jinge Wu, University College London
  • Quang N Nguyen, University College London
  • Kevin Yuan, University of Oxford
  • Jianan Chen, University College London
  • Venkat Nilesh Dadi, SIMI Group
  • Teya Bergamaschi, MIT
  • Mansu Kim, GIST

Contact us

Email the organizers at: clinicalmllms [at] gmail [dot] com