Simulating Clinical AI Assistance using Multimodal LLMs: A Case Study in Diabetic Retinopathy
2025-09-17
Summary
The study investigates the use of multimodal large language models (MLLMs) for detecting diabetic retinopathy (DR) and explores their potential for simulating clinical AI assistance. Two models, GPT-4o and MedGemma, were tested on different datasets to evaluate their effectiveness both individually and in collaboration. MedGemma demonstrated better baseline sensitivity and greater robustness to incorrect inputs, while GPT-4o performed strongly when given MedGemma's descriptive outputs, achieving high accuracy without ever accessing the images directly.
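The collaboration described above can be sketched as a simple two-stage pipeline: a vision model (standing in for MedGemma) produces a textual description of a fundus image, and a text-only reasoner (standing in for GPT-4o) makes a referral decision from that description alone. This is a hypothetical illustration, not the study's actual implementation; both model calls are stubbed, and the image IDs, findings text, and keyword logic are invented for demonstration.

```python
def describe_fundus_image(image_id: str) -> str:
    """Stub for the vision-model step (MedGemma's role): return a
    textual findings report instead of calling a real model."""
    findings = {
        "img_001": "multiple microaneurysms and dot-blot hemorrhages "
                   "in all four quadrants",
        "img_002": "no microaneurysms, hemorrhages, or exudates visible",
    }
    return findings.get(image_id, "image quality insufficient for assessment")


def classify_from_description(description: str) -> str:
    """Stub for the text-only reasoner step (GPT-4o's role): a binary
    referable-DR decision made from the description alone."""
    dr_markers = ("microaneurysm", "hemorrhage", "exudate",
                  "neovascularization")
    has_findings = any(marker in description for marker in dr_markers)
    if has_findings and not description.startswith("no "):
        return "referable DR"
    return "no referable DR"


def screen(image_id: str) -> dict:
    # The reasoner never sees the image, only the description,
    # mirroring the text-mediated collaboration in the study.
    description = describe_fundus_image(image_id)
    return {
        "image": image_id,
        "description": description,
        "decision": classify_from_description(description),
    }
```

In a real deployment the two stubs would be replaced with actual inference calls, but the interface stays the same: the second model receives only text, which is what makes the output descriptive and auditable rather than a bare label.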
Why This Matters
Diabetic retinopathy is a major cause of blindness, and AI systems can significantly expand access to screening. However, current systems are limited by their minimal outputs, often a single binary prediction, which can hinder clinician trust. This study highlights the potential of MLLMs to produce more descriptive and explainable outputs, which could improve clinician-AI collaboration and trust in medical settings, especially in low-resource environments.
How You Can Use This Info
Healthcare professionals can consider integrating open-source, lightweight models like MedGemma into screening workflows: they offer high sensitivity and can run without internet access, making them well suited to low-resource settings. In addition, enriching AI outputs with descriptive findings rather than bare binary predictions could improve clinician trust and decision-making, suggesting a shift in how AI tools are developed and used in clinical workflows.