A melanoma diagnosis method based on large-scale vision-language models

ZHAO Jia-yue^1,2, LI Shi-man^1,2, ZHANG Chen-xi^1,2*

Acta Anatomica Sinica, 2025, Vol. 56, Issue 1: 22-29. DOI: 10.16098/j.issn.0529-1356.2025.01.003

Oncology Column

Abstract

Objective To develop a melanoma diagnosis framework based on large-scale vision-language models and to evaluate its feasibility and accuracy for melanoma diagnosis. Methods The publicly available Derm7pt dataset was used, divided into a training set (346 cases), a validation set (161 cases), and a test set (320 cases). The proposed framework comprises two text branches and one visual branch. One text branch processes fixed clinical prompts, while the other processes learnable prompts; the fixed clinical prompts guide and optimize the learnable prompts. The visual branch processes dermoscopic images, and the image encoder is fine-tuned to improve recognition of melanoma features. Results On the Derm7pt dataset, the proposed method outperformed other existing methods, achieving an area under the receiver operating characteristic curve (AUC) of 87.35%, an accuracy of 84.17%, and an F1-score of 84.01%. Conclusion With appropriate fine-tuning strategies, methods based on large-scale vision-language pre-trained models can be effectively adapted to melanoma diagnosis. The approach can serve as an effective auxiliary tool that helps doctors make more accurate diagnostic decisions.
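The abstract does not give the training objective, so the following is only a minimal sketch of how a fixed clinical prompt branch could guide a learnable prompt branch. It assumes a classification term (softmax over cosine similarities between the image feature and the learnable text features) plus a guidance term that penalizes the distance between learnable and fixed text features. The function names, the `temperature` and `guide_weight` parameters, and the toy two-dimensional vectors are all hypothetical; in practice the features would come from the image and text encoders of a vision-language model.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guided_prompt_loss(image_feat, learnable_text_feats, fixed_text_feats,
                       label, temperature=0.07, guide_weight=1.0):
    """Assumed loss: cross-entropy + guidance from fixed clinical prompts.

    learnable_text_feats[k] and fixed_text_feats[k] are the text features
    for class k from the learnable and fixed prompt branches respectively.
    """
    # Classification term: image feature vs. learnable class text features.
    logits = [cosine(image_feat, t) / temperature for t in learnable_text_feats]
    probs = softmax(logits)
    ce = -math.log(probs[label])

    # Guidance term: mean squared distance between the normalized
    # learnable and fixed text features, averaged over classes.
    guide = 0.0
    for learned, fixed in zip(learnable_text_feats, fixed_text_feats):
        ln, fn = l2_normalize(learned), l2_normalize(fixed)
        guide += sum((x - y) ** 2 for x, y in zip(ln, fn))
    guide /= len(fixed_text_feats)

    return ce + guide_weight * guide, probs


# Toy usage: class 0 = nevus, class 1 = melanoma (hypothetical features).
image = [1.0, 0.0]
fixed = [[0.0, 1.0], [1.0, 0.0]]
learned = [[0.0, 1.0], [1.0, 1.0]]
loss, probs = guided_prompt_loss(image, learned, fixed, label=1)
print(f"loss={loss:.4f}, p(melanoma)={probs[1]:.4f}")
```

With `guide_weight=0.0` the sketch reduces to plain prompt learning; a larger weight keeps the learnable prompts closer to the fixed clinical prompts.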

Cite this article

ZHAO Jia-yue, LI Shi-man, ZHANG Chen-xi. A melanoma diagnosis method based on large-scale vision-language models[J]. Acta Anatomica Sinica, 2025, 56(1): 22-29. https://doi.org/10.16098/j.issn.0529-1356.2025.01.003

CLC number: TP391
