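Study of a prediction model for distant metastasis of nasopharyngeal carcinoma based on the SMOTE algorithm and interpretable machine learning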

CLC number: R739.63  Document code: A  DOI: 10.3969/j.issn.1003-1383.2025.10.007

【Abstract】Objective: To develop a nasopharyngeal carcinoma (NPC) distant metastasis prediction model based on the SMOTE algorithm and interpretable machine learning, evaluate the predictive performance of different algorithms, and provide clinical auxiliary tools for early prediction of NPC distant metastasis, using the SHAP interpretation method and PR curve analysis to examine the basis of the model's decisions. Methods: Clinical and laboratory examination data of 145 NPC patients admitted to the Second Affiliated Hospital of Guangxi University of Science and Technology from January 2020 to December 2023 were collected. Differences in the data were analyzed, and indicators with significant differences between the NPC distant metastasis group and the non-metastasis group were selected as feature variables. Four machine learning algorithms (XGBoost, random forest, LightGBM, and logistic regression) were used to construct NPC distant metastasis prediction models. Based on five commonly used evaluation indicators (accuracy, precision, recall, F1 score, and area under the receiver operating characteristic (ROC) curve (AUC)), the optimal prediction model was selected by combining the ROC curve and the PR curve. The optimal model was subjected to SHAP interpretability analysis to explore the key feature variables that affected NPC distant metastasis. Results: Without addressing the class imbalance, the XGBoost model exhibited the best predictive performance, with an AUC of 0.86, accuracy of 0.86, F1 score of 0.72, sensitivity of 0.62, and specificity of 0.97. After applying SMOTE oversampling to balance the classes, the performance of the XGBoost model improved further, with an AUC of 0.96, accuracy of 0.92, F1 score of 0.93, sensitivity of 0.91, and specificity of 0.93. The PR curve analysis demonstrated that the XGBoost model showed good stability at different thresholds. In the SHAP model interpretation, squamous cell carcinoma antigen, red blood cell distribution width, and cytokeratin 19 fragment were identified as important features that significantly affected the prediction of distant metastasis in NPC. Conclusion: The prediction model based on SMOTE and XGBoost can effectively detect the risk of distant metastasis in NPC and, combined with SHAP interpretability analysis, provides a biologically significant decision-making basis for clinical practice. This model holds the potential to assist physicians in the early screening of high-risk patients and in formulating personalized treatment strategies.
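The SMOTE oversampling step named in the abstract can be illustrated with a minimal sketch. The abstract does not say which implementation or parameters the authors used, so everything below is an assumption made for illustration: the function name `smote`, the neighbour count `k`, the fixed seed, and the two-feature toy data, which is made up and not taken from the study. The sketch shows only the core idea of SMOTE, which is to create each synthetic minority sample by interpolating between a real minority sample and one of its nearest minority-class neighbours.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples built by SMOTE-style interpolation.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of its k nearest minority neighbours, excluding itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        # Place the new point at a random position on the connecting segment
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data (NOT from the study): 3 minority samples, 8 majority samples
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
majority_count = 8

new_samples = smote(minority, n_new=majority_count - len(minority), k=2)
balanced_minority = minority + new_samples
print(len(balanced_minority), "minority samples after oversampling")
```

Oversampling is usually applied only to the training split, so that synthetic points do not leak into the evaluation data. The abstract does not state how the authors handled this.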

【Keywords】nasopharyngeal carcinoma (NPC); machine learning; prediction model; SHAP value; SMOTE algorithm

Nasopharyngeal carcinoma (NPC) is a malignant tumor arising in the nasopharynx, with a relatively high incidence in southern China and Southeast Asia [1].
