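The Dual Role of Tea Consumption in Gastrointestinal Disease Risk: A Joint Prediction-Assistance Model Based on Interpretable Machine Learning and a Large Language Model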

Abstract: Objective To explore the correlation of tea consumption with risks of gastrointestinal diseases using a risk prediction model integrating interpretable machine learning and a large language model. Methods A survey was conducted among patients undergoing both gastroscopy and 13C-urea breath testing at the Gastrointestinal Endoscopy Center of Anxi Hospital of Traditional Chinese Medicine. Univariate analysis was performed to determine the suitability of feature selection. The collected data were randomly divided into training and testing sets in a 7:3 ratio. Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGB), and Deep Neural Network (DNN) were applied to identify the best classifier for predicting high-risk gastrointestinal conditions. A Bayesian optimization algorithm was used to obtain the optimal hyperparameter combinations for the 6 models. After model fitting, the interpretability of the best models was analyzed using SHapley Additive exPlanations (SHAP). The DeepSeek-R1 base language model was fine-tuned with a gastrointestinal disease dataset and Chinese medical online consultation data to obtain the final model. Results The study included 503 participants. All the selected features showed association with gastrointestinal diseases, but only age exhibited a significant linear correlation (β=0.023, SE=0.008, t=2.942, P=0.003). The DNN model performed the best, with accuracy (0.68), [...] (0.68), [...] (0.85), [...], and AUC (0.74). The most important features were age, DOB value, and smoking history. The large language model constructed provided recommendations consistent with those of professional physicians based on gastroscopy results. Conclusion The DNN model is effective for predicting gastrointestinal disease risk and offers reliable support for clinical risk assessment and decision-making regarding endoscopy. Smoking cessation, moderate alcohol consumption, and reasonable tea intake may help prevent gastrointestinal diseases.
Keywords: gastrointestinal diseases; Helicobacter pylori; machine learning; risk prediction; large language model
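The 7:3 random split and the accuracy and AUC metrics mentioned in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' pipeline: the function names `split_7_3`, `accuracy`, and `auc`, the fixed seed, and the placeholder participant IDs are all assumptions made for illustration. The abstract does not state how the split was seeded or whether it was stratified.

```python
import random


def split_7_3(records, seed=0):
    """Randomly split records into training (70%) and testing (30%) sets.

    Hypothetical helper: the seed and the unstratified shuffle are assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Placeholder IDs standing in for the 503 participants reported in the abstract.
train, test = split_7_3(range(503))
print(len(train), len(test))
```

With 503 records, this split yields 352 training and 151 testing records. In practice, a library routine such as scikit-learn's `train_test_split` would typically be used instead, and each candidate classifier would be scored on the held-out set with metrics like these.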
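Helicobacter pylori (H.p) is a microaerophilic, spiral-shaped bacterium that colonizes the gastric mucosa and is classified as a Group I carcinogen by the World Health Organization.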