lncRNA-Disease Association Prediction Method Based on a Graph Autoencoder and a Gradient-Based Decision Tree Ensemble

CLC number: TN911.23-34    Document code: A
Article ID: 1004-373X(2025)12-0061-06
Abstract: Abnormal expression of long non-coding RNA (lncRNA) is closely associated with the development of human diseases. Using computational methods to predict potential lncRNA-disease associations can significantly reduce the cost of subsequent biological validation experiments. To address the susceptibility of existing machine learning methods to noise and their insufficient prediction accuracy, a novel lncRNA-disease association prediction method based on a graph autoencoder and a gradient-based decision tree ensemble (LDA-GADT) is proposed. First, the Gaussian interaction profile kernel similarity is calculated for lncRNAs and for diseases to supplement the functional similarity of lncRNAs and the semantic similarity of diseases, yielding integrated similarity matrices for lncRNAs and diseases. Next, a graph autoencoder is used to learn feature representations of lncRNA-disease pairs. Finally, the gradient-based decision tree ensemble algorithm is employed to predict lncRNA-disease associations. Results of 5-fold cross-validation show that, on the lncRNA disease database, the AUC of the LDA-GADT model is 0.9424, which is 8.46%, 6.5%, 1.28%, and 3.14% higher than that of the LDNFSGB, SDLDA, RWSF-BLP, and LDAenDL models, respectively. On the MNDR database, the AUC of the LDA-GADT model is 0.9822, which is 4.76%, 2.62%, 1.93%, and 1.14% higher than that of the same four comparison models, respectively. Case studies on lung cancer and breast cancer further verify the accuracy and effectiveness of the proposed model.
Keywords: lncRNA-disease association; association prediction; Gaussian association kernel similarity; graph autoencoder; gradient descent; decision tree; feature extraction
0 Introduction
Long non-coding RNA (lncRNA) is a class of non-coding RNA more than 200 nucleotides in length.