Lithology Identification Using Extra Trees on SMOTE-Balanced Data

CLC number: P631.8    Document code: A
Abstract: In oil and gas exploration and geoengineering, accurate lithology identification is essential for assessing and utilizing resources. The inherent complexity of geologic data and the imbalanced distribution of lithology samples make lithology identification difficult for traditional methods. In this paper, we propose a lithology identification method that combines SMOTE (synthetic minority over-sampling technique) with extra trees. First, SMOTE is used to increase the representation of minority-class samples, improving the balance of the training data. Second, the lithology classification model is built with extra trees, taking advantage of their high efficiency and strong generalization ability. The experimental results show that the recognition accuracy of extra trees is 85.54%, which is 5.58%, 2.55%, 2.35%, and 2.08% higher than that of four other machine learning methods: gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and random forest, respectively. SMOTE sampling mitigates the prediction bias caused by sample imbalance, raising the recognition accuracy for specific lithology categories within each model and thereby improving overall model performance. The extra trees model performs best, reaching an identification accuracy of 86.62%, which is 4.71%, 2.56%, 1.55%, and 2.02% higher than GBDT, XGBoost, LightGBM, and random forest, respectively. These results confirm the effectiveness of combining SMOTE with extra trees for lithology identification.
Key words: lithology identification; machine learning; random forest; extra trees; data balancing
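The oversampling step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is not the authors' implementation. The function name `smote_oversample`, its parameters, and the two-feature sample values are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative sketch of the SMOTE interpolation idea, not a full
    reimplementation of any library.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature samples for one rare lithology class
rare_class = [[2.30, 80.0], [2.35, 85.0], [2.32, 78.0], [2.40, 90.0]]
new_points = smote_oversample(rare_class, n_new=6, k=3)
```

In a workflow like the one the abstract describes, the synthetic points would be appended to the training set only, and a classifier such as scikit-learn's `ExtraTreesClassifier` would then be fitted on the balanced data. The test set would stay unresampled so that the reported accuracy reflects the original class distribution.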
0 Introduction
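Lithology identification underpins geological work such as exploration and reservoir evaluation, and it plays a vital role in oil and gas exploration and development [1-2]. It improves drilling efficiency and is central to optimizing oil and gas field development strategies.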