Identification Model of Mine Water Sources in the Pingdingshan Coalfield Based on TPE-LightGBM

Abstract: Rapid identification of the water-inrush source is a key step in preventing and controlling mine water hazards. To identify mine water sources in the Pingdingshan Coalfield accurately, water samples were collected from five aquifers: surface water, Quaternary pore water, Carboniferous limestone karst water, Permian sandstone water, and Cambrian limestone karst water. Six key discriminant indicators, Na⁺+K⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻, and HCO₃⁻, were selected for analysis. To keep outlying data from causing the model to overfit, box plots were used to show the dispersion of the data, from which 20 outlying samples were identified and removed. The cleaned data were divided into training and test samples in an 8:2 ratio, and the training samples were fed into a Light Gradient Boosting Machine (LightGBM) for model training. The main parameters of LightGBM were then optimized with the Tree-structured Parzen Estimator (TPE) to build the TPE-LightGBM model. Compared with the baseline LightGBM, TPE-LightGBM improved accuracy by 13.9%, indicating that the TPE algorithm is effective. To further validate model performance, the results were compared with those of the Random Search-Multilayer Perceptron (RS-MLP) and Genetic Algorithm-Extreme Gradient Boosting (GA-XGBoost) models. TPE-LightGBM achieved higher accuracy and lower generalization error, indicating that it is better suited to water source identification and broadly applicable. The contribution of each variable was quantified with the Gini coefficient; Ca²⁺ contributed the most, so changes in Ca²⁺ concentration deserve particular attention. In summary, TPE-LightGBM offers high accuracy and good generalization and provides useful guidance for water source identification.
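The data-cleaning step relies on the usual box-plot rule: a value is an outlier if it lies below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. The abstract does not state the whisker multiplier, the quartile convention, or whether the fences were computed per aquifer or on the pooled data. The sketch below therefore assumes the conventional k = 1.5 and linearly interpolated quartiles, and it flags a sample if any of the six indicators falls outside its fences. It then makes a seeded random 8:2 split. The key names (`Na_K`, `Ca`, ...) and all helper functions are hypothetical illustrations, not the authors' code.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical keys for the six discriminant indicators (Na+ + K+, Ca2+, Mg2+, Cl-, SO4 2-, HCO3-).
ION_KEYS = ("Na_K", "Ca", "Mg", "Cl", "SO4", "HCO3")

Sample = Dict[str, float]


def quartiles(values: Sequence[float]) -> Tuple[float, float]:
    """Return (Q1, Q3) using linear interpolation between order statistics."""
    s = sorted(values)
    n = len(s)

    def pct(p: float) -> float:
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)

    return pct(0.25), pct(0.75)


def boxplot_outliers(samples: List[Sample], keys: Sequence[str], k: float = 1.5) -> List[int]:
    """Indices of samples with any indicator outside [Q1 - k*IQR, Q3 + k*IQR]."""
    flagged = set()
    for key in keys:
        column = [s[key] for s in samples]
        q1, q3 = quartiles(column)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        for i, value in enumerate(column):
            if value < lower or value > upper:
                flagged.add(i)
    return sorted(flagged)


def clean_and_split(samples: List[Sample], keys: Sequence[str],
                    train_frac: float = 0.8, seed: int = 0):
    """Drop box-plot outliers, then shuffle and split into (training, test) lists."""
    bad = set(boxplot_outliers(samples, keys))
    kept = [s for i, s in enumerate(samples) if i not in bad]
    random.Random(seed).shuffle(kept)
    cut = round(train_frac * len(kept))
    return kept[:cut], kept[cut:]


if __name__ == "__main__":
    # Synthetic Ca2+ values for illustration only (not data from the study).
    toy = [{"Ca": v} for v in [10.0, 11.0, 12.0, 13.0, 14.0, 100.0]]
    print(boxplot_outliers(toy, ["Ca"]))  # [5]
```

A plain random split is used here for simplicity; with five aquifer classes of uneven size, a split stratified by class would keep every water type represented in the test set.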

     
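TPE, the optimizer used to tune LightGBM, splits past trials into a "good" fraction γ with the lowest loss and the remainder, fits a Parzen (kernel density) estimator l(x) to the good trials and g(x) to the rest, and evaluates next the candidate that maximizes l(x)/g(x), since TPE's expected improvement increases with that ratio. The abstract does not say which LightGBM parameters were tuned, over what ranges, or which TPE implementation was used; libraries such as Hyperopt and Optuna (`TPESampler`) provide full versions. The following is only a minimal sketch under stated simplifications: one continuous parameter, a fixed kernel bandwidth, no prior component, and a toy quadratic objective standing in for validation error. In the paper's setting, x might be a parameter such as `learning_rate` and the loss the validation error. The function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def parzen_density(x: float, centers: Sequence[float], bandwidth: float) -> float:
    """Equal-weight Gaussian Parzen-window density at x (0.0 if there are no centers)."""
    if not centers:
        return 0.0
    norm = 1.0 / (len(centers) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers)


def tpe_minimize(
    objective: Callable[[float], float],
    low: float,
    high: float,
    n_trials: int = 40,
    n_startup: int = 10,
    n_candidates: int = 24,
    gamma: float = 0.25,
    seed: int = 0,
) -> Tuple[float, float]:
    """Minimize a 1-D objective on [low, high]; return (best_x, best_loss)."""
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # fixed bandwidth (full TPE adapts it per point)
    history: List[Tuple[float, float]] = []  # (x, loss) pairs evaluated so far

    for trial in range(n_trials):
        if trial < n_startup or not history:
            x = rng.uniform(low, high)  # random warm-up trials
        else:
            ranked = sorted(history, key=lambda p: p[1])
            n_good = max(1, math.ceil(gamma * len(ranked)))
            good = [p[0] for p in ranked[:n_good]]  # observations behind l(x)
            bad = [p[0] for p in ranked[n_good:]]   # observations behind g(x)
            # Draw candidates from l(x): jitter a random "good" point, clip to bounds.
            candidates = [
                min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                for _ in range(n_candidates)
            ]
            # Keep the candidate with the largest l(x)/g(x) ratio.
            x = max(
                candidates,
                key=lambda c: parzen_density(c, good, bandwidth)
                / (parzen_density(c, bad, bandwidth) + 1e-12),
            )
        history.append((x, objective(x)))

    return min(history, key=lambda p: p[1])


if __name__ == "__main__":
    # Toy objective standing in for LightGBM validation error (minimum at x = 3).
    best_x, best_loss = tpe_minimize(lambda x: (x - 3.0) ** 2, 0.0, 10.0)
    print(best_x, best_loss)
```

Compared with the random search used for the RS-MLP baseline, the only difference is the `else` branch: after the warm-up trials, proposals are concentrated where low-loss trials have clustered rather than drawn uniformly.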
