Objective To accurately predict the mass concentration of sulfur dioxide (SO2) emitted in the flue gas from the tail gas treatment unit of natural gas purification plants.
Method A data set was constructed from 44 000 hourly records taken from the tail gas treatment daily reports of a natural gas purification plant between 2018 and 2023. After data preprocessing, 27 important features were selected by feature importance analysis. To predict the SO2 emission mass concentration in the flue gas, three ensemble learning algorithms (Random Forest, Gradient Boosting, and XGBoost) and a Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel were used to model the process as data-driven alternatives to simulation models.
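The workflow above can be sketched in Python as follows. This is a minimal illustration, assuming scikit-learn (and, optionally, the xgboost package) as tooling; the abstract does not name the libraries used. The data are synthetic stand-ins generated with make_regression, not the plant records, and the function names (select_top_features, build_models, run), tree counts, and untuned default hyperparameters are hypothetical. Random-forest impurity importance is swapped in for the unspecified "importance analysis methods". The 20% hold-out matches the reported ratio of 8 800 test samples to 44 000 records; treating the split as chronological is an assumption.

```python
# Hypothetical sketch: synthetic data stands in for the plant's hourly records.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

N_FEATURES_KEPT = 27  # number of features retained in the abstract


def select_top_features(X, y, k, seed=0):
    """Return sorted indices of the k most important columns, ranked by
    random-forest impurity importance (an assumed stand-in method)."""
    ranker = RandomForestRegressor(n_estimators=100, random_state=seed)
    ranker.fit(X, y)
    top = np.argsort(ranker.feature_importances_)[::-1][:k]
    return np.sort(top)


def build_models(seed=0):
    """Three ensemble regressors plus an RBF-kernel SVR, all with untuned settings."""
    models = {
        "RandomForest": RandomForestRegressor(n_estimators=200, random_state=seed),
        "GradientBoosting": GradientBoostingRegressor(random_state=seed),
        # SVR is sensitive to feature scale, so standardize before the RBF kernel
        "SVM-RBF": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    }
    try:  # XGBoost is added only if the optional xgboost package is installed
        from xgboost import XGBRegressor
        models["XGBoost"] = XGBRegressor(n_estimators=200, random_state=seed)
    except ImportError:
        pass
    return models


def run(n_samples=2000, n_raw_features=40, seed=0):
    """Select features on the training split, fit each model, and
    return (selected columns, {model name: (R2, MSE) on the test split})."""
    X, y = make_regression(n_samples=n_samples, n_features=n_raw_features,
                           n_informative=10, noise=5.0, random_state=seed)
    # 20% hold-out (8 800 / 44 000 = 0.2); shuffle=False keeps the last rows as
    # the test set, mimicking a chronological split (an assumption)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)
    cols = select_top_features(X_tr, y_tr, N_FEATURES_KEPT, seed)
    results = {}
    for name, model in build_models(seed).items():
        model.fit(X_tr[:, cols], y_tr)
        pred = model.predict(X_te[:, cols])
        results[name] = (r2_score(y_te, pred), mean_squared_error(y_te, pred))
    return cols, results
```

Calling `cols, scores = run()` returns the 27 retained column indices and a score pair per model. Feature ranking is fit on the training portion only, so the hold-out rows do not leak into feature selection.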
Result All three ensemble learning models achieved higher prediction accuracy than the single SVM model. The Random Forest model performed best, with a coefficient of determination (R²) of 0.89 and a mean square error (MSE) of 1 250.59. On the test set of 8 800 real samples, its prediction deviation was 9.86%. Compared with a Random Forest model trained without data preprocessing, its coefficient of determination increased by 61.82%.
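The reported metrics can be computed with a few lines of NumPy. This is a sketch under stated assumptions: the abstract does not define "prediction deviation", so the mean absolute relative error used in the hypothetical helper mean_relative_deviation is an assumed reading, as is interpreting the 61.82% gain as a relative increase in R². All helper names are illustrative.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def mse(y_true, y_pred):
    """Mean square error, in squared concentration units."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def mean_relative_deviation(y_true, y_pred):
    """Assumed reading of 'prediction deviation': mean of |error| / |true value|,
    in percent. Undefined if any true value is zero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))


def implied_baseline_r2(r2_final, pct_increase):
    """Baseline R2 implied if 'increased by pct_increase %' is a relative increase."""
    return r2_final / (1.0 + pct_increase / 100.0)
```

Under that reading, `implied_baseline_r2(0.89, 61.82)` gives about 0.55 for the Random Forest trained without preprocessing. Likewise, an MSE of 1 250.59 corresponds to a root mean square error of about 35.36 in the units of the SO2 concentration.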
Conclusion The Random Forest model has practical value in production for accurately predicting the SO2 emission mass concentration of the tail gas treatment unit, and it can provide reliable model support for subsequent optimization of the unit's process parameters.