Abstract:
Six machine learning algorithms are used to classify subsurface rocks based on fifteen well logging features from four geothermal wells in the Snake River Plain, Idaho, USA. Two experimental designs, a single-well test and a multiple-well test, are developed to determine the best model and hyperparameters. The single-well test randomly assigns the data in each well to a training set (70%), a validation set (10%), and a test set (20%). The multiple-well test combines data from three wells and splits the data in the fourth well into a training set (70%), a validation set (10%), and a test set (20%). Results show that the extreme gradient boosting (XGB) model gives the highest accuracy in both the single- and multiple-well tests, at 91% and 87%, respectively. This is attributed to its decision-tree basis, which allows XGB to disregard uninformative features and to handle missing values. In addition, the multiple-well test is more complex and generally gives lower prediction accuracy than the single-well test because the features vary between wells. An artificial neural network (ANN), one of the deep learning algorithms, consistently gives lower accuracy than XGB in both tests. This is because the ANN does not handle an imbalanced dataset as well as XGB does. Overall, igneous rocks can be accurately classified because they are abundant, which allows the models to learn their distinct characteristics effectively. Sedimentary rocks are the minority classes and mostly have overlapping well logging responses, which makes their lithological classification difficult. The classification of sedimentary rocks can be further improved by increasing the amount of data and incorporating other physical properties such as grain size.
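The two splitting schemes described above can be illustrated with a minimal sketch. It is not the authors' code: the function names (`split_indices`, `single_well_split`, `multiple_well_split`), the fixed random seed, and the use of plain Python lists as rows are all hypothetical. In particular, the sketch assumes that in the multiple-well test the data from the three other wells are pooled with the training portion of the fourth well; the abstract does not spell this out.

```python
import random


def split_indices(n_samples, fractions=(0.7, 0.1, 0.2), seed=0):
    """Randomly partition range(n_samples) into train/validation/test index lists."""
    rng = random.Random(seed)          # seeded so the split is reproducible
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]   # remainder, about 20% of the rows
    return train, val, test


def single_well_split(well_rows, seed=0):
    """Single-well test: 70/10/20 random split of one well's rows."""
    train, val, test = split_indices(len(well_rows), seed=seed)
    return (
        [well_rows[i] for i in train],
        [well_rows[i] for i in val],
        [well_rows[i] for i in test],
    )


def multiple_well_split(other_wells, target_well_rows, seed=0):
    """Multiple-well test (assumed reading): rows from the other wells are
    pooled with the training portion of the target well; validation and test
    rows come from the target well only."""
    train, val, test = single_well_split(target_well_rows, seed=seed)
    pooled = [row for well in other_wells for row in well]
    return pooled + train, val, test
```

For a well with 100 logged samples, `single_well_split` yields 70 training, 10 validation, and 20 test rows. Because the lithology classes are imbalanced, a stratified split that preserves class proportions in each subset may be preferable in practice; the abstract only states that the assignment is random.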