Anomaly Detection in Petroleum Production Wells Using Machine Learning: A Comparative Study on the Petrobras 3W Dataset
Date
2026-04
Author
Rahman, Sayed Rakinur
Suraiya, Amanta
Rafsan, Md. Turan
Abstract
Anomaly detection in offshore oil production systems is a safety-critical task: undetected faults contribute directly to equipment failures, environmental incidents, and significant production losses. Machine Learning (ML) has shown significant potential for fault classification in industrial time-series data. However, few comparative studies systematically evaluate models across multiple learning paradigms (classical machine learning, supervised deep learning, and unsupervised learning) on real-world operational data. Existing studies tend to evaluate individual model architectures, offering limited analysis across fault types, class-imbalance conditions, and temporal data characteristics. Multi-class fault detection in oil wells presents several challenges, such as severe class imbalance, overlapping faults, and the risk of temporal data leakage arising from the splitting strategies applied to sensor time series. This report presents an extensive comparative study of machine learning approaches for multi-class anomaly detection across eight fault types using the Petrobras 3W dataset, a publicly available benchmark containing real-world sensor data from offshore oil wells. Five distinct models are evaluated: Random Forest (RF) as a classical machine-learning baseline; Long Short-Term Memory (LSTM) and Transformer classifiers as supervised deep-learning approaches; an LSTM Autoencoder for unsupervised anomaly detection; and a proposed Advanced Stacking Ensemble. The ensemble combines all three supervised models using cross-validated out-of-fold (OOF) prediction generation, meta-feature engineering (prediction entropy, inter-model agreement, and confidence statistics), and an eXtreme Gradient Boosting (XGBoost) meta-learner.
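The meta-features named above (prediction entropy, inter-model agreement, confidence statistics) can be illustrated with a minimal sketch. It assumes each base model has already produced out-of-fold class probabilities; the function name `meta_features`, the input layout, and the exact choice of statistics are hypothetical and are not taken from the thesis. The resulting rows would be the kind of input passed to a meta-learner such as XGBoost.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def meta_features(probas):
    """Build one meta-feature row per sample from base-model probabilities.

    probas: list over base models; each entry is a list over samples,
            and each sample is a list of class probabilities.
    (Hypothetical layout, chosen for illustration only.)
    """
    n_models = len(probas)
    n_samples = len(probas[0])
    rows = []
    for i in range(n_samples):
        dists = [probas[m][i] for m in range(n_models)]
        # Shannon entropy of each model's predicted distribution.
        entropies = [-sum(p * math.log(p) for p in d if p > 0) for d in dists]
        # Hard vote per model, then the share of models backing the majority label.
        votes = [max(range(len(d)), key=d.__getitem__) for d in dists]
        agreement = Counter(votes).most_common(1)[0][1] / n_models
        # Confidence statistics: mean and spread of the top-class probability.
        top = [max(d) for d in dists]
        raw = [p for d in dists for p in d]
        rows.append(raw + entropies + [agreement, mean(top), pstdev(top)])
    return rows


# Three base models, two samples, three classes (made-up numbers).
p_rf = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2]]
p_lstm = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
p_tf = [[0.9, 0.05, 0.05], [0.1, 0.3, 0.6]]
rows = meta_features([p_rf, p_lstm, p_tf])
```

Each row here holds the raw probabilities from all models, one entropy value per model, the agreement ratio, and two confidence statistics. Using out-of-fold predictions for these inputs keeps the meta-learner from seeing base-model outputs on samples those models were trained on.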
A thorough preprocessing pipeline is implemented, including file-level data splitting to prevent temporal leakage, sliding-window segmentation, statistical feature engineering, and the Synthetic Minority Over-sampling Technique (SMOTE) for class balancing. On the held-out test set, the Stacking Ensemble achieves the highest accuracy of 96.25%, while the RF reaches 95.98%. The RF also attains a weighted F1-score of 0.9558. The Transformer (F1: 0.9295) and LSTM (F1: 0.9237) demonstrate competitive performance, with the Transformer offering the advantage of temporal modeling. The LSTM Autoencoder achieves an F1-score of 0.8588 for binary anomaly detection without requiring labeled anomaly data. Model interpretability is addressed through SHapley Additive exPlanations (SHAP) based feature-level importance analysis and Transformer attention-weight visualization, which reveals the fault-specific temporal patterns attended to by the model.
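File-level splitting followed by sliding-window segmentation can be sketched as below. This is an illustration under stated assumptions, not the thesis pipeline: the helper names `split_files` and `sliding_windows`, the 20% test fraction, and the window and stride values are all hypothetical. The point it shows is that whole files are assigned to one side of the split before any windows are cut, so overlapping windows from the same recording never appear in both training and test data.

```python
import random


def split_files(file_ids, test_fraction=0.2, seed=0):
    """Assign whole files to train or test (hypothetical helper)."""
    ids = sorted(set(file_ids))
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


def sliding_windows(series, window, stride):
    """Yield fixed-length windows from a single file's time series."""
    for start in range(0, len(series) - window + 1, stride):
        yield series[start:start + window]


# Toy data: five "files", each a short series of sensor readings.
files = {f"well_{k}": list(range(10)) for k in range(5)}
train_ids, test_ids = split_files(files)

# Windows are cut only after the split, one file at a time.
train_windows = [w for fid in train_ids
                 for w in sliding_windows(files[fid], window=4, stride=2)]
test_windows = [w for fid in test_ids
                for w in sliding_windows(files[fid], window=4, stride=2)]
```

Under this ordering, any oversampling step such as SMOTE would be applied to the training windows only, leaving the held-out test set untouched.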
Collections
- Undergraduate Thesis [44]
