Anomaly Detection in Petroleum Production Wells Using Machine Learning: A Comparative Study on the Petrobras 3W Dataset

Rahman, Sayed Rakinur; Suraiya, Amanta; Rafsan, Md. Turan

dc.contributor.author	Rahman, Sayed Rakinur
dc.contributor.author	Suraiya, Amanta
dc.contributor.author	Rafsan, Md. Turan
dc.date.accessioned	2026-05-12T10:45:22Z
dc.date.available	2026-05-12T10:45:22Z
dc.date.issued	2026-04
dc.identifier.uri	https://ar.iub.edu.bd/handle/11348/1198
dc.description.abstract	Anomaly detection in offshore oil production systems is a safety-critical task: undetected faults contribute directly to equipment failures, environmental incidents, and significant production losses. Machine Learning (ML) has shown significant potential for fault classification in industrial time-series data. However, few comparative studies systematically evaluate models across multiple learning paradigms (classical machine learning, supervised deep learning, and unsupervised learning) on real-world operational data. Existing studies tend to evaluate individual model architectures, offering limited analysis across fault types, class-imbalance conditions, and temporal data characteristics. Multi-class fault detection in oil wells presents several challenges, such as severe class imbalance, overlapping fault classes, and the risk of temporal data leakage caused by the splitting strategies applied to sensor time series. This report presents an extensive comparative study of machine learning approaches for multi-class anomaly detection across eight fault types using the Petrobras 3W dataset, a publicly available benchmark containing real-world sensor data from offshore oil wells. Five models are evaluated: Random Forest (RF) as a classical machine-learning baseline; Long Short-Term Memory (LSTM) and Transformer classifiers as supervised deep-learning approaches; an LSTM Autoencoder for unsupervised anomaly detection; and a proposed Advanced Stacking Ensemble. The ensemble combines the three supervised models using cross-validated out-of-fold (OOF) prediction generation, meta-feature engineering (prediction entropy, inter-model agreement, and confidence statistics), and an eXtreme Gradient Boosting (XGBoost) meta-learner.
A thorough preprocessing pipeline is implemented, including file-level data splitting to prevent temporal leakage, sliding-window segmentation, statistical feature engineering, and the Synthetic Minority Over-sampling Technique (SMOTE) for class balancing. On the held-out test set, the Stacking Ensemble achieves the highest accuracy (96.25%), compared with 95.98% for RF, which also attains a weighted F1-score of 0.9558. The Transformer (F1: 0.9295) and LSTM (F1: 0.9237) demonstrate competitive performance, with the Transformer offering the advantage of temporal modeling. The LSTM Autoencoder achieves an F1-score of 0.8588 for binary anomaly detection without requiring labeled anomaly data. Model interpretability is addressed through SHapley Additive exPlanations (SHAP)-based feature importance analysis and Transformer attention-weight visualization, which reveals the fault-specific temporal patterns attended to by the model.	en_US
dc.language.iso	en	en_US
dc.publisher	IUB	en_US
dc.subject	Anomaly Detection	en_US
dc.subject	Offshore Oil Production Systems	en_US
dc.subject	Machine Learning	en_US
dc.subject	Predictive Maintenance	en_US
dc.subject	Artificial Intelligence in Energy Systems	en_US
dc.subject	SHAP Interpretability	en_US
dc.title	Anomaly Detection in Petroleum Production Wells Using Machine Learning: A Comparative Study on the Petrobras 3W Dataset	en_US
dc.type	Thesis	en_US
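The abstract names file-level splitting, sliding-window segmentation, and statistical feature engineering but not their parameters. The sketch below shows one way these steps could fit together, assuming each 3W instance file has been loaded as a `(T, n_sensors)` NumPy array. The function names, the window and stride values, and the particular statistics (mean, standard deviation, min, max, last-minus-first) are hypothetical choices for illustration, not the thesis configuration.

```python
import numpy as np


def file_level_split(file_ids, test_fraction=0.2, seed=42):
    """Assign whole files to train or test so that no file contributes
    windows to both sides (the temporal leakage the abstract warns about)."""
    rng = np.random.default_rng(seed)
    files = np.array(sorted(set(file_ids)))
    rng.shuffle(files)
    n_test = max(1, int(round(test_fraction * len(files))))
    return set(files[n_test:].tolist()), set(files[:n_test].tolist())


def sliding_windows(series, window, stride):
    """Cut a (T, n_sensors) array into overlapping (window, n_sensors) segments."""
    if len(series) < window:
        raise ValueError("series is shorter than one window")
    starts = range(0, len(series) - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])


def window_features(windows):
    """Hypothetical per-sensor summary statistics for each window.

    Input shape (n_windows, window, n_sensors); output shape
    (n_windows, 5 * n_sensors), blocks ordered mean, std, min, max, delta.
    """
    return np.concatenate([
        windows.mean(axis=1),
        windows.std(axis=1),
        windows.min(axis=1),
        windows.max(axis=1),
        windows[:, -1, :] - windows[:, 0, :],  # change across the window
    ], axis=1)


if __name__ == "__main__":
    series = np.arange(20, dtype=float).reshape(10, 2)  # 10 steps, 2 sensors
    w = sliding_windows(series, window=4, stride=2)
    print(w.shape, window_features(w).shape)  # (4, 4, 2) (4, 10)
    train, test = file_level_split(["a", "b", "c", "d", "e"])
    print(sorted(train), sorted(test))
```

The split is made on file identifiers before any windowing, and windows are cut per file, so no segment straddles two files or the train/test boundary. Any resampling such as SMOTE would then be applied to the training windows only.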
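SMOTE creates synthetic minority-class samples by interpolating between a minority sample x and one of its k nearest minority-class neighbours x_nn: x_new = x + λ(x_nn - x), with λ drawn uniformly from [0, 1). The abstract does not state which implementation was used (imbalanced-learn's `SMOTE.fit_resample` is the usual choice in Python); the NumPy sketch below only illustrates the interpolation rule, and `smote_sample` is a hypothetical helper.

```python
import numpy as np


def smote_sample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows from minority-class feature rows X_min.

    Each synthetic row lies on the segment between a randomly chosen
    minority row and one of its k nearest minority-class neighbours.
    """
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances; a point is not its own neighbour
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]          # (n, k) neighbour indices
    rng = np.random.default_rng(seed)
    base = rng.integers(0, n, size=n_new)               # anchor rows
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))                        # weights in [0, 1)
    return X_min[base] + lam * (X_min[pick] - X_min[base])


if __name__ == "__main__":
    minority = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
    print(smote_sample(minority, n_new=5, k=2))
```

The dense distance matrix costs O(n²) memory, which is fine for illustration; a library implementation backed by a proper nearest-neighbour index scales better.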
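The Stacking Ensemble is described as using cross-validated out-of-fold (OOF) predictions plus meta-features built from prediction entropy, inter-model agreement, and confidence statistics. The exact feature definitions are not given, so the sketch below is an assumption: entropy is the Shannon entropy of each model's class distribution, confidence is each model's maximum probability, and agreement is the fraction of model pairs with the same top-1 class. Grouping the OOF folds by source file is also an assumption, chosen to mirror the file-level split. `grouped_oof_proba` and `meta_features` are hypothetical names.

```python
import numpy as np


def grouped_oof_proba(make_model, X, y, groups, n_classes, n_folds=5, seed=0):
    """Out-of-fold class probabilities for one base model.

    Folds are built from whole groups (e.g. source files), so windows of one
    file never sit on both sides of a fold. `make_model` must return a fresh
    object with scikit-learn-style fit/predict_proba, and predict_proba must
    return one column per class 0..n_classes-1 (this assumes every class
    appears in every training fold). X and y must be NumPy arrays.
    """
    rng = np.random.default_rng(seed)
    fold_of = {g: i % n_folds for i, g in enumerate(rng.permutation(np.unique(groups)))}
    fold = np.array([fold_of[g] for g in groups])
    oof = np.zeros((len(X), n_classes))
    for k in range(n_folds):
        val = fold == k
        if not val.any():
            continue
        model = make_model()
        model.fit(X[~val], y[~val])
        oof[val] = model.predict_proba(X[val])
    return oof


def meta_features(probas, eps=1e-12):
    """Meta-learner inputs from a list of (n_samples, n_classes) probability arrays.

    Columns: raw probabilities (model-major), per-model entropy, per-model
    confidence (max probability), pairwise top-1 agreement, and the mean and
    standard deviation of confidence across models.
    """
    P = np.stack(probas)                              # (n_models, n_samples, n_classes)
    n_models, n_samples, _ = P.shape
    entropy = -(P * np.log(P + eps)).sum(axis=2)      # (n_models, n_samples)
    confidence = P.max(axis=2)                        # (n_models, n_samples)
    votes = P.argmax(axis=2)                          # (n_models, n_samples)
    pairs = [(i, j) for i in range(n_models) for j in range(i + 1, n_models)]
    agreement = (np.mean([votes[i] == votes[j] for i, j in pairs], axis=0)
                 if pairs else np.ones(n_samples))
    return np.column_stack([
        P.transpose(1, 0, 2).reshape(n_samples, -1),
        entropy.T,
        confidence.T,
        agreement,
        confidence.mean(axis=0),
        confidence.std(axis=0),
    ])


if __name__ == "__main__":
    p_rf = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])
    p_lstm = np.array([[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]])
    p_tf = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
    print(meta_features([p_rf, p_lstm, p_tf]).shape)  # (2, 18)
```

In a standard stacking setup, the meta-feature matrix built from training-set OOF probabilities trains the meta-learner (for example `xgboost.XGBClassifier().fit(meta_X, y)`). At test time, base models refit on the full training set supply the probabilities passed to `meta_features`.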
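The abstract reports a binary F1-score for the LSTM Autoencoder but not how reconstruction error becomes an anomaly flag. A common approach, assumed here rather than taken from the thesis, is to compute per-window reconstruction error, set a threshold at a high percentile of the errors on normal training windows, and flag windows above it. The sketch covers only this scoring step; the LSTM encoder/decoder is not shown, and the helper names and the 99th-percentile default are hypothetical.

```python
import numpy as np


def reconstruction_errors(windows, reconstructions):
    """Mean squared error per window; both arrays are (n, window, n_sensors)."""
    return ((windows - reconstructions) ** 2).mean(axis=(1, 2))


def fit_threshold(normal_errors, percentile=99.0):
    """Threshold from errors on normal-only training windows (no anomaly labels needed)."""
    return float(np.percentile(normal_errors, percentile))


def binary_f1(y_true, y_pred):
    """F1 for the positive (anomalous) class from boolean label arrays."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return 0.0 if tp == 0 else float(2 * tp / (2 * tp + fp + fn))


if __name__ == "__main__":
    threshold = fit_threshold(np.linspace(0.0, 1.0, 101))  # approximately 0.99
    test_errors = np.array([0.5, 1.5, 2.0, 0.1])
    flags = test_errors > threshold
    print(flags, binary_f1([False, True, False, False], flags))
```

Because the threshold depends only on errors from normal windows, no labeled anomalies are needed to fit it; labels are used only afterwards, to compute the F1-score.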

Files in this item

Name: SP_Thesis_Rakinur_Amanta_Turan ...
Size: 4.342Mb
Format: PDF

This item appears in the following Collection(s)

Undergraduate Thesis [44]
By CSE Department