
dc.contributor.author: Rahman, Sayed Rakinur
dc.contributor.author: Suraiya, Amanta
dc.contributor.author: Rafsan, Md. Turan
dc.date.accessioned: 2026-05-12T10:45:22Z
dc.date.available: 2026-05-12T10:45:22Z
dc.date.issued: 2026-04
dc.identifier.uri: https://ar.iub.edu.bd/handle/11348/1198
dc.description.abstract: Anomaly detection in offshore oil production systems is a safety-critical task: undetected faults contribute directly to equipment failures, environmental incidents, and significant production losses. Machine Learning (ML) has shown significant potential for fault classification in industrial time-series data. However, few comparative studies systematically evaluate models across multiple learning paradigms (classical machine learning, supervised deep learning, and unsupervised learning) on real-world operational data. Existing studies tend to evaluate individual model architectures, offering limited analysis across fault types, class-imbalance conditions, and temporal data characteristics. Multi-class fault detection in oil wells presents several challenges, such as severe class imbalance, overlapping faults, and the risk of temporal data leakage introduced by the splitting strategies applied to sensor time series. This report presents an extensive comparative study of machine learning approaches for multi-class anomaly detection across eight fault types using the Petrobras 3W dataset, a publicly available benchmark containing real-world sensor data from offshore oil wells. Five distinct models are evaluated: Random Forest (RF) as a classical machine-learning baseline; Long Short-Term Memory (LSTM) and Transformer classifiers as supervised deep-learning approaches; an LSTM Autoencoder for unsupervised anomaly detection; and a proposed Advanced Stacking Ensemble that combines the three supervised models using cross-validated out-of-fold (OOF) prediction generation, meta-feature engineering (prediction entropy, inter-model agreement, and confidence statistics), and an eXtreme Gradient Boosting (XGBoost) meta-learner.
A thorough preprocessing pipeline is implemented, including file-level data splitting to prevent temporal leakage, sliding-window segmentation, statistical feature engineering, and Synthetic Minority Over-sampling Technique (SMOTE) for class balancing. On the held-out test set, the Stacking Ensemble achieves the highest accuracy of 96.25%, while the RF has an accuracy of 95.98%. The RF also attains a weighted F1-score of 0.9558. The Transformer (F1: 0.9295) and LSTM (F1: 0.9237) demonstrate competitive performance, with the Transformer offering the advantage of temporal modeling. The LSTM Autoencoder achieves an F1-score of 0.8588 for binary anomaly detection without requiring labeled anomaly data. Model interpretability is addressed through SHapley Additive exPlanations (SHAP) based feature-level importance analysis and Transformer attention-weight visualization, which reveals the fault-specific temporal patterns the model attends to. [en_US]
dc.language.iso: en [en_US]
dc.publisher: IUB [en_US]
dc.subject: Anomaly Detection [en_US]
dc.subject: Offshore Oil Production Systems [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Predictive Maintenance [en_US]
dc.subject: Artificial Intelligence in Energy Systems [en_US]
dc.subject: SHAP Interpretability [en_US]
dc.title: Anomaly Detection in Petroleum Production Wells Using Machine Learning: A Comparative Study on the Petrobras 3W Dataset [en_US]
dc.type: Thesis [en_US]
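
The abstract names three kinds of meta-features for the stacking ensemble (prediction entropy, inter-model agreement, and confidence statistics) without defining them. The following is a minimal sketch of one plausible way to compute them for a single sample, not the thesis's implementation. The function names, the use of Shannon entropy in nats, majority-vote agreement, and mean/standard deviation of the top-class probability are all assumptions made for illustration. The input is assumed to be one class-probability vector per base model, produced out-of-fold so the meta-learner never sees predictions made on a model's own training data.

```python
import math
from statistics import mean, pstdev


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def meta_features(per_model_probs):
    """Build a meta-feature vector for ONE sample (hypothetical layout).

    per_model_probs: list of probability vectors, one per base model,
    assumed to be out-of-fold predictions.
    """
    # Predicted class index and top-class probability for each base model.
    votes = [max(range(len(p)), key=p.__getitem__) for p in per_model_probs]
    confidences = [max(p) for p in per_model_probs]

    # Inter-model agreement: share of models voting for the majority class.
    agreement = max(votes.count(v) for v in votes) / len(votes)

    features = [x for p in per_model_probs for x in p]  # raw probabilities
    features += [prediction_entropy(p) for p in per_model_probs]
    features += [agreement, mean(confidences), pstdev(confidences)]
    return features


# Three hypothetical base models scoring one sample over three classes.
sample = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
row = meta_features(sample)
```

Each row built this way would be one training example for the meta-learner; with three models and three classes it holds nine probabilities, three entropies, and three summary statistics.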




Copyright © 2002-2021  IUB Academic Repository.
Maintained by  Library Information Technology (LIT)