Phishing URL Classification for SIEM: A Comparison of the XGBoost Machine Learning Model and the TabNet Deep Learning Model in Cyber Threat Detection

Azza Farichi Tjahjono; Hasan Hasan; Randist Prawandha Putera; Dionisius Marcell Putra Indranto; Abhirama Triadyatma Hermawan

Abstract

Phishing detection is a critical component of modern Security Information and Event Management (SIEM) systems, requiring both high accuracy and real-time performance. This study conducts a comprehensive comparison between a Gradient-Boosted Decision Tree model, XGBoost, and a deep learning architecture, TabNet, for classifying phishing URLs. Both models were systematically optimized using advanced hyperparameter tuning techniques, Randomized Search for XGBoost and Optuna with pruning for TabNet, to ensure a fair and robust evaluation. The models were trained and tested on the "Dataset of Suspicious Phishing URL Detection," a recent and relevant collection of URL features. The results demonstrate that the tuned XGBoost model significantly outperforms the tuned TabNet model across all key metrics. Furthermore, inference speed analysis revealed XGBoost to be substantially more efficient on both CPU and GPU hardware, with GPU inference more than 33 times faster than TabNet. These findings lead to the conclusion that, for this task, XGBoost offers a superior combination of accuracy, speed, and practical deployability, making it the more suitable architecture for integration into a SIEM system.

Keywords

Phishing Detection; XGBoost; TabNet; SIEM; Inference Speed