Abstract
Phishing detection is a critical component of modern Security Information and Event Management (SIEM) systems, requiring both high accuracy and real-time performance. This study conducts a comprehensive comparison between a gradient-boosted decision tree model, XGBoost, and a deep learning architecture, TabNet, for classifying phishing URLs. Both models were systematically optimized using advanced hyperparameter tuning techniques (Randomized Search for XGBoost and Optuna with pruning for TabNet) to ensure a fair and robust evaluation. The models were trained and tested on the "Dataset of Suspicious Phishing URL Detection," a recent and relevant collection of URL features. The results demonstrate that the tuned XGBoost model significantly outperforms the tuned TabNet model across all key metrics. Furthermore, inference speed analysis revealed XGBoost to be substantially more efficient on both CPU and GPU hardware, with a GPU inference time over 33 times faster than TabNet. These findings lead to the conclusion that, for this task, XGBoost offers a superior combination of accuracy, speed, and practical deployability, making it the more suitable architecture for integration into a SIEM system.
Keywords
Phishing Detection; XGBoost; TabNet; SIEM; Inference Speed