Phishing URL Classification for SIEM: Comparing the XGBoost Machine Learning Model and the TabNet Deep Learning Model for Cyber Threat Detection

Azza Farichi Tjahjono; Hasan Hasan; Randist Prawandha Putera; Dionisius Marcell Putra Indranto; Abhirama Triadyatma Hermawan

doi:10.52620/sainsdata.v3i2.227


Author

Azza Farichi Tjahjono⁽¹⁾, Hasan Hasan⁽²⁾, Randist Prawandha Putera⁽³⁾, Dionisius Marcell Putra Indranto⁽⁴⁾, Abhirama Triadyatma Hermawan⁽⁵⁾
⁽¹⁾ Institut Teknologi Sepuluh Nopember, Indonesia
⁽²⁾ Institut Teknologi Sepuluh Nopember, Indonesia
⁽³⁾ Institut Teknologi Sepuluh Nopember, Indonesia
⁽⁴⁾ Institut Teknologi Sepuluh Nopember, Indonesia
⁽⁵⁾ Institut Teknologi Sepuluh Nopember, Indonesia


Language: Indonesian
Available online: 2025-07-10 | Published : 2025-07-10
Copyright (c) 2025 Hasan Hasan

Abstract

Phishing detection is a critical component of modern Security Information and Event Management (SIEM) systems, which require both high accuracy and real-time performance. This study compares a gradient-boosted decision tree model, XGBoost, with a deep learning architecture, TabNet, for classifying phishing URLs. To ensure a fair and robust evaluation, both models were systematically optimized through hyperparameter tuning: Randomized Search for XGBoost and Optuna with pruning for TabNet. The models were trained and tested on the "Dataset of Suspicious Phishing URL Detection," a recent and relevant collection of URL features. The results show that the tuned XGBoost model significantly outperforms the tuned TabNet model across all key metrics. Furthermore, inference speed analysis revealed XGBoost to be substantially more efficient on both CPU and GPU hardware, with GPU inference more than 33 times faster than TabNet. These findings indicate that, for this task, XGBoost offers a superior combination of accuracy, speed, and practical deployability, making it the more suitable architecture for integration into a SIEM system.
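The "more than 33 times faster" figure is a latency ratio: TabNet's inference time divided by XGBoost's. The sketch below is an illustrative assumption, not the authors' benchmarking code. It shows one common way to obtain such a ratio: run a few untimed warm-up calls, time repeated calls on the same batch, and compare median latencies. The helper names `median_latency` and `speedup` are hypothetical. In a real comparison, `predict` would wrap the fitted models' prediction methods (for example `XGBClassifier.predict` from xgboost and `TabNetClassifier.predict` from pytorch-tabnet). GPU timings would also need a device synchronization (such as `torch.cuda.synchronize()`) before the clock is read, because GPU kernels run asynchronously.

```python
import statistics
import time
from typing import Callable, Sequence


def median_latency(predict: Callable[[Sequence], object], batch: Sequence,
                   warmup: int = 3, repeats: int = 20) -> float:
    """Median wall-clock time in seconds of predict(batch) over `repeats` runs."""
    # Untimed warm-up calls absorb one-off costs (lazy initialization, caches)
    for _ in range(warmup):
        predict(batch)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(batch)
        timings.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to occasional scheduler stalls
    return statistics.median(timings)


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the fast model is: t_slow / t_fast."""
    if fast_seconds <= 0:
        raise ValueError("fast_seconds must be positive")
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    data = [i / 2000 for i in range(2000)]

    # Hypothetical stand-ins for two models' predict functions
    def fast_predict(batch):
        return [x > 0.5 for x in batch]

    def slow_predict(batch):
        return [sum(x * k for k in range(20)) > 0.5 for x in batch]

    t_fast = median_latency(fast_predict, data)
    t_slow = median_latency(slow_predict, data)
    print(f"speedup: {speedup(t_slow, t_fast):.1f}x")
```

Timing both models on the same batch, on the same hardware, with the same number of repeats keeps the ratio comparable. A value above 33 from `speedup(t_tabnet, t_xgboost)` would correspond to the GPU result reported above.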

Keywords

Phishing Detection; XGBoost; TabNet; SIEM; Inference Speed

