An Improved Method for Detecting Phishing Websites Using Data Mining on Web Pages
Subject Areas: General
Mahdiye Baharloo 1, Alireza Yari 2 *
1 -
2 -
Keywords: Phishing, Data Mining, Feature Selection and Feature Extraction
Abstract :
Phishing undermines users' trust in business networks built on the e-commerce framework. In this research, we therefore attempt to detect phishing websites using data mining. Identifying the distinguishing features of phishing is an important prerequisite for designing an accurate detection system, so a list of 30 features proposed for phishing websites was first prepared. A two-stage feature reduction method based on feature selection and feature extraction was then proposed to improve the efficiency of phishing detection systems; it reduced the number of features significantly. Finally, the performance of the J48 decision tree, random forest, and naïve Bayes methods was evaluated on the reduced features. The results indicate that the random forest model built with the two-stage feature reduction, based on the Wrapper method and Principal Component Analysis (PCA), reached an accuracy of 96.58%, which compares favorably with the other methods.
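The two-stage reduction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of the 30-feature phishing dataset, substitutes scikit-learn's CART decision tree for J48 inside the wrapper, and picks the number of selected features (10) and principal components (5) arbitrarily.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for a 30-feature phishing dataset.
X, y = make_classification(
    n_samples=300, n_features=30, n_informative=8, n_redundant=4, random_state=0
)

pipeline = Pipeline([
    # Stage 1 (feature selection): wrapper-style forward search that scores
    # candidate feature subsets by cross-validating a decision tree.
    # Assumption: CART tree instead of J48; 10 features kept.
    ("wrapper", SequentialFeatureSelector(
        DecisionTreeClassifier(max_depth=5, random_state=0),
        n_features_to_select=10, direction="forward", cv=3)),
    # Stage 2 (feature extraction): project the selected features onto
    # principal components. Assumption: 5 components.
    ("pca", PCA(n_components=5)),
    # Final classifier trained on the reduced representation.
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# 5-fold cross-validation; both reduction stages are refit inside each fold,
# so the held-out fold never influences feature selection or PCA.
scores = cross_val_score(pipeline, X, y, cv=5)
mean_accuracy = scores.mean()
print(f"mean 5-fold accuracy: {mean_accuracy:.3f}")
```

Placing both reduction stages inside the pipeline keeps the accuracy estimate honest, since selecting features on the full dataset before cross-validation would leak information from the test folds. The accuracy printed here reflects only the synthetic data and says nothing about the 96.58% reported above.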