Naive Bayes Algorithm and TF-IDF for Detecting Plagiarism in Journal Articles

Authors

  • Ladysa Azzahrah Politeknik Negeri Sriwijaya
  • Lindawati Lindawati Politeknik Negeri Sriwijaya
  • Sholihin Sholihin Politeknik Negeri Sriwijaya

Keywords

Cosine Similarity, Feature Extraction, Naïve Bayes, TF-IDF, Plagiarism

Abstract

This study examines the combined use of Naïve Bayes, TF-IDF, and cosine similarity to detect plagiarism in journal articles. Technological advances have made plagiarism easier, which poses a serious threat to scientific integrity. The study has two aims: to describe in detail how the algorithms are implemented for plagiarism detection, and to measure the effectiveness of the combination. The method is a Python-based system deployed as a website. The dataset consists of one hundred abstracts of Indonesian-language journal articles on the Internet of Things (IoT), retrieved from Mendeley. The plagiarism threshold is set at a maximum of 20% similarity. The implementation proceeds through data preprocessing, text feature extraction using a combination of Naïve Bayes and TF-IDF, and similarity measurement with cosine similarity. The results indicate that this combination is effective at detecting the level of plagiarism in journal article abstracts and measures text similarity with high accuracy. The combination of Naïve Bayes and TF-IDF improves text feature extraction, and the system measures text similarity accurately across the test scenarios. This research contributes to the development of fast and accurate plagiarism detection technology, especially in fields that require complex text analysis.
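The TF-IDF and cosine similarity stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`, `check_plagiarism`), the smoothed IDF formula, and the rule that a score above 20% is flagged are all assumptions made for illustration. Preprocessing is reduced to lowercasing and tokenizing, and the Naïve Bayes stage is omitted.

```python
import math
import re
from collections import Counter

# Assumption: the abstract's 20% limit is read as "flag anything above 0.20".
PLAGIARISM_THRESHOLD = 0.20


def tokenize(text):
    """Lowercase and keep alphabetic tokens (stand-in for full preprocessing;
    stop-word removal and stemming are omitted here)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty text
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_plagiarism(candidate, corpus, threshold=PLAGIARISM_THRESHOLD):
    """Return (highest similarity to any corpus document, flagged?)."""
    vectors = tfidf_vectors(list(corpus) + [candidate])
    candidate_vec = vectors[-1]
    scores = [cosine_similarity(candidate_vec, v) for v in vectors[:-1]]
    best = max(scores, default=0.0)
    return best, best > threshold


if __name__ == "__main__":
    # Hypothetical toy corpus, not data from the study.
    corpus = [
        "sistem monitoring suhu berbasis iot",
        "otomasi rumah pintar dengan sensor",
    ]
    score, flagged = check_plagiarism("monitoring suhu berbasis iot", corpus)
    print(f"similarity={score:.2f} flagged={flagged}")
```

In this sketch the candidate abstract is vectorized together with the reference corpus so that all documents share the same IDF weights. A production system would more likely fit the weights on the reference corpus once and reuse them.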

Published

2024-09-30

How to Cite

Azzahrah, L., Lindawati, L., & Sholihin, S. (2024). Naive Bayes Algorithm and TF-IDF for Detecting Plagiarism in Journal Articles. PIKSEL : Penelitian Ilmu Komputer Sistem Embedded and Logic, 12(2), 333–342. Retrieved from https://jurnal.unismabekasi.ac.id/index.php/piksel/article/view/9829