Naive Bayes Algorithm and TF-IDF for Detecting Plagiarism in Journal Articles
Keywords:
Cosine Similarity, Feature Extraction, Naïve Bayes, TF-IDF, Plagiarism
Abstract
This study examines the combined use of Naïve Bayes, TF-IDF, and cosine similarity to detect plagiarism in journal articles. Technological advances have increased the risk of plagiarism, which poses a serious threat to scientific integrity. The study aims to describe in detail how these algorithms are implemented for plagiarism detection and to measure the effectiveness of their combination. The method involves developing a Python-based system that is deployed as a website. The dataset consists of one hundred abstracts of Indonesian-language journal articles on the Internet of Things (IoT), taken from the Mendeley software. The maximum permitted similarity is set at a threshold of 20%. The implementation proceeds through data preprocessing, text feature extraction using a combination of Naïve Bayes and TF-IDF, and similarity measurement with cosine similarity. The results indicate that this combination is effective in detecting the level of plagiarism in journal article abstracts and measures text similarity with high accuracy. The developed system extracts text features more effectively through the combination of Naïve Bayes and TF-IDF and measures text similarity accurately across various test scenarios. This research contributes to the development of fast and accurate plagiarism detection technology, especially in fields that require complex text analysis.
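The TF-IDF weighting, cosine similarity measurement, and 20% threshold described in the abstract can be illustrated with a minimal Python sketch. This is not the system developed in the study: the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`, `check_submission`), the simple tokenizer, the smoothed IDF formula, and the reading of the threshold as "flag when the highest similarity exceeds 20%" are all assumptions made for illustration. The Naïve Bayes component is omitted because the abstract does not say how it is combined with TF-IDF.

```python
import math
import re
from collections import Counter

# Assumption: the abstract's "maximum threshold of 20%" is read as
# "flag a submission whose highest similarity exceeds 0.20".
THRESHOLD = 0.20


def tokenize(text):
    """Lowercase and split into alphanumeric tokens.

    A stand-in for the fuller preprocessing stage, which the abstract
    mentions but does not specify.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Uses relative term frequency and a smoothed IDF,
    ln((1 + n) / (1 + df)) + 1, chosen here for illustration.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in tf.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_submission(submission, corpus):
    """Return (highest similarity to any corpus document, flagged?)."""
    vectors = tfidf_vectors(list(corpus) + [submission])
    query = vectors[-1]
    scores = [cosine_similarity(query, v) for v in vectors[:-1]]
    best = max(scores) if scores else 0.0
    return best, best > THRESHOLD


if __name__ == "__main__":
    # Hypothetical Indonesian-language snippets, not taken from the dataset.
    corpus = [
        "sensor iot jaringan nirkabel",
        "sistem monitoring suhu berbasis iot",
    ]
    score, flagged = check_submission("sensor iot jaringan nirkabel", corpus)
    print(f"similarity={score:.2f} flagged={flagged}")
```

In this sketch the submission is vectorized together with the reference abstracts so that all documents share one vocabulary and one set of IDF weights. A production system would more likely fit the weights on the reference corpus once and reuse them for each new submission.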