Machine Learning-Based Classification for Scholarship Selection

Universitas Muhammadiyah Sukabumi (UMMI) is a private university that accepts KIP (Kartu Indonesia Pintar, the Smart Indonesia Card) scholarship recipients every year. However, the number of KIP scholarship applicants always exceeds the available quota, necessitating a reselection process to determine the scholarship recipients. Currently, the UMMI scholarship management does not have a clear method to support decision-making in the KIP scholarship selection process. To address this issue, classification models are built using two machine learning algorithms, Decision Tree (DT) and Support Vector Machine (SVM), based on data from previous scholarship recipients. Model development follows the SEMMA (Sample, Explore, Modify, Model, Assess) method. It begins with the collection of the KIP scholarship recipient dataset at UMMI from 2021-2022, totaling 519 records with 16 attributes. Through exploration, the primary attributes that serve as features for modeling are identified: Status DTKS, Status P3KE, combined parental income, and achievements. These attributes are transformed into numerical data to facilitate modeling. The K-Fold Cross-Validation results for the DT model in classifying KIP scholarship recipients over the entire test dataset show an accuracy of 78.44%, a precision of 73.11%, a recall (sensitivity) of 78.45%, and an F1 score of 73.20%. Modeling and validation with SVM yield an accuracy of 80.17%, a precision of 84.44%, and a recall of 80.17%. The SVM model performs slightly better in terms of accuracy and precision, but both models exhibit competitive performance in classifying KIP scholarship recipients at UMMI.


Introduction
Kartu Indonesia Pintar (KIP) Kuliah is one of the Indonesia Pintar programs listed in Ministry of Education and Culture Regulation No. 10 of 2020, aimed at students accepted into higher education institutions (Kemendikbud, 2020). KIP Kuliah Merdeka aims to increase the economic potential and social mobility of students from low-income families by enabling them to pursue higher education (Puslapdik, 2023). Based on the 2023 KIP Kuliah registration guidelines, KIP recipients must be graduates of high school, vocational school, or equivalent institutions who graduated in the current year or at most two (2) years prior; must have passed the selection for new students at state or private universities in officially accredited study programs; and must have good academic potential but face economic limitations, with special considerations supported by valid documents.
Furthermore, the economic requirements for KIP Kuliah Merdeka recipients are that students come from low-income/vulnerable families, as evidenced by: a) owning the Kartu Indonesia Pintar (KIP) Pendidikan Menengah; b) being listed in the Data Terpadu Kesejahteraan Sosial (DTKS) or receiving social assistance programs determined by the Ministry of Social Affairs, such as PKH, Bansos Penerima Bantuan Iuran Jaminan Kesehatan (PBI JK), and Bansos Bantuan Pangan Non-Tunai (BPNT); c) being listed among low-income/vulnerable families, up to the 3rd decile of the Data Pensasaran Percepatan Penghapusan Kemiskinan Ekstrem (P3KE); and d) coming from social welfare institutions or orphanages.
Every higher education institution has a quota for accepting KIP Kuliah students, including Universitas Muhammadiyah Sukabumi (UMMI). The number of KIP Kuliah recipients is adjusted according to the accreditation ranking; consequently, when the number of KIP Kuliah applicants exceeds the quota, not all applicants can be accommodated, necessitating a reselection process. To ensure that those receiving KIP Kuliah are truly deserving, it is essential to conduct verification and eligibility selection for KIP Kuliah recipients (Iskandar, 2022). Currently, the selection and verification process for KIP Kuliah is carried out through various methods such as interviews, direct visits to the prospective recipients' residences, and specialization selection for the intended study programs. However, the determination of eligibility for final acceptance has not yet been made with a clear and transparent method (Khotimah, 2022). One method that can assist decision-makers in determining potential KIP Kuliah recipients is machine learning, which involves classifying existing KIP Kuliah selection data from previous years (Firmansyah et al., 2019). The classification process can be performed using various algorithms, including Naive Bayes, Decision Tree, K-Nearest Neighbor, and Support Vector Machine (Elejla et al., 2019).
Several similar studies have been conducted by other researchers. Ronny Susetyoko et al. classified KIP Kuliah recipients using logistic regression (Susetyoko et al., 2022); in their results, the regression-based classification could only be used to generate numeric outcomes, with input variables consisting solely of numerical values. Gagan Suganda, on the other hand, conducted research on the classification of KIP Kuliah recipients using Naïve Bayes, which is a simple mathematical formula for conditional probability (Suganda et al., 2022). However, this method has a drawback: if a conditional probability becomes zero, the prediction probability will also be zero, leading to suboptimal prediction results (Astuti et al., 2020). Furthermore, Arfyanti conducted research using the decision tree algorithm, where the determining variables for KIP Kuliah recipients were hierarchically structured in the form of a tree. This approach allows for identifying which variables take priority in KIP Kuliah acceptance. The variables used to determine the eligibility of KIP Kuliah recipients include parents' income, the number of dependents, academic achievement, home ownership, the number of vehicles, electricity payments, land area, building area, water source, and test scores, among others (Arfyanti et al., 2022). Another study utilizing the decision tree algorithm in a different case was conducted by Sathiyanarayanan on breast cancer identification. It reported that the decision tree algorithm is simpler and can compare each attribute by assigning values to each node, resulting in an accuracy rate of 99% (Sathiyanarayanan et al., 2019). Additionally, the Support Vector Machine (SVM) algorithm is a reliable classification and regression algorithm capable of recognizing subtle patterns in complex data sets and achieving high accuracy (Zhao et al., 2020).
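The zero-probability drawback of Naïve Bayes noted above can be illustrated with a small arithmetic sketch. The function name and the probability values are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
def naive_bayes_score(prior, conditionals):
    """Multiply a class prior by the conditional probability of each feature.

    This is the unnormalized Naive Bayes score for one class.
    """
    score = prior
    for p in conditionals:
        score *= p
    return score


# Hypothetical values: the third feature value never co-occurred with this
# class in the training data, so its estimated conditional probability is 0.
score_with_zero = naive_bayes_score(0.6, [0.8, 0.5, 0.0])

# Same class, but with a small non-zero estimate for the third feature.
score_without_zero = naive_bayes_score(0.6, [0.8, 0.5, 0.1])

print(score_with_zero, score_without_zero)
```

A single zero factor wipes out the evidence from every other feature, which is why the class score collapses to zero regardless of how strongly the remaining attributes support that class.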
Each of those studies utilized a single algorithm. Therefore, this study compares two different algorithms, adjusting specific hyperparameters appropriately, to achieve more accurate and precise modeling for the classification of KIP Kuliah recipients. Specifically, this research analyzes and compares the performance of the decision tree and support vector machine algorithms in classifying the eligibility of prospective KIP Kuliah recipients at Universitas Muhammadiyah Sukabumi. The machine learning development method follows the SEMMA framework (Sample, Explore, Modify, Model, Assess), covering data collection, data understanding, data preprocessing, modeling, and the evaluation of the model's accuracy and precision. The findings of this study can serve as a guideline for Universitas Muhammadiyah Sukabumi to enhance the selection process of KIP Kuliah recipients, making it more targeted and efficient. Additionally, this research can contribute to the advancement of science and technology in the context of data analysis and decision-making.

Research Method
Piksel 11 (2): 447-460 (September 2023)
This research will employ the SEMMA (Sample, Explore, Modify, Model, and Assess) methodology. The research process begins with dataset selection, followed by exploration and visualization of the dataset, modification of the dataset to prepare it for modeling, modeling using machine learning algorithms, and evaluation of model accuracy (Suwitono & Kaunang, 2022). Model validation will be performed using K-fold cross-validation to measure the model's performance more reliably and reduce the risk of overfitting to the training data. Model performance will be evaluated using a confusion matrix, from which metrics such as accuracy, precision, recall, and F1 score can be calculated. Accuracy represents the ratio of correctly classified instances to all classified instances (Hozairi et al., 2021). Recall measures how successfully the algorithm recognizes a particular class, while precision measures the proportion of instances assigned to a class that actually belong to it. The F1 score combines recall and precision, representing the overall performance of the method (Latifah et al., 2019).
Figure 1 shows the research stages that will be conducted. The research proceeds as follows:
1) Collection of the KIP Kuliah recipients' dataset for the past 5 years for modeling.
2) Dataset exploration through visualization and description, which is the process of understanding the dataset to select data suitable for modeling.
3) Data modification: feature (variable) selection, data cleaning, and data transformation, carried out to ensure that the dataset to be modeled has been verified.
4) Modeling the dataset using machine learning algorithms, namely decision tree and SVM.
In classifying data using the SVM method, the kernel function K(xi, xd) is utilized. The kernel function to be employed is described in equation (1) (Pradnyana et al., 2021).
A decision tree learns from a set of independent data, depicted in a tree diagram, using the "divide and conquer" approach (Wahyuningsih & Utari, 2018). Equations (2) and (3) represent the data equations within the tuple D.
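The validation and evaluation steps described in this section can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the folds are contiguous and unshuffled, and the metrics are computed for a single positive class (the source does not state how per-class scores were averaged).

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size.

    Each fold serves once as the test set; the remaining folds form the
    training set for that round.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = recipient, 0 = non-recipient.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
print(k_fold_indices(10, 3))
```

In a full K-fold run, a model would be trained on the union of k-1 folds and scored on the held-out fold, and the per-fold metrics would then be averaged.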

Data Exploration
The data exploration stage aims to describe the KIP Kuliah dataset through data description and visualization. The raw dataset consists of 16 attributes containing non-numeric data objects. The decision tree and SVM algorithms require numerical data, so data transformation is necessary. Additionally, not every attribute will be used in the research. The attributes that will form the dataset are DTKS Status, P3KE Status, the combined income of the father and mother, and achievement (Prestasi), with Scholarship Status as the label.

Data Modification
The initial dataset still contains raw data as entered in the KIP Kuliah registration application, so adjustments are needed before it can be processed with the decision tree and SVM algorithms. The attribute modifications made include:
1. In the attributes "Status DTKS" and "Status P3KE", the value "Belum Terdata" (not yet listed) is changed to "0" and the value "Terdata" (listed) is changed to "1". The attribute modification process is illustrated in Figure 3.
2. The attributes "Penghasilan Ayah" (father's income) and "Penghasilan Ibu" (mother's income), which contain values such as "No Income", "-", and various income ranges, are merged to create a new attribute called "Penghasilan Orang Tua" (parents' income). The attribute modification process is illustrated in Figure 4.
3. The "Penghasilan Orang Tua" attribute, as mentioned in point number two, is divided into three categories: 1) "Low" for incomes below Rp 2,000,000, 2) "Medium" for incomes between Rp 2,000,000 and Rp 4,000,000, and 3) "High" for incomes above Rp 4,000,000. All values are monthly incomes. The attribute modification process is illustrated in Figure 5.
4. The "Prestasi" (achievement) attribute, which contains descriptions of achievements, is grouped into "Berprestasi" (with achievements), which is changed to "1", and "Tidak Berprestasi" (without achievements), which is changed to "0". The attribute modification process is illustrated in Figure 6.
From the modified dataset, a clearer dataset distribution can be obtained, as illustrated in Figure 7.
The decision tree model classifies applicants based on attributes such as DTKS and P3KE status, academic achievement, and parental income, ultimately predicting whether a student qualifies for the scholarship. SVM, on the other hand, aims to find the optimal hyperplane that separates the two classes, KIP Kuliah recipients and non-recipients, with the goal of maximizing the margin between them. Furthermore, model evaluation is conducted to assess the performance and capabilities of the decision tree and SVM models in predicting or classifying KIP Kuliah recipients. Model evaluation will utilize the confusion matrix approach, followed by validation using the K-Fold cross-validation technique.
Figure 1. Steps in this Research
Figure 2. Dataset of KIP Kuliah Scholarship Recipients at UMMI for the year 2021-2022
Figure 3. DTKS and P3KE Status Attribute Modification
Figure 5. Parents' Income Attribute Grouping
Figure 6. Student Achievement Attribute Modification
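The attribute modifications described in this section can be sketched as a small encoding routine. This is an illustrative sketch under several assumptions that the source does not confirm: incomes are taken as already-numeric monthly rupiah amounts (with "No Income" and "-" treated as zero), the two parental incomes are combined by summing, the income categories are coded 0/1/2, and an empty or "-" achievement description is treated as "Tidak Berprestasi". All function names are hypothetical.

```python
STATUS_CODES = {"Belum Terdata": 0, "Terdata": 1}
# Assumption: the source does not give numeric codes for the income groups.
INCOME_CODES = {"Low": 0, "Medium": 1, "High": 2}


def encode_status(value):
    """Map 'Belum Terdata' to 0 and 'Terdata' to 1."""
    return STATUS_CODES[value.strip()]


def parse_income(value):
    """Assumption: 'No Income', '-', and empty strings mean zero income."""
    if isinstance(value, str) and value.strip() in {"No Income", "-", ""}:
        return 0
    return int(value)


def combined_income(father, mother):
    """Assumption: the merged attribute is the sum of both incomes."""
    return parse_income(father) + parse_income(mother)


def income_category(monthly_income_rp):
    """Group monthly income into Low / Medium / High as stated in the text."""
    if monthly_income_rp < 2_000_000:
        return "Low"
    if monthly_income_rp <= 4_000_000:
        return "Medium"
    return "High"


def encode_achievement(description):
    """Assumption: empty, '-', or 'Tidak Berprestasi' means no achievement."""
    text = (description or "").strip()
    return 0 if text in {"", "-", "Tidak Berprestasi"} else 1


def encode_record(record):
    """Turn one raw applicant record into a numeric feature vector."""
    income = combined_income(record["Penghasilan Ayah"], record["Penghasilan Ibu"])
    return [
        encode_status(record["Status DTKS"]),
        encode_status(record["Status P3KE"]),
        INCOME_CODES[income_category(income)],
        encode_achievement(record["Prestasi"]),
    ]


# Hypothetical applicant record, not taken from the UMMI dataset.
example = {
    "Status DTKS": "Terdata",
    "Status P3KE": "Belum Terdata",
    "Penghasilan Ayah": 1_500_000,
    "Penghasilan Ibu": "No Income",
    "Prestasi": "Juara 1 olimpiade",
}
print(encode_record(example))
```

The resulting feature vectors, paired with the Scholarship Status label, are the kind of numeric input that decision tree and SVM implementations expect.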

Figure 9 presents the modeling and validation results of the decision tree. The K-Fold Cross-Validation results for the Decision Tree model in classifying KIP scholarship recipients can be explained as follows: