Professional Clustering Based on the Graduates Profile Using K-Means Method

The Information Systems department of Pelita Indonesia Institute of Business and Technology produces graduates equipped with the knowledge to enter professional work, but the majority of students do not work in line with their graduate profile or educational background after graduating. This research aimed to help students determine the appropriate graduate profile. The proposed system was built using the K-means method to classify graduate profiles, so that the results can be used as a recommendation of the profession to pursue. The data used in this study are those of students of the class of 2016, and the results are compared with their current professional records in order to find the percentage of agreement between the recommended and the current profession. Testing produced the following classification of graduate profiles: 10 students in the database administrator cluster, 11 students in the web design and developer cluster, one student in the information system manager cluster, and 8 students in the system analysis cluster. The percentage of suitability was 43.33%. The program is written in the PHP programming language.

Susi, Alyauma Hajjah. Piksel 9 (1): 75-88 (March 2021)


Introduction
The development of education in Indonesia has increased rapidly. An important success factor of university education is the professional skill obtained after graduating. Undergraduate graduates are expected to work according to their educational background. However, the level of conformity between the competence of undergraduate graduates and the needs of the world of work is still low.
Currently, most educational institutions take advantage of advances in information and communication technology to improve the quality of the services they provide. Good data organization and management benefit educational institutions, especially in the efficient use of existing resources. In addition, speed in producing and distributing information greatly supports the quality of service. After graduating from a college, receiving a diploma is only a temporary success, because it is just one requirement for applying for jobs. The biggest challenge for graduates is the lack of information to support the choice of a profession that matches their major and the grades they obtained in their courses. To solve this problem, the K-means clustering method is implemented. The inputs to the K-means method are the grade of each related subject, the graduate profile, and student data. The resulting professional classification is then compared with the professions actually taken up by the graduates, obtained through a tracer study. A tracer study is a method used by some universities to obtain feedback from graduates. This research was conducted to help students or graduates determine their profession using the K-means method. With the proposed system, a college can obtain a more accurate classification of student professions.

Various studies related to the K-Means method have been carried out, for example a study by Kusumantara et al. (2019), where the K-Means method is used to group students by achievement, and a study by Dzulhaq & Imani (2015) on determining the concentration for students, in which the sample that supplies the data under study is determined using the Slovin formula. The K-Means method is a well-known clustering algorithm that has been widely used in various fields because it is simple, easy to implement, and able to cluster large data (Agusta, 2007).

System Development Life Cycle
Data were collected from the system currently running at IBT Pelita Indonesia. Professional classification is still done manually; no online system helps the campus or its students find professional classifications based on graduate profiles, so students have difficulty finding the professions appropriate to their graduate profiles. The problems experienced at IBT Pelita Indonesia concern the output, the forms and types of data, the information workflow, and the procedures. The boundaries and scope of the system are described, as well as data management, distribution, and data security in the information system process.
In the third phase, the work on the proposed system is divided into sub-phases: determining the criteria for the computer-based system, structuring, and designing an online computer-based system model as an option.
In the fourth phase, all the functions selected in the analysis stage are described independently of any particular computer specification, based on the functions and detailed specifications of all system elements (data, process, input, and output). In this logic design, the coding or rules are made that determine the graduate profile and the appropriate profession.
In the fifth phase, the logic design is translated into the detailed use of specific technologies, such as designing user-friendly data input forms (login forms, data input, etc.) as well as the reports needed in this system.
The sixth phase, implementation, can be interpreted as the process of ensuring that a policy is carried out and achieves its aims. In this phase, the system, including the operating system, is installed in the field of study section, the proposed application program is put into use, and training is provided to the potential users.
In the last phase, periodic maintenance should be conducted to check whether the system is running well. Monitoring, evaluation, and changes (improvements) to the system are carried out if needed. This includes the latest versions of the software and updates to documentation, training, and support. Changes are made if there is an error, so the software and hardware must be adjusted again to accommodate the desired changes.

K-Means Methods
The K-Means algorithm is an iterative clustering algorithm that partitions a data set into a predefined number of K clusters. The steps of the K-Means algorithm are:
a. Determine the number of clusters (k) to be formed.
b. Choose the k initial cluster center points (centroids) at random. The initial centroids are picked at random from the available objects, one for each of the k clusters; in later iterations the centroid of a cluster is computed as the mean of the objects assigned to it.
c. Calculate the distance from each object to the centroid of each cluster using the Euclidean distance.
d. Allocate each object to the closest centroid. During an iteration, objects are generally allocated by hard k-means, in which each object is explicitly stated as a member of the cluster whose center point is nearest to it.
e. Iterate, then determine the new position of each centroid.
f. Repeat from step c if the new centroid positions are not the same as the previous ones.
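The formulas that the steps above rely on are not reproduced in this text. In standard K-Means notation they are usually written as below, where $x_i$ is an object with $p$ attributes, $S_j$ is the set of objects currently assigned to cluster $j$, $n_j$ is the size of $S_j$, and $c_j$ is its centroid. The symbols are assumptions chosen for illustration and may differ from the authors' own notation and equation numbering.

```latex
% Centroid update: mean of the objects assigned to cluster j
c_j = \frac{1}{n_j} \sum_{x_i \in S_j} x_i

% Euclidean distance between object x_i and centroid c_j
d(x_i, c_j) = \sqrt{\sum_{m=1}^{p} \left( x_{im} - c_{jm} \right)^{2}}
```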

Data Processing
At this stage, data processing is carried out to obtain results that are useful in determining the classification of the profession. In general, the K-Means algorithm has the following typical steps (Sibuea & Safta, 2017): i) determine the number of clusters (k); ii) randomly determine the centroids; iii) calculate the distance of the data from the centroids; iv) after iterating, check whether the centroids change: if yes, recalculate the distance of the data from the centroids; if not, the iteration is stopped; and v) group the data based on the closest distance.
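The loop described by these steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' PHP implementation: the function names (`assign`, `update`, `kmeans`), the rule of leaving an empty cluster's centroid unchanged, and the sample scores are all hypothetical.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean)."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of its members.

    Assumption: a cluster with no members keeps its previous centroid.
    """
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centroids.append(list(old))
    return new_centroids


def kmeans(points, initial_centroids, max_iter=100):
    """Hard k-means: iterate until the centroids stop changing."""
    centroids = [list(c) for c in initial_centroids]
    for iteration in range(1, max_iter + 1):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            return labels, centroids, iteration
        centroids = new_centroids
    return labels, centroids, max_iter


# Hypothetical two-subject scores, not the study's data.
scores = [[4, 4], [4, 3], [1, 1], [2, 1]]
labels, centroids, n_iter = kmeans(scores, initial_centroids=[scores[0], scores[2]])
print(labels, centroids, n_iter)
```

Passing the initial centroids in explicitly, instead of drawing them at random inside the function, keeps the sketch reproducible; a random choice from the data objects could be made by the caller.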
Therefore, the first thing to do is to determine the number of clusters, which in this study is k = 4, and the initial centroids. The initial centroids are chosen randomly, and the distance from each data point to the centroids is then calculated.

Student ID   Subject scores      Centroid
...          ... 3 3 3 3 3 3 3   C1
57011        3 3 2 2 2 3 2 3     C2
57007        3 3 3 3 3 3 3 3     C3
57016        3 3 3 3 3 3 2 3     C4
Source: Research Result (2021)

In this study the K-Means method was used to allocate each data point to the cluster whose center point is closest. To find out which cluster is closest, the distance of each data point from the center point of each cluster is calculated using equation (3). The first iteration process can be seen in Table 2. Because the cluster distances did not change in the 2nd and 3rd iterations, the iteration was stopped and the final results were obtained (Table 5).
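As a worked illustration of the distance step, the sketch below uses the three centroid rows whose eight scores are fully listed above (students 57011, 57007, and 57016, labelled C2, C3, and C4). The `student` vector is a hypothetical record invented for the example, and C1 is left out because its row is incomplete in the text.

```python
import math

# Score vectors of the three centroid rows that are fully listed in the text.
centroids = {
    "C2": [3, 3, 2, 2, 2, 3, 2, 3],  # student 57011
    "C3": [3, 3, 3, 3, 3, 3, 3, 3],  # student 57007
    "C4": [3, 3, 3, 3, 3, 3, 2, 3],  # student 57016
}

# Hypothetical student record (not from the study) with eight subject scores.
student = [3, 3, 3, 3, 2, 3, 2, 3]

# Euclidean distance from the student to every centroid.
distances = {name: math.dist(student, c) for name, c in centroids.items()}

# The student is allocated to the centroid with the smallest distance.
nearest = min(distances, key=distances.get)
print(distances, nearest)
```

Here the hypothetical student differs from C4 in a single score, so the distance to C4 is 1 and the record would be allocated to that cluster.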

Professional Classification
The professions used in this study have been adjusted to the graduate profiles of the Information Systems Study Program at IBT Pelita Indonesia. The professional classification based on the graduate profile is shown in the following table. The suitability between the professions obtained from the calculations and the graduates' current professions, collected through a tracer study, is displayed below. Table 7 shows that 43.3% of the professional data are in accordance with the calculation results.
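The suitability percentage can be understood as the share of graduates whose current profession belongs to the list of professions attached to their predicted graduate profile. The sketch below is one possible way to compute it and is not the authors' code: the function name `suitability` and the tracer-study records are hypothetical, and only the two profiles whose professions are enumerated in the Conclusion are included.

```python
# Professions listed in the Conclusion for two graduate profiles; the other
# profiles are omitted because the text does not enumerate their professions.
PROFILE_PROFESSIONS = {
    "database administrator": {
        "database administrator", "data analyst", "administrative staff",
        "network security", "database coordinator",
    },
    "web design and developer": {
        "ui/ux developer", "game developer", "web developer",
        "programmer", "application developer", "teacher",
    },
}


def suitability(records):
    """Percentage of graduates whose current job belongs to their predicted profile."""
    matches = sum(
        1 for profile, job in records
        if job.lower() in PROFILE_PROFESSIONS.get(profile, set())
    )
    return 100 * matches / len(records)


# Hypothetical tracer-study records: (predicted profile, current profession).
records = [
    ("database administrator", "Data Analyst"),
    ("database administrator", "Teacher"),
    ("web design and developer", "Web Developer"),
    ("web design and developer", "Cashier"),
]
print(f"{suitability(records):.2f}%")
```

For the 30 graduates in the four clusters, 13 matches would give 100 × 13 / 30 ≈ 43.33%, which agrees with the reported figure; the count of 13 is inferred from the percentage and is not stated in the text.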

Conclusion
The tests carried out in this study required 3 iterations. Based on the closest-distance cluster, a graduate nearest to center 1 is directed to the information system manager profession, center 2 to system analysis, center 3 to database administrator, and center 4 to web design and developer. After processing the data, the proposed system successfully classifies the graduate profiles, e.g. database administrator with its corresponding professions (database administrator, data analyst, administrative staff, network security, and database coordinator), web design and developer with its corresponding professions (UI/UX developer, game developer, web developer, programmer, application developer, and teacher), etc. Comparing the professional data obtained with the graduates' current professions using the tracer study method shows that 43.3% of the professional data are in accordance with the calculation results. The cluster results are also influenced by the values of the initial centroids, the amount of data used, and differences in data retrieval.

Author Contributions
Susi proposed the topic; Susi and Alyauma Hajjah conceived the models and designed the experiments; Susi and Alyauma Hajjah developed the algorithms; Susi and Alyauma Hajjah analysed the results.

Conflicts of Interest
The authors declare no conflict of interest.