Classifying Half-Unemployment Levels in Indonesian Provinces: A K-Means Approach for Informed Policy Decisions

Underemployment refers to people who work part time and are not fully employed. A year-on-year increase in the underemployment rate causes problems in the lives of these people, and it also makes it difficult for the government to decide which areas should be prioritized. Therefore, it is necessary to classify the underemployment rate data, obtained from public data on data.go.id, using a popular clustering method, namely K-Means. The purpose of this grouping is to identify and classify provinces with high levels of underemployment, so that the classification can assist the government in making decisions for people who meet the underemployment criteria. The data were grouped, from the first to the eighth iteration, into three groups by level of underemployment: high (C1), medium (C2) and low (C3). Group C1 contains 17 provinces, C2 only 2 provinces and C3 16 provinces. These results can be considered by the Indonesian government when setting policies to reduce the underemployment rate. In particular, the C1 group needs to be prioritized in creating jobs for the people of those provinces.


Introduction
The underemployment rate is a crucial issue in the modern economy.
Unemployment exists in every country, and it is a major problem in China in particular (Wang & Zheng, 2009). Underemployment refers to situations in which individuals who should have a full-time job have only part-time jobs, or jobs that do not match their qualifications or wishes. Underemployed workers are those who work fewer than the normal working hours, that is, less than 35 hours a week, and who are still looking for work or are still willing to accept work; this is called forced underemployment. Unemployment arises mainly because the number of job seekers is out of proportion to the number of jobs available in Indonesia's provinces, which leads to poverty and increases crime in the country (Saputra et al., 2019).
Suhardjono, Hari Sugiarto, Dewi Yuliandari, Adjat Sudradjat, Luthfia Rohimah. Piksel 11 (2): 435-446 (September 2023)
Given these conditions, a data mining process is needed to extract knowledge that is not directly visible in the database (Nasyuha et al., 2021). The knowledge gained from the initial preprocessing stage is the recognition of data patterns (Purnama & Rahayu, 2022).
This pattern recognition strongly influences the clustering results obtained on a dataset, which differ with the centroids used (Ridwansyah et al., 2022). In this context, clustering can help in understanding the pattern of underemployment rates, expressed as a percentage of people each year from 2018 to 2021. The data are public data sourced from data.go.id and cover the provinces of Indonesia. High unemployment can also be caused by information technology: as technology advances in a country, many jobs are taken over by it (Shahkooh et al., 2008). There are also still many workers who are not absorbed by the world labor market (Rahmat et al., 2023). The current problem is that both the unemployment rate and the number of underemployed people in Indonesian provinces increase every year. Handling unemployment requires a careful and innovative approach that identifies which groups of provinces have the highest rates of underemployment, so that the government can develop more effective solutions by targeting the groups most vulnerable to it.
The use of clustering methods, especially the K-Means algorithm, as an approach to reducing unemployment rates has become an interesting research topic in recent years. Data grouping is an important task that groups very large data based on data similarity (Hossain et al., 2019). This approach allows identifying groups of individuals who share similar characteristics, which in turn can provide valuable insights for designing more effective solutions to reduce unemployment rates and improve people's quality of life. K-Means grouping can be interpreted as a data mining model that organizes the data into groups by mapping them without supervision (Malik et al., 2018). As an unsupervised learning method, K-Means is a simple approach to problem solving (Santiko et al., 2018).
The quality and accuracy of K-Means depend on the selection of the seed centroids during initialization (Esteves et al., 2013). Several related studies have discussed important aspects of using K-Means to address the unemployment problem.
Research on unemployment using a back-propagation neural network (BPNN) combined with principal component analysis (PCA) obtained good results and accuracy (Wang & Zheng, 2009). Other research used the resilient method because it has the advantage of being able to predict data based on past data (Saputra et al., 2019).
The aim of this research is to group the underemployment rate in Indonesia by applying a clustering method to the underemployment rate data and identifying groups that are similar in criteria and in the proportion of the population that is underemployed. To identify these groups, a K-Means model is needed that reads the underemployment rate data for each year of the period and processes it through data modeling. This research uses cluster analysis, an unsupervised machine learning technique for grouping data.
It is hoped that the results of this research can provide guidance for more effective economic policies and unemployment reduction programs that are useful as a reference for the government in order to increase the level of utilization, usefulness and productivity of workers, as well as reduce the increase in underemployment.

Research Method
The method used is a grouping method in which the unemployment rate data are clustered non-hierarchically (Oktavia et al., 2020). The number of clusters is determined in advance; three clusters were used in this research (Sugiono et al., 2019).
Each cluster is a subset of the samples and has an average value, which serves as its center. After each grouping, the average value of each cluster is compared with the value from the previous grouping. If the average value does not change, the grouping is stable around its center. If it changes, the process continues with the next iteration, using the new average value as the center of the grouping (Alfianti et al., 2021).
The average value can be expressed through the sum-of-squares criterion.
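The sum-of-squares criterion mentioned here is not written out in the text. A common formulation, given as a sketch under assumed notation (the symbols $x_i$, $C_k$, $\mu_k$ and $K$ are not taken from the paper), is the within-cluster sum of squares, which is smallest when each center $\mu_k$ is the average of the points assigned to cluster $C_k$:

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^{2},
\qquad
\mu_k = \frac{1}{\lvert C_k \rvert} \sum_{x_i \in C_k} x_i
```

In this study $K = 3$, and each $x_i$ would hold the yearly underemployment percentages of one province.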
The algorithm model for the underemployment rate using K-Means is shown in the flowchart in Figure 1. The underemployment rate dataset is collected from public data uploaded to the website data.go.id, which anyone can download for free. After the underemployment data for 2018-2021 have been collected, they are processed using the K-Means method.
Set the number of clusters for the underemployment rate dataset to 3; the data will be grouped into these clusters. The number of clusters does not depend on how much data there is.
Choose K center points at random from the underemployment rate dataset, one for each of the 3 clusters; the selected points are then used for grouping the data. The average value of each cluster subset is final when, after comparing it with the value from the previous grouping, it no longer changes; the grouping is then stable around its center and yields the average value of each cluster subset.
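The procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names `euclidean` and `kmeans`, the `max_iter` safeguard and the six toy rows are all hypothetical, and the initial centroids are taken from chosen row indices in the same spirit as picking individual data rows as starting centers.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, initial_indices, max_iter=100):
    """Assign each point to its nearest centroid, recompute each centroid
    as the mean of its members, and stop when the centroids stop changing."""
    centroids = [list(points[i]) for i in initial_indices]
    labels = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        # Update step: column-wise mean per cluster (keep old centroid if empty).
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)
        # Stopping rule: averages unchanged compared with the previous iteration.
        if new_centroids == centroids:
            return labels, centroids, iteration
        centroids = new_centroids
    return labels, centroids, max_iter


# Hypothetical toy data: one row per region, one column per year (percent).
toy_points = [
    [2.0, 2.1, 3.0, 2.5],
    [2.2, 2.0, 3.1, 2.4],
    [6.0, 6.2, 7.5, 7.0],
    [6.1, 6.0, 7.4, 7.1],
    [10.0, 9.8, 12.0, 11.0],
    [10.2, 9.9, 12.1, 11.2],
]
labels, centroids, n_iter = kmeans(toy_points, initial_indices=[0, 2, 4])
print(labels, n_iter)
```

On this toy input the second pass reproduces the averages of the first, so the loop stops there; the real dataset would need more passes before the averages settle.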
The underemployment rate data used for testing with the K-Means method have gone through the pre-processing stage and are ready for K-Means testing. The data are grouped by year from 2018 to 2021, with a sample of 35 provinces in Indonesia and 3 clusters, namely few, medium and many, which determine the class of underemployment level. The data, calculated in percent per year according to the predetermined parameters, can be seen in table 1, sourced from data.go.id. The data in table 2 are the underemployment data that are ready to be tested with the K-Means algorithm.
Once the data are ready, they are divided into 3 clusters and the cluster centers are determined by selecting the initial K centroids randomly based on the data in table 1. The resulting centroids can be seen in table 2. From table 2 the number of clusters is three: the first cluster is taken from the first data point or K1, the second cluster from the eighteenth data point or K18, and the third cluster from the twenty-first data point or K21.
With the initial clusters formed, the distance between each data point and each centroid is first calculated using the Euclidean distance; the calculation is shown in table 3. Table 3 shows that the smallest distances range from 0.00 to 15.65 for the first cluster or C1, from 0.00 to 21.52 for the second cluster or C2, and from 0.00 to 12.85 for the third cluster or C3. From the results of the first iteration, the average value for each cluster is as follows. From the averages of the first iteration, the process moves to the second iteration by updating the centroid values, and repeats until the values no longer change. In the eighth iteration, the average values are unchanged from the previous grouping, namely the seventh iteration, so the calculation stops. The results of the eighth iteration can be seen in table 5, and the resulting average values in table 6.
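The Euclidean distance used in these iterations is not written out in the text. A sketch under assumed notation (the symbols $x_{i,t}$ and $\mu_{k,t}$ are not taken from the paper), where $x_{i,t}$ is the underemployment percentage of province $i$ in year $t$ and $\mu_{k,t}$ is the corresponding coordinate of the center of cluster $k$:

```latex
d(x_i, \mu_k) = \sqrt{\sum_{t=2018}^{2021} \left( x_{i,t} - \mu_{k,t} \right)^{2}}
```

Because the initial centers are themselves data points (K1, K18 and K21), each has distance 0 to its own cluster in the first iteration, which is consistent with every reported range starting at 0.00.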

Results and Analysis
The underemployment rate data used for testing with the K-Means method have gone through the pre-processing stage and are ready for K-Means testing. The data are grouped by year from 2018 to 2021, with a sample of 35 provinces in Indonesia and 3 clusters, namely few, medium and many, which determine the class of underemployment level. The data, calculated in percent per year according to the predetermined parameters, can be seen in table 6, sourced from data.go.id. The data in table 7 are the underemployment data that are ready to be tested with the K-Means algorithm. Once the data are ready, they are divided into 3 clusters and the cluster centers are determined by selecting the initial K centroids randomly based on the data in table 1. The resulting centroids can be seen in table 2. Table 2 shows that the number of clusters is three: the first cluster is taken from the first data point or K1, the second cluster from the eighteenth data point or K18, and the third cluster from the twenty-first data point or K21.
With the initial clusters formed, the distance between each data point and each centroid is first calculated using the Euclidean distance; the calculation is shown in table 8. Table 9 shows that the smallest distances range from 0.00 to 15.65 for the first cluster or C1, from 0.00 to 21.52 for the second cluster or C2, and from 0.00 to 12.85 for the third cluster or C3. From the results of the first iteration, the average value for each cluster is as follows. From the averages of the first iteration, the process moves to the second iteration by updating the centroid values, and repeats until the values no longer change. In the eighth iteration, the average values are unchanged from the previous grouping, namely the seventh iteration, so the calculation stops. The results of the eighth iteration can be seen in table 10, and the resulting average values in table 11. In the final results shown in table 12, the provinces in the first cluster or C1 form the group with a high level of underemployment, the second cluster or C2 the group with a medium level, and the third cluster or C3 the group with a low level.
These results are based on calculations on the underemployment rate data for 35 Indonesian provinces, using the distance to the initial centroid values, from the first to the eighth iteration. The first cluster or C1 has 17 provinces in the high underemployment class, the second cluster or C2 has 2 provinces in the medium class, and the third cluster or C3 has 16 provinces in the low class. By testing the K-Means model, the government can obtain the considerations and solutions needed to classify provinces and determine which ones to prioritize, so that it can reduce the level of underemployment by creating suitable job opportunities according to community needs.
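The cluster sizes reported above can be turned into shares of the 35 provinces with a short calculation. The sketch below uses only the counts stated in the text (17, 2 and 16); the variable names are illustrative.

```python
# Cluster sizes as reported in the results (C1 high, C2 medium, C3 low).
cluster_sizes = {"C1": 17, "C2": 2, "C3": 16}

total = sum(cluster_sizes.values())

# Share of all provinces in each cluster, in percent, rounded to 2 decimals.
shares = {name: round(100 * n / total, 2) for name, n in cluster_sizes.items()}

for name, pct in shares.items():
    print(f"{name}: {cluster_sizes[name]} of {total} provinces ({pct:.2f}%)")
```

The C1 share, 17 of 35, is about 48.57%, which is the proportion of provinces the conclusion describes as still lacking jobs.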

Conclusion
The clustering of provinces in Indonesia using the K-Means method was calculated on underemployment rates that averaged 8.03% in 2018, 7.60% in 2019, 10.90% in 2020 and 9.51% in 2021. The calculations also show that, out of 35 provinces, around 48.57% still lack jobs for people who are unemployed or underemployed. This grouping is a good source of data for the government in making policies or creating jobs in the provinces that need them most, so that the underemployment rate decreases.
Figure 1. Flowchart of the K-Means Model for the Underemployment Rate

Table 1 .
Sample Data on Underemployment Rate

Table 2 .
Randomly Select K Centroid Points

Table 3 .
First Iteration Calculation Data Results

Table 5 .
Eighth Iteration Calculation Data Results

Table 6 .
Results of Average Calculation Data for the Eighth Iteration

Table 8 .
First Iteration Calculation Data Results

Table 9 .
Data Results for First Iteration Average Calculation

Table 10 .
Eighth Iteration Calculation Data Results

Table 11 .
Results of Average Calculation Data for the Eighth Iteration
From these results, the data on the level of underemployment in the provinces of Indonesia fall into the following clusters:

Table 12 .
Final Results and Patterns of Centroid Distance and Cluster Center of