Clustering Customer Data Using Fuzzy C-Means Algorithm

Nurfaizah, Fathuzaen. Piksel 9 (1): 1-14 (March 2021)

The pattern of the service industry is influenced mostly by economic growth. When economic growth rises, economic activity, including insurance activity, grows as well. Customers are among the assets of an insurance company, so the company should retain its loyal and potential customers. This study focuses on clustering existing customer data in an insurance company using the Fuzzy C-Means (FCM) algorithm. The study analyzes data from the company, and the results can serve as a basis for the company's decisions, especially on further marketing to customers who have taken out insurance or who are still actively paying premiums. Fuzzy C-Means can be used for clustering the customer datasets. Using the Partition Coefficient (PC) as the validity index, the study obtained 3 clusters, with cluster center values ranging from 0.5 to 1.0.


Introduction
According to Indonesian Insurance Law No. 40 of 2014, insurance is an agreement between two parties, the insurance company and the policyholder, under which the insurance company receives premiums in return for one of two things: compensating the insured or policyholder for loss, damage, costs incurred, loss of profits, or legal liability to a third party that the insured or policyholder may suffer due to an uncertain event; or making payments based on the death or the life of the insured, with benefits of a predetermined amount and/or based on the results of fund management.
Another kind of insurance system, Sharia insurance, is a collection of agreements, consisting of an agreement between a sharia insurance company and a policyholder and an agreement among policyholders, for managing contributions based on sharia principles so that participants help and protect each other. This is done by compensating participants or policyholders for losses, damage, costs incurred, loss of profits, or legal liability to third parties that they may suffer due to an uncertain event, or by making payments based on the death or the life of the participant, with benefits of a predetermined amount and/or based on the results of fund management (Indonesian Insurance Law No. 40 of 2014).
The operation of an insurance company cannot be separated from its debtors or insurance customers, who are a very important asset of a financial institution or bank. The insurance company pays attention to customer profiles because they are a very important part of bank documentation and must be available whenever needed. For this reason, prospective customers fill out an identity form that records their circumstances.

The customer profile serves as a company reference for learning more about a customer's work, personal data, and family data. Complete customer data makes it easier for the insurance company to collect data and can further serve as the basis for classifying potential and non-potential customers.
The C-Means algorithm is one of the algorithms used to cluster data. It can classify incomplete data using the maximum likelihood criterion, and its clustering results can provide decision support for companies seeking to increase the effectiveness of raw material storage (Zhang et al., 2016). In addition, C-Means can be used for data processing (Agustini, 2019).
This study focuses on grouping or clustering insurance customer data using the C-Means algorithm. Previous research obtained optimal results in clustering restaurant food products using the C-Means algorithm (Agustini, 2019). Rahakbauw implemented Fuzzy C-Means in awarding scholarships by grouping students into those who are eligible and those who are not eligible to receive scholarships, showing that Fuzzy C-Means can also be used to determine scholarship eligibility (Rahakbauw et al., 2017).
Similar research was conducted by Rustiyan, who applied the Fuzzy C-Means algorithm to the problem of mandatory savings for cooperative members and grouped cooperative customers by the district/city where they live according to their payment of mandatory savings (Rustiyan R & Mustakim, 2018). Rismanto (Rismanto et al., 2017) also used the C-Means method, grouping students by lecture attendance level to decide on follow-up actions and on whether a student needs supervision.
Previous researchers have applied the C-Means algorithm in various studies. This study focuses on implementing the C-Means algorithm to cluster insurance customer profile data, so that groupings of potential and non-potential customers can be obtained. The approach may also benefit other types of insurance at other insurance companies.

Clustering
Clustering is a well-known and widely used technique for grouping data or objects based on similar characteristics (Han et al., 2012; Sari & Suranti, 2016). Groups can be seen as subsets of the data set, and grouping methods can be classified by whether these subsets are fuzzy. The classical grouping method (hard clustering) is based on classical set theory, in which an object either is or is not a member of a group. Fuzzy clustering allows an object to be a member of several groups at once with different degrees of membership, each between 0 and 1. Thus, a dataset X can be partitioned into c fuzzy parts.
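The difference between hard and fuzzy membership can be illustrated with a small sketch. The membership values below are hypothetical, and the requirement that each object's degrees sum to 1 is the usual FCM constraint, assumed here rather than stated in the text.

```python
# Hypothetical membership degrees of three customers in c = 2 clusters.
# Each row is one customer; each column is one cluster.
hard_partition = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 0.0],
]
fuzzy_partition = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]


def is_valid_partition(u, tol=1e-9):
    """Check that every degree lies in [0, 1] and every row sums to 1
    (the row-sum constraint is an assumption taken from standard FCM)."""
    return all(
        all(0.0 <= d <= 1.0 for d in row) and abs(sum(row) - 1.0) <= tol
        for row in u
    )


def is_hard(u):
    """A partition is hard when every degree is exactly 0 or 1."""
    return all(d in (0.0, 1.0) for row in u for d in row)
```

Both matrices are valid partitions, but only the first is hard; in the second, the third customer belongs equally to both clusters.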
Data clustering is an unsupervised data mining method. Two types of data clustering are often used in data grouping: hierarchical and non-hierarchical data clustering.
Clustering is a way of assigning an observed pattern to an unknown pattern class, called a pattern cluster. Fuzzy clustering is a technique for determining the optimal clusters in a vector space based on the normal Euclidean form for the distance between vectors. Fuzzy clustering is very useful for fuzzy modeling, especially in identifying fuzzy rules.
There are several data clustering algorithms, one of which is Fuzzy C-Means (FCM). The basic concept of FCM is, first, to determine the cluster centers, which mark the average location of each cluster. By repeatedly updating the cluster centers and the degree of membership of each data point, the cluster centers move towards the right location. This iteration is based on minimizing an objective function that describes the distance from a given data point to the cluster center, weighted by the degree of membership of the data point (Kusumadewi & Purnomo, 2004). The Fuzzy C-Means algorithm is explained as follows:
e. The objective function is used as the looping condition for finding the right cluster centers, so that at the final step the tendency of each data point to belong to a cluster is obtained.
f. Calculate the objective function at iteration t, $P_t$.
h. Calculate the change in the partition matrix using the following formula:

$$\mu_{ik} = \frac{\left[\sum_{j} (X_{ij} - V_{kj})^{2}\right]^{-1/(w-1)}}{\sum_{l=1}^{c} \left[\sum_{j} (X_{ij} - V_{lj})^{2}\right]^{-1/(w-1)}} \qquad (3)$$

where i = 1, 2, …, n and k = 1, 2, …, c.
To find the change in the partition matrix $\mu_{ik}$, the cluster center $V_{kj}$ is subtracted from the value of the variable $X_{ij}$ and the difference is squared. The squared differences are summed over j, and the sum is raised to the power $-1/(w-1)$; with weight w = 2, each sum is raised to the power -1. After this calculation, all new membership degrees are normalized: the terms for k = 1, …, c are added up, and each term is divided by this sum, so that each new membership degree lies between 0 and 1. Then check the stop condition: if ($|P_t - P_{t-1}| < \xi$) or (t > maxIter), stop the process; otherwise set t = t + 1 and repeat step 4.
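The iteration described above can be sketched in plain Python. This is a minimal illustration, not the program used in the study. The center update (a mean weighted by the membership degrees raised to w) and the objective function are the standard FCM forms and are assumptions here. The random initial partition matrix, the `seed` and `eps` parameters, and the four sample points are also hypothetical.

```python
import random


def sq_dist(x, v):
    """Squared Euclidean distance between a data point and a cluster center."""
    return sum((xj - vj) ** 2 for xj, vj in zip(x, v))


def update_centers(data, u, c, w):
    """Cluster center V_k: mean of the data weighted by mu_ik ** w
    (standard FCM center update, assumed here)."""
    m = len(data[0])
    centers = []
    for k in range(c):
        weights = [u[i][k] ** w for i in range(len(data))]
        total = sum(weights)
        centers.append([
            sum(wt * x[j] for wt, x in zip(weights, data)) / total
            for j in range(m)
        ])
    return centers


def objective(data, u, centers, w):
    """P_t: squared distances weighted by mu_ik ** w, summed over i and k."""
    return sum(
        sq_dist(x, v) * (u[i][k] ** w)
        for i, x in enumerate(data)
        for k, v in enumerate(centers)
    )


def update_memberships(data, centers, w, eps=1e-12):
    """New mu_ik: inverse-distance term normalized over the c clusters."""
    new_u = []
    for x in data:
        # eps avoids division by zero when a point coincides with a center
        terms = [(sq_dist(x, v) + eps) ** (-1.0 / (w - 1)) for v in centers]
        total = sum(terms)
        new_u.append([t / total for t in terms])
    return new_u


def fuzzy_c_means(data, c, w=2.0, max_iter=100, xi=1e-6, seed=0):
    """Return (centers, partition matrix) after the stop condition is met."""
    rng = random.Random(seed)
    # Random initial partition matrix with each row normalized to sum to 1
    u = []
    for _ in data:
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([r / s for r in row])
    prev = 0.0  # P_0
    for _ in range(max_iter):
        centers = update_centers(data, u, c, w)
        p_t = objective(data, u, centers, w)
        u = update_memberships(data, centers, w)
        if abs(p_t - prev) < xi:  # |P_t - P_(t-1)| < xi
            break
        prev = p_t
    return centers, u


# Hypothetical, already normalized two-attribute data with two obvious groups
data = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
centers, u = fuzzy_c_means(data, c=2)
labels = [row.index(max(row)) for row in u]
```

Each row of `u` holds the membership degrees of one data point, and `labels` assigns each point to the cluster in which its degree is highest.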

Research methods
The research stages are needed as a framework and guide for the research process, so that the series of research processes can be carried out in a directed manner. This study follows the CRISP-DM stages (Fadillah, 2015; Shearer, 2000), as shown in Figure 1.

Figure 1. Research Stages
In more detail, the CRISP-DM research stage is explained as follows:

a. Business Understanding
This stage identifies the business processes and objectives that serve as the basis for determining the patterns to look for in the data mining process, specifically in data clustering using C-Means.

b. Data Understanding
After the business process is identified, the next stage is understanding the data needed to achieve the previously determined research objectives. The process consists of understanding the data and retrieving the data needed in this study. The data used is insurance customer data.

c. Data Preparation
In this stage, the collected customer profile data is converted into a dataset, which is then processed in data pre-processing.

d. Modeling
After the data goes through the pre-processing and normalization stages, modeling is carried out using the C-Means algorithm.

e. Evaluation
The clustering results produced in the modeling stage are then tested for validity to determine the optimal number of clusters.

f. Deployment
This stage applies the generated model and compiles a clustering report.

Business Understanding
Insurance company "XYZ" offers products that differ in the amount of premium that customers take. The existing products include the Prosperous Investment Fund, the Healthy and Prosperous Fund, and the Prosperous Investment Fund. This study focuses on clustering using the FCM algorithm. The business understanding in this study looks at customers' tendencies in choosing products and premium amounts.
a. Determining business objectives: customer data patterns on products and premium deposits in a certain period are identified.
b. Assessing the situation: the insurance data managed by each unit is compiled into one company report per month.
c. Determining the purpose of data mining: this study aims to find patterns in customers' product selection and to group the tendencies of the premiums that customers take.

Data Understanding
The data used is insurance customer data taken from customer reports for all units in the 2019 period. Data collection and data selection were also based on interviews. The data used is new customer data for the month, and the dataset consists of 301 customer records. Table 1 below is the initial dataset, in which policy numbers and product types have been manually coded using Excel.

Data Preparation
The dataset is then converted from Excel format to a CSV file to simplify data processing in the program.

Modeling
After pre-processing, the data is normalized with a Python program, as shown in the corresponding figure. Table 3 below is the initial display when data is input into the program.
One result of the normalization process is shown in Table 3 below.
Table 3. Results of Data Type Normalization
The next stage is data modeling, which is done by inputting the CSV dataset into the program and carrying out the normalization stage. Figure 3 below shows a step in modeling with the FCM algorithm. The number of clusters is set to 3, with a maximum of 100 iterations.
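The text does not state which normalization the program applies. As an illustration only, the sketch below assumes min-max scaling, which maps a numeric column such as the gross premium to the range 0 to 1. The function name and the sample premiums are hypothetical and are not taken from the study's dataset.

```python
def min_max_normalize(column):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0.0
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


# Hypothetical gross premiums, for illustration only
gross_premium = [100_000, 200_000, 15_000_000, 20_000_000]
normalized = min_max_normalize(gross_premium)
```

Scaling every attribute to a common range keeps a large-valued attribute such as the premium from dominating the Euclidean distances used in clustering.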

Evaluation
Based on the validity test over the iterations carried out, the clustering of the dataset is shown in the corresponding figure. Data processing also includes assessing the validity index using the Partition Coefficient (PC), which measures the amount of overlap between the clusters formed; the best number of clusters is the one with the highest PC value (Haqiqi & Kurniawan, 2015). Figure 5 shows the validity index results.
The highest PC value was obtained for center 2, with a validity value of 0.97. The closer the value is to 1, the better the quality of the clusters formed (Astria & Suprayogi, 2017).
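The Partition Coefficient can be computed directly from the partition matrix. The sketch below uses the usual definition, the mean over all data points of the sum of squared membership degrees, which is assumed here to match the index used in the study. The two sample matrices are hypothetical.

```python
def partition_coefficient(u):
    """PC = (1/n) * sum over i and k of mu_ik ** 2."""
    n = len(u)
    return sum(d ** 2 for row in u for d in row) / n


# Hypothetical partition matrices for n = 3 data points and c = 3 clusters
crisp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 3

pc_crisp = partition_coefficient(crisp)
pc_uniform = partition_coefficient(uniform)
```

A partition with no overlap gives a PC of 1, while fully overlapping clusters give 1/c, so a value such as 0.97 indicates little overlap.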

Deployment
The clustering results show the grouping of members based on the cluster centers formed; fuzzy clustering groups data based on degrees of membership. Based on the validity test over the iterations carried out, the clustering of the dataset is shown in Figure 6.
There are 3 clusters, namely cluster 0, cluster 1, and cluster 2, with 3 cluster center points ranging from 0.5 to 1.0. Based on the dataset used, cluster 0 has a gross premium of 100,000 to 200,000, cluster 1 has a gross premium >= 20,000,000, and cluster 2 has a gross premium from >= 200,000 to 15,000,000.
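One way to arrive at per-cluster premium ranges like those above is to assign each record to the cluster in which its membership degree is highest and then summarize the premiums per cluster. The sketch below assumes this approach; the text does not describe how the ranges were derived, and the membership matrix and premiums are hypothetical.

```python
def hard_labels(u):
    """Assign each record to the cluster with its highest membership degree."""
    return [row.index(max(row)) for row in u]


def premium_range_by_cluster(premiums, labels):
    """Return {cluster: (min premium, max premium)} for the assigned records."""
    ranges = {}
    for premium, label in zip(premiums, labels):
        lo, hi = ranges.get(label, (premium, premium))
        ranges[label] = (min(lo, premium), max(hi, premium))
    return ranges


# Hypothetical membership matrix and gross premiums, for illustration only
u = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]]
premiums = [100_000, 200_000, 20_000_000, 5_000_000]

labels = hard_labels(u)
ranges = premium_range_by_cluster(premiums, labels)
```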

Conclusion
The Fuzzy C-Means (FCM) algorithm can be used to cluster insurance customer data. The clustering produced 3 clusters from 300 data records, grouped based on the degree of membership of each record. The clusters formed are cluster 0, cluster 1, and cluster 2, with cluster center values of 0.5 to 1, obtained from the highest validity index, which is close to 1.