Tag Archives: Python basic operation

Python Kmeans Error: ConvergenceWarning: Number of distinct clusters (99) found smaller than n_clusters (100).

The following errors occur when using the kmeans method for clustering. My default setting is 100 categories, and there are more than 100 data. The following errors are reported in the clustering process:

ConvergenceWarning: Number of distinct clusters (99) found smaller than n_clusters (100). Possibly due to duplicate points in X.

First locate the error code:

kmeans = KMeans(n_clusters=k, random_state=2018)
kmeans.fit(XData)
 pre_y = kmeans.predict(XData)

It is probably one of these three items. Finally, it is found that the prompt appears in the fit() function. Use the try method to grab the error, locate the breakpoint for debugging, and grab the ConvergenceWarning error:

    # Build the clustering model object
    kmeans = KMeans(n_clusters=k, random_state=2018)
    # Train the clustering model
    try:
        kmeans.fit(XData) # If the data has duplicates, the data will be smaller than the K value when fitting, so the K value needs to be updated
    except ConvergenceWarning:
        print("Catch ConvergenceWarning,k={}\n".format(k))
    except:
        print("k={}\n".format(k))

    # Predictive Clustering Model
    pre_y = kmeans.predict(XData)

After checking, the outgoing line is caused by repeated XData data, so the K value is determined in the early stage, and the problem can be solved by repeating it here
previous K value determination Code:

XData = np.array(X)
if(XData.shape[0]<=k and XData.shape[0]!=0):
        print("XData size:",XData.shape[0])
        k=XData.shape[0]//2
        print("K:",k)
        if(k<=0):
            return result

The changed code is de duplicated by using set

XData = np.array(list(set(X)))#2022.3.22 zph comes with a de-duplication effect
if(XData.shape[0]<=k and XData.shape[0]!=0):
        print("XData size:",XData.shape[0])
        k=XData.shape[0]//2
        print("K:",k)
        if(k<=0):
            return result

The above is hereby recorded for reference by those who have the same problems.