Identifying the optimal number of clusters depends on many factors. For algorithms such as KMeans, which require the number of clusters as an input, data scientists often use techniques such as the Elbow method or metrics such as the Silhouette Coefficient to choose a value.
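As a rough illustration of the Elbow method and the Silhouette Coefficient, the sketch below uses scikit-learn on synthetic data. The data, the candidate range of 2 to 8, and the helper name `score_cluster_counts` are assumptions made for this example; they are not part of the product.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data (an assumption): four well-separated 2-D blobs.
centers = [(-6, -6), (-6, 6), (6, -6), (6, 6)]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)


def score_cluster_counts(X, candidate_ks):
    """Return {k: (inertia, silhouette)} for each candidate number of clusters."""
    results = {}
    for k in candidate_ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        # inertia_ is the within-cluster sum of squares, the quantity plotted
        # against k in the Elbow method.
        results[k] = (model.inertia_, silhouette_score(X, model.labels_))
    return results


results = score_cluster_counts(X, range(2, 9))
for k, (inertia, silhouette) in results.items():
    print(f"k={k}  inertia={inertia:.1f}  silhouette={silhouette:.3f}")

# The k with the highest silhouette score is one reasonable candidate.
best_k = max(results, key=lambda k: results[k][1])
print("Highest silhouette at k =", best_k)
```

For the Elbow method, look for the value of k after which inertia stops dropping sharply. On real data the two signals may disagree, so treat them as a starting point rather than a final answer.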
On the other hand, some clustering algorithms claim to find the optimal number of clusters automatically, but how well this works depends heavily on the data and on the algorithm's other settings.
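To make this concrete, here is a small sketch using DBSCAN from scikit-learn, chosen only as one example of an algorithm that does not take the number of clusters as input. The synthetic data, the `eps` values, and the helper name `count_clusters` are assumptions for illustration. The same data yields a different number of clusters depending on `eps`.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Synthetic stand-in data (an assumption): four well-separated 2-D blobs.
centers = [(-6, -6), (-6, 6), (6, -6), (6, 6)]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)


def count_clusters(X, eps):
    """Number of clusters DBSCAN reports for a given eps, ignoring noise."""
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    # DBSCAN marks noise points with the label -1; exclude it from the count.
    return len(set(labels) - {-1})


for eps in (1.0, 15.0):
    print(f"eps={eps}: {count_clusters(X, eps)} cluster(s)")
```

With a neighbourhood radius larger than the gap between blobs, everything collapses into a single cluster, so the "automatic" answer still depends on parameters that have to suit the data.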
In practice, we recommend the following approach, which we have found useful:
- understand the data as much as possible to get an idea of the topics
- try different numbers of clusters and quickly check the results under Explorer
- increase or decrease the number of clusters based on what you see in those results
- employ the merge functionality of the Explorer dashboard to combine clusters that are conceptually close to one another