Clustering groups items so that those in the same group/cluster share meaningful similarities (i.e. specific features or properties). By identifying patterns in the data, clustering gives the data meaning and supports informed decision-making.
As an example, imagine a dataset composed of clients' feedback. In this scenario, one cluster could contain all feedback entries regarding delays, another could contain all feedback entries on whether the support line was helpful, and so on.
Why can clustering data be beneficial?
Because items in the same cluster share meaningful similarities, clustering is a great tool for uncovering hidden patterns in the data.
Relevance AI's platform provides you with a no-code workflow to cluster your vectorized data with a few clicks. Make sure to follow the vectorize workflow guide if your dataset does not include vectors.
- Select the vector field based on which you wish to cluster the data
- Select the clustering algorithm (Kmeans or K-medoids). For Kmeans, enter the number of clusters as well.
Note: Kmeans and K-medoids are both clustering methods. The advantage of Kmeans is that it converges and produces clustering results. K-medoids is more complex and, in many cases, more precise; however, it might not converge (i.e. it may not be able to create clusters) on every dataset.
- Execute the workflow
The image below shows how to cluster a dataset based on the description field (i.e. the corresponding vectors) using the Kmeans algorithm. For Kmeans, the number of clusters must be specified in advance. Read our guide on How to select the number of clusters.
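To illustrate why Kmeans needs the number of clusters up front, here is a minimal sketch of the standard Kmeans (Lloyd's) procedure in plain Python. It is not Relevance AI's implementation; the function names, the toy vectors, and the `max_iter` and `seed` parameters are assumptions made for this illustration only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100, seed=0):
    """Plain Lloyd's k-means. Returns (labels, centroids).

    k must be chosen by the caller: the algorithm cannot start
    without knowing how many centroids to place.
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we are done
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy 2-D "vectors" standing in for real embeddings (hypothetical data).
vectors = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
labels, centroids = kmeans(vectors, k=2)
print(labels)
```

With these toy vectors, the first three points end up in one cluster and the last three in the other. Which cluster receives which index depends on the random initialisation.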
After the workflow finishes, the clustering results are automatically added to your dataset under a new field (_cluster_.description_mpnet_vector_.kmeans-10 for the example used in this guide). Check the results under Dataset -> Monitor -> Clusters.
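If you work with the clustered documents outside the dashboard, a dotted field name like the one above can be read as a path through nested objects. The sketch below groups documents by cluster under that reading. The nested layout, the `_id` key, and the cluster label values are assumptions for illustration and may not match the actual stored format.

```python
from collections import defaultdict

# Field name taken from the example in this guide.
CLUSTER_FIELD = "_cluster_.description_mpnet_vector_.kmeans-10"


def get_nested(document, dotted_path):
    """Follow a dot-separated path through nested dicts; None if missing."""
    value = document
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def group_by_cluster(documents, cluster_field=CLUSTER_FIELD):
    """Map each cluster label to the list of document ids assigned to it."""
    groups = defaultdict(list)
    for doc in documents:
        groups[get_nested(doc, cluster_field)].append(doc["_id"])
    return dict(groups)


# Hypothetical documents; the label values are made up for this sketch.
documents = [
    {"_id": "a", "_cluster_": {"description_mpnet_vector_": {"kmeans-10": "cluster-3"}}},
    {"_id": "b", "_cluster_": {"description_mpnet_vector_": {"kmeans-10": "cluster-7"}}},
    {"_id": "c", "_cluster_": {"description_mpnet_vector_": {"kmeans-10": "cluster-3"}}},
]
groups = group_by_cluster(documents)
print(groups)
```

Documents without the cluster field are collected under the key `None`, which makes it easy to spot entries that were not clustered.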
Note: Kmeans and K-medoids are only two of the many techniques proposed for clustering. Relevance AI also provides Auto-clustering and Hybrid-Clustering. In all cases, you can perform Sub-clustering, that is, further analysis of a single cluster.