What is clustering?

Clustering is the task of dividing data into groups (called clusters) so that items in the same group are more similar to one another than to items in other groups. For instance, in a news dataset, articles on politics can form one cluster and articles on sports another. Likewise, in a customer feedback dataset, one cluster can hold reviews about quality and another reviews about response time.


unclustered vs clustered data

📘

What is clustering

Clustering groups items so that those in the same cluster share meaningful similarities, such as specific features. By surfacing these patterns, clustering gives structure to raw data and supports informed decision-making.

Use cases

Clustering is used across many industries. Common applications include:

  • Customer feedback analysis
  • Social network analysis
  • Market segmentation
  • Search result grouping
  • Anomaly detection

Common clustering methods

  • Connectivity models: These models assume that data points close to one another are more similar than points far apart. They are easy to build and interpret but scale poorly to large datasets. Hierarchical clustering is the most common method in this category.
  • Centroid models: Iterative algorithms such as K-Means fall into this category. Similarity is defined by the distance of a data point from the centroid of a cluster. The number of clusters must be chosen in advance, so some prior knowledge of the dataset is required.
  • Density models: These models scan the data space for regions with a high density of data points, separated by sparser regions. Each dense region forms one cluster. Popular density models are DBSCAN and OPTICS.
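
To make the centroid approach more concrete, here is a minimal K-Means sketch in plain Python. It is an illustration under stated assumptions, not a production implementation: the function name `kmeans`, the toy points, and the deterministic farthest-point initialisation are all choices made for this example (real libraries typically use randomised initialisation).

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster points into k groups with a plain K-Means loop."""
    # Illustrative initialisation: start at the first point, then repeatedly
    # add the point farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if not members:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
                continue
            new_centroids.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
            )
        if new_centroids == centroids:
            break  # Centroids stopped moving, so the loop has converged.
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Note that `k` has to be passed in by the caller, which is the "number of clusters must be decided in advance" requirement in practice. Density models such as DBSCAN avoid that parameter but need a neighbourhood radius and a minimum point count instead.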

Clustering at Relevance AI

We provide a no-code platform for vectorising data (a prerequisite of clustering) and clustering it. No technical knowledge is required: simply follow the corresponding workflows and the results are added to your dataset automatically. You can then analyse your data with the tools we have designed for insight extraction. Below is an image showing the trends in a customer feedback analysis.


Relevance AI - Clusters
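
For readers curious about what vectorising means, the sketch below shows one simple way to turn text into vectors that a clustering algorithm can compare. It is a generic bag-of-words illustration, not a description of how the Relevance AI platform vectorises data; the function names `vectorise` and `cosine_similarity` and the sample reviews are invented for this example.

```python
import math
from collections import Counter


def vectorise(text, vocabulary):
    """Turn a text into a vector of word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An all-zero vector has no direction to compare.
    return dot / (norm_a * norm_b)


# Hypothetical feedback snippets, invented for illustration.
reviews = [
    "great build quality and solid materials",
    "poor quality materials broke quickly",
    "support response time was slow",
    "slow response from support team",
]

# The vocabulary is every distinct word seen across the reviews.
vocabulary = sorted({word for review in reviews for word in review.lower().split()})
vectors = [vectorise(review, vocabulary) for review in reviews]

quality_pair = cosine_similarity(vectors[0], vectors[1])   # both about quality
response_pair = cosine_similarity(vectors[2], vectors[3])  # both about response time
mixed_pair = cosine_similarity(vectors[0], vectors[2])     # different topics

print(quality_pair, response_pair, mixed_pair)
```

Reviews on the same topic share words and so score higher than the cross-topic pair, which is the signal a clustering algorithm relies on once the data is in vector form.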