Diversity Search

No word matching, just semantics with clustering

Diversity search results for query "birthday gift". As can be seen, results include a variety of items.

Concept

Diversity search is similar to vector search: it runs in the vector space, so it matches on meaning rather than on exact words. The difference is a clustering step applied after the search to add variety: the search results are clustered, and the top items of each cluster are shown.
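The search-then-cluster idea can be sketched in plain Python. This is only an illustration of the concept, not the RelevanceAI implementation: the use of cosine similarity, the tiny k-means routine, and the function names cosine, kmeans and diversity_search are all assumptions made for this example.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, n_clusters, n_iter=20, seed=0):
    """Tiny k-means (assumed clustering method); returns one label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # assign every vector to its nearest centroid
        labels = [
            min(range(n_clusters), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # move every non-empty centroid to the mean of its members
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def diversity_search(query_vec, docs, n_clusters, page_size, min_score=0.0):
    """docs is a list of (doc_id, vector); returns the best hit of each cluster."""
    # step 1: ordinary vector search, best score first
    hits = [(doc_id, vec, cosine(query_vec, vec)) for doc_id, vec in docs]
    hits = [h for h in hits if h[2] >= min_score]
    hits.sort(key=lambda h: h[2], reverse=True)
    hits = hits[:page_size]
    if not hits:
        return []
    # step 2: cluster the retrieved hits
    labels = kmeans([h[1] for h in hits], min(n_clusters, len(hits)))
    # step 3: keep the top hit of each cluster (hits are already sorted by score)
    best = {}
    for label, (doc_id, _vec, score) in zip(labels, hits):
        best.setdefault(label, (doc_id, score))
    return sorted(best.values(), key=lambda r: r[1], reverse=True)


# toy 2-D vectors with made-up ids, for illustration only
docs = [
    ("nike-1", [1.0, 0.0]), ("nike-2", [0.98, 0.05]), ("nike-3", [0.95, 0.1]),
    ("adidas-1", [0.7, 0.7]), ("adidas-2", [0.65, 0.75]), ("asics-1", [0.1, 1.0]),
]
for doc_id, score in diversity_search([1.0, 0.1], docs, n_clusters=3, page_size=6):
    print(doc_id, round(score, 3))
```

Plain vector search would return the three near-identical "nike" items first; keeping only the best item per cluster is what makes room for the others.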

Sample use-case

Searching for "sneakers" in a sportswear dataset with diversity search returns not a list dominated by a single brand such as Nike, but a combination of Nike, Adidas, Asics, etc. This matters because, in a vector space, a query may sit closest to the items of one particular brand, so the top results of a plain vector search can all look alike. Diversity search instead returns a more varied set of the most relevant results in a streamlined manner.

Sample code

Sample code using the RelevanceAI SDK for the diversity search endpoint is shown below.

from relevanceai import Client

dataset_id = "ecommerce-search-example"

client = Client()

query = "birthday gift"  # query text
query_vec = client.services.encoders.text(text=query)

url = "https://gateway-api-aueast.relevance.ai/v1/"
diversity_search = client.services.search.diversity(
    # dataset name
    dataset_id=dataset_id,

    # list of vector fields to run the query against
    multivector_query=[
        {"vector": client.services.encoders.multi_text(text=query)["vector"],
         "fields": ["descriptiontextmulti_vector_", "product_nametextmulti_vector_"]},

        {"vector": client.services.encoders.textimage(text=query)["vector"],
         "fields": ["description_imagetext_vector_"]},
    ],

    # vector field on which the clustering is done
    cluster_vector_field="product_nametextmulti_vector_",

    # number of clusters
    n_clusters=5,

    # number of returned results
    page_size=20,

    # minimum similarity score to return a match as a result
    min_score=0.2,
)

This search takes slightly longer than vector search because of the added clustering step. It relies on machine learning techniques for vectorizing, similarity detection, and clustering, so at least one vectorizer is needed. Multiple vectorizing models can be used and combined in a single search (i.e., multivector_query in the request body). Note the increased page_size parameter, which ensures there is enough data for clustering.
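To make the idea of combining several vectorizers more concrete, here is a minimal sketch of one way a multivector_query could be scored against a single document. How the endpoint actually combines the scores is not stated here; averaging the cosine similarities over every (query vector, field) pair is an assumption, and the function multivector_score and the field names title_vector_ and image_vector_ are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def multivector_score(multivector_query, doc):
    """Average similarity over all (query vector, field) pairs found in doc.

    Averaging is an assumed combination rule, chosen for illustration.
    """
    sims = []
    for part in multivector_query:
        for field in part["fields"]:
            if field in doc:  # skip vector fields the document does not have
                sims.append(cosine(part["vector"], doc[field]))
    return sum(sims) / len(sims) if sims else 0.0


# one query vector per (hypothetical) encoder, each aimed at its own fields
multivector_query = [
    {"vector": [1.0, 0.0], "fields": ["title_vector_"]},
    {"vector": [1.0, 0.0], "fields": ["image_vector_"]},
]
doc = {"title_vector_": [1.0, 0.0], "image_vector_": [0.0, 1.0]}

score = multivector_score(multivector_query, doc)
min_score = 0.2
print(score, score >= min_score)
```

Under this assumed rule, a document only needs to match well on some of the fields to clear min_score, which is one reason a larger page_size helps: more candidates survive the filter and reach the clustering step.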

