Multistep Chunk Search

Fast fine-grained search - No word matching, just semantics

Figure: Multi-step chunk search results for the query "colorful cushions cover". The top results all include "Floral", because the first vector search step uses a model that knows "colorful" and "floral" are conceptually close.

Concept

Multistep chunk search also performs the search in the vector space; however, it relies on both document-level and chunked data. It first selects candidate documents via a vector search over the full documents, then runs a chunk search only on those candidates, which speeds up the overall search process.
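The two steps can be sketched in plain Python. This is a toy illustration with hypothetical document/chunk dictionaries and cosine similarity, not the service's actual implementation:

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def multistep_chunk_search(query_vec, docs, first_step_k=2, page_size=3):
    # Step 1: vector search on document-level vectors selects candidates.
    candidates = sorted(
        docs,
        key=lambda d: cosine(query_vec, d["doc_vector"]),
        reverse=True,
    )[:first_step_k]
    # Step 2: chunk search runs only inside the selected candidates.
    scored_chunks = [
        (cosine(query_vec, chunk["vector"]), chunk["text"])
        for doc in candidates
        for chunk in doc["chunks"]
    ]
    scored_chunks.sort(reverse=True)
    return [text for _, text in scored_chunks[:page_size]]
```

Because step 2 never touches chunks of documents eliminated in step 1, the expensive chunk-level comparison runs over a fraction of the data.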

Sample use-case

Imagine dealing with many long pieces of text on different topics, with many paragraphs per page. If only one-fourth of the pages are about sports, and only a few of those mention injuries, multistep chunk search lets us filter out the 75% of unrelated pages and restrict the chunk search to the subset of data we actually need.

Sample code

Sample code using the Relevance AI SDK for the multistep chunk search endpoint is shown below.

from relevanceai import Client

project = "<PROJECT-NAME>"   # project name
api_key = "<API-KEY>"        # API key
dataset_id = "<dataset_id>"

client = Client(project, api_key)

query = "gift for my wife"  # query text

multistep_chunk_search = client.services.search.multistep_chunk(
        # dataset name
        dataset_id = dataset_id,

        first_step_multivector_query = [
            {"vector": client.services.encoders.text(text=query)["vector"],
             # vector fields on which to run the first-step vector search
             "fields": ["product_name_default_vector_"]}
        ],
      
        multivector_query = [
            {
                "vector": client.services.encoders.text(text=query)["vector"],
                # list of vector fields to run the query against in the second step
                "fields": ["description_sntcs_chunk_.txt2vec_chunkvector_"],
            }
        ],
        
        # chunk field referring to the chunked data
        chunk_field = "description_sntcs_chunk_",

        # number of returned results
        page_size = 5,

        # minimum similarity score to return a match as a result
        min_score = 0.0,
)

This search is very useful when chunk search becomes slow due to a large number of documents and the chunks within them. It relies on machine learning techniques for vectorization and similarity detection on chunked data; therefore, both a chunker and a vectorizer are needed. It is also possible to use multiple models for vectorizing and combine them all in one search (i.e. the multivector_query field in the request body).
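As a rough sketch of how scores from several vector fields could be combined, consider the snippet below. The field names and the unweighted-sum rule are assumptions for illustration; the service's actual aggregation may differ:

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def multivector_score(query_vectors, doc):
    # query_vectors maps a vector field name to the query encoded by that
    # field's model; each per-field similarity contributes to the total.
    return sum(cosine(q, doc[field]) for field, q in query_vectors.items())
```

Each model encodes the same query into its own vector space, and a document is scored against every field it was vectorized into.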
