Multi-vector search with your own vectors

Get started with Relevance AI in 5 minutes!

Vector Spaces

Try it out in Colab.

What You Need

  • Project and API Key: Grab your Relevance AI project and API key by signing up
  • Python 3 (ideally a Jupyter Notebook/Colab environment)

Installation Requirements

Prior to starting, let's install the main dependencies.

!pip install -U RelevanceAI[notebook]

This will give you access to Relevance AI's Python SDK.

Setting Up Client

After installation, we also need to set up an API client. If you don't have an API key yet, you can sign up and get one from the settings area at https://cloud.relevance.ai/.

from relevanceai import Client 

"""
You can sign up/login and find your credentials here: https://cloud.relevance.ai/sdk/api
Once you have signed up, click on the value under `Authorization token` and paste it here
"""
client = Client()

Steps to perform multi-vector search

  1. Get sample data
  2. Vectorize the data
  3. Insert into your dataset
  4. Search your dataset
Steps to search

1. Data + Encode

Here, we use a dataset that has already been encoded into vectors, so we skip the encoding step on this page. To learn about encoding with a variety of PyTorch/TensorFlow models, see the other pages in our guides, such as Text-to-image search (using OpenAI's CLIP in PyTorch).
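If you want to see what "your own vectors" means in practice, the sketch below is a toy hashing encoder in plain Python. It is not a real embedding model like CLIP or USE: it hashes lowercase tokens into a fixed number of buckets, L2-normalises the counts, and stores the result in a field whose name ends in `_vector_`, following the pattern of the vector field names used on this page. `toy_text_vector`, `add_vector_field` and the `_toy_vector_` suffix are hypothetical names chosen for illustration.

```python
import hashlib
import math

def toy_text_vector(text: str, dim: int = 16) -> list:
    """Hash each lowercase token into one of `dim` buckets, then L2-normalise (toy encoder)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # md5 gives a stable bucket across runs, unlike Python's salted built-in hash()
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def add_vector_field(doc: dict, source_field: str, dim: int = 16) -> dict:
    """Return a copy of `doc` with a hypothetical `<source_field>_toy_vector_` field added."""
    out = dict(doc)
    out[f"{source_field}_toy_vector_"] = toy_text_vector(str(doc[source_field]), dim)
    return out

doc = {"_id": "1", "product_title": "Red running shoes"}
encoded = add_vector_field(doc, "product_title")
print(len(encoded["product_title_toy_vector_"]))  # 16
```

Any real encoder can take the place of `toy_text_vector`; what matters for search is that every document gets a fixed-length vector in the same field.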

import pandas as pd
from relevanceai.datasets import get_dummy_ecommerce_dataset

# Load the sample documents (a list of dictionaries) and preview the first rows
documents = get_dummy_ecommerce_dataset()
pd.DataFrame(documents).head()
E-commerce Dataset Preview
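Before searching, it helps to check which fields in a document hold vectors and how long they are. The sketch below assumes, based on the field names used on this page (`product_image_clip_vector_`, `product_title_use_vector_`), that vector fields end in `_vector_`. `vector_fields` is a hypothetical helper, and the sample document uses made-up values rather than real data from the dataset.

```python
def vector_fields(document: dict) -> dict:
    """Map each field whose name ends in '_vector_' to its length (dimensionality)."""
    return {
        key: len(value)
        for key, value in document.items()
        if key.endswith("_vector_") and isinstance(value, (list, tuple))
    }

# Made-up document; the 512 dimensions are illustrative only
sample = {
    "_id": "example-1",
    "product_title": "Red running shoes",
    "product_title_use_vector_": [0.1] * 512,
    "product_image_clip_vector_": [0.2] * 512,
}
print(vector_fields(sample))
# {'product_title_use_vector_': 512, 'product_image_clip_vector_': 512}
```

You could run the same helper over `documents[0]` to see the vector fields that the search step below refers to.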

2. Insert

To insert documents into a dataset, use the insert_documents method. As with encoding, this step is already done if you are working with our pre-built sample dataset.

# Insert the sample documents into a dataset named "quickstart_sample"
client.insert_documents(dataset_id="quickstart_sample", docs=documents)

Once the insert task finishes, the client returns a link to a dashboard where you can check your dataset's schema and vector health.

Relevance AI Dashboard
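When you bring your own vectors, every document's vector in a given field generally needs the same dimensionality, because similarity is computed between equal-length vectors. A minimal pre-insert check might look like the sketch below. It assumes vector fields are the ones ending in `_vector_`, and `check_vector_dims` is a hypothetical helper, not part of the SDK.

```python
def check_vector_dims(documents: list) -> dict:
    """Return {field: dim}; raise ValueError if a vector field's length differs between documents."""
    dims = {}
    for doc in documents:
        for key, value in doc.items():
            if not key.endswith("_vector_"):
                continue
            # The first document that has this field fixes the expected length
            expected = dims.setdefault(key, len(value))
            if len(value) != expected:
                raise ValueError(
                    f"document {doc.get('_id')!r}: {key} has length {len(value)}, expected {expected}"
                )
    return dims

docs_ok = [
    {"_id": "1", "a_vector_": [0.1, 0.2]},
    {"_id": "2", "a_vector_": [0.3, 0.4]},
]
print(check_vector_dims(docs_ok))  # {'a_vector_': 2}
```

Running a check like this before `insert_documents` surfaces a shape mismatch in your own encoding pipeline early, rather than at search time.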

3. Search

Since we are using our own vectors, we skip vectorizing a query and instead reuse the vectors of an existing document in the dataset as the query.

# Fetch the first inserted document and reuse its vectors as the query
doc = client.datasets.documents.get(dataset_id="quickstart_sample", id=documents[0]['_id'])
image_vector = doc['document']['product_image_clip_vector_']
text_vector = doc['document']['product_title_use_vector_']

Now let's run a multi-vector search against our dataset, querying with both vectors at once.

# Create a multivector_query parameter: a list of Python dictionaries with two keys, "vector" and "fields"
multivector_query = [
    {"vector": image_vector, "fields": ['product_image_clip_vector_']},
    {"vector": text_vector, "fields": ['product_title_use_vector_']}
]

# Perform a multi-vector search, returning the top 5 matches
results = client.services.search.vector(
    dataset_id="quickstart_sample",
    multivector_query=multivector_query,
    page_size=5
)
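To make the idea of a multi-vector query concrete, here is an in-memory sketch that ranks documents against a `multivector_query` of the same shape as the one above. This page does not describe how the service combines per-field scores, so the sketch *assumes* a plain sum of cosine similarities across the query entries. `multivector_search` and the toy documents are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def multivector_search(documents, multivector_query, page_size=5):
    """Brute-force ranking: score = SUM of cosine similarities over every (vector, field) pair.

    The summing rule is an assumption for illustration, not the service's documented behaviour.
    """
    scored = []
    for doc in documents:
        score = 0.0
        for part in multivector_query:
            for field in part["fields"]:
                if field in doc:  # a document missing a field gets no contribution from it
                    score += cosine(part["vector"], doc[field])
        scored.append((doc["_id"], score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:page_size]

toy_docs = [
    {"_id": "a", "image_vector_": [1.0, 0.0], "title_vector_": [1.0, 0.0]},
    {"_id": "b", "image_vector_": [0.0, 1.0], "title_vector_": [1.0, 0.0]},
    {"_id": "c", "image_vector_": [0.0, 1.0], "title_vector_": [0.0, 1.0]},
]
toy_query = [
    {"vector": [1.0, 0.0], "fields": ["image_vector_"]},
    {"vector": [1.0, 0.0], "fields": ["title_vector_"]},
]
ranking = multivector_search(toy_docs, toy_query, page_size=3)
print(ranking)  # [('a', 2.0), ('b', 1.0), ('c', 0.0)]
```

Document "b" matches only on its title vector, so it ranks between the document that matches on both fields and the one that matches on neither; that ordering is the practical payoff of querying several vector fields at once.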

Here our query is a simple multi-vector query, but our search also supports, out of the box, features such as filters, facets and traditional keyword matching that you can combine with your vector search. You can read more about how to construct a multi-vector query with those features here.
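As an illustration of combining a filter and keyword matching with vector similarity, the sketch below filters candidates first and then blends the two scores with a weight `alpha`. This is a common hybrid-search pattern, not Relevance AI's implementation or query syntax; `hybrid_rank`, `keyword_score`, `alpha` and the toy products are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def keyword_score(query, text):
    """Fraction of distinct query tokens that also appear in the text (case-insensitive)."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    return len(query_tokens & text_tokens) / len(query_tokens) if query_tokens else 0.0

def hybrid_rank(documents, query_text, query_vector, text_field, vector_field,
                alpha=0.5, filter_fn=None):
    """Filter first, then score each document as alpha * vector + (1 - alpha) * keyword."""
    candidates = [d for d in documents if filter_fn is None or filter_fn(d)]
    scored = [
        (d["_id"],
         alpha * cosine(query_vector, d[vector_field])
         + (1 - alpha) * keyword_score(query_text, d[text_field]))
        for d in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored

products = [
    {"_id": "a", "title": "red running shoes", "vec": [1.0, 0.0], "in_stock": True},
    {"_id": "b", "title": "blue running shoes", "vec": [0.0, 1.0], "in_stock": True},
    {"_id": "c", "title": "red dress", "vec": [1.0, 0.0], "in_stock": False},
]
ranked = hybrid_rank(products, "red shoes", [1.0, 0.0], "title", "vec",
                     alpha=0.5, filter_fn=lambda d: d["in_stock"])
print(ranked)  # [('a', 1.0), ('b', 0.25)]
```

Setting `alpha=1.0` reduces this to pure vector ranking and `alpha=0.0` to pure keyword ranking. Because the filter runs before scoring, the out-of-stock product never enters the ranking at all.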

Now let's display the query document and the results with show_json.

from IPython.display import display  # already available as a built-in in Jupyter/Colab
from relevanceai import show_json

print('=== QUERY === ')
display(show_json([doc['document']], image_fields=["product_image"], text_fields=["product_title"]))

print('=== RESULTS ===')
show_json(results, image_fields=["product_image"], text_fields=["product_title"])
Multi-vector Search Results
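If you just want a quick plain-text look at the results, the sketch below formats a list of result documents, one line each. It assumes you have already pulled the list of result documents out of the search response (which key holds them is not shown on this page, so check the response you actually get). `summarize` is a hypothetical helper, not part of the SDK, and the example results are made up.

```python
def summarize(documents, fields, max_len=40):
    """One line per document: its rank, then each requested field truncated to max_len chars."""
    lines = []
    for rank, doc in enumerate(documents, start=1):
        # Missing fields print as empty so every line has the same layout
        parts = [f"{field}={str(doc.get(field, ''))[:max_len]}" for field in fields]
        lines.append(f"{rank}. " + " | ".join(parts))
    return lines

example_results = [
    {"product_title": "Red running shoes", "product_image": "https://example.com/shoes.jpg"},
    {"product_title": "Blue running shoes"},
]
for line in summarize(example_results, ["product_title", "product_image"]):
    print(line)
```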


Final Code

import pandas as pd
from IPython.display import display  # already available as a built-in in Jupyter/Colab
from relevanceai import Client, show_json
from relevanceai.datasets import get_dummy_ecommerce_dataset

client = Client()

# Retrieve our sample dataset. This comes in the form of a list of documents.
documents = get_dummy_ecommerce_dataset()
pd.DataFrame(documents).head()

# Start from a clean dataset, then insert the documents
client.datasets.delete("quickstart_sample")
client.insert_documents("quickstart_sample", documents)

# Get a document and reuse its vectors as the query
doc = client.datasets.documents.get(dataset_id="quickstart_sample", id=documents[0]['_id'])
image_vector = doc['document']['product_image_clip_vector_']
text_vector = doc['document']['product_title_use_vector_']

# Create a multi-vector query: a list of Python dictionaries with the keys "vector" and "fields"
multivector_query = [
    {"vector": image_vector, "fields": ['product_image_clip_vector_']},
    {"vector": text_vector, "fields": ['product_title_use_vector_']}
]

results = client.services.search.vector(
    dataset_id="quickstart_sample",
    multivector_query=multivector_query,
    page_size=5
)

print('=== QUERY === ')
display(show_json([doc['document']], image_fields=["product_image"], text_fields=["product_title"]))

print('=== RESULTS ===')
show_json(results, image_fields=["product_image"], text_fields=["product_title"])
