Question answering (using USE QA from TensorFlow Hub)

Get started with Relevance AI in 5 minutes!

This quickstart shows how easy it is to get started and how quickly you can build a question-answering application with Relevance AI in just a few lines of code. Visit the use-case documentation pages for more in-depth tutorials and guidance on experimenting with stronger vector search.

Process of this quickstart

For each application, we demonstrate the ease of:

  • encoding data,
  • indexing the data,
  • vector search

to build powerful applications.

Try it out in Colab: Open In Colab

What I Need

  • Project and API Key: Grab your RelevanceAI project and API key by signing up
  • Python 3 (ideally a Jupyter Notebook/Colab environment)

Installation Requirements

Prior to starting, we need to install the main dependencies.

# Relevance AI installation
pip install -U "RelevanceAI[notebook]==0.27.0"

# VectorHub installation for quick access to TensorFlow Hub text encoders
pip install "vectorhub[encoders-text-tfhub]"

Setting Up Client

To use Relevance AI, you need to instantiate a client. This requires a project and API key, which you can find in the settings area at https://cloud.relevance.ai/. Alternatively, you can run the code below and follow the link and the guide.

from relevanceai import Client 

"""
Running this cell will provide you with 
the link to sign up/login page where you can find your credentials.
Once you have signed up, click on the value under `Authorization token` 
in the API tab
and paste it in the appreared Auth token box below
"""

client = Client()

For this guide, we use our sample ecommerce dataset as shown below:

import pandas as pd
from relevanceai.datasets import get_ecommerce_dataset

# Retrieve our sample dataset - this comes as a list of documents.
documents = get_ecommerce_dataset()

pd.DataFrame.from_dict(documents).head()

Question Answering (Using TFHub's Universal Sentence Encoder QA)

Question answering can be a useful application of vector databases particularly for customer support and supporting search for FAQ documents. Here, we show an example of using TFHub's Question Answering Model.

Data preparation

First, we will define our encoder functions:

import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import tensorflow_text  # registers the TF Text ops the model needs

# Here we load the model and define how we encode
module = hub.load('https://tfhub.dev/google/universal-sentence-encoder-qa/3')

# First we define how we encode the queries
def encode_query(query: str):
    return module.signatures['question_encoder'](tf.constant([query]))['outputs'][0].numpy().tolist()

# We then want to define how we encode the answers
def encode_answer(answer: str):
    return module.signatures['response_encoder'](
        input=tf.constant([answer]), 
        context=tf.constant([answer]))['outputs'][0].numpy().tolist()

Next, we will encode the product_title field within our documents:

# Then loop through the documents and encode the product titles
from tqdm import tqdm

for d in tqdm(documents):
    d['product_title_use_qa_vector_'] = encode_answer(d['product_title'])
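The loop above simply attaches a new vector field (by convention, suffixed with `_vector_`) to each document. As a minimal sketch of that pattern, using a hypothetical stand-in encoder so it runs without TensorFlow:

```python
# Minimal sketch of the encoding loop. fake_encode_answer is a hypothetical
# stand-in; the real USE QA response encoder returns a 512-dimensional embedding.
def fake_encode_answer(answer: str) -> list:
    # Dummy fixed-length vector derived from the text
    return [float(len(answer)), 0.0, 1.0]

documents = [
    {"product_title": "Baby rattle"},
    {"product_title": "Garden hose"},
]

# Attach a vector field to each document, mirroring the real loop
for d in documents:
    d["product_title_use_qa_vector_"] = fake_encode_answer(d["product_title"])

print(documents[0]["product_title_use_qa_vector_"])  # [11.0, 0.0, 1.0]
```

The `_vector_` suffix is what lets the platform recognise the field as a vector at search time.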

Finally, we can upload the results to a dataset called quickstart_tfhub_qa in the Relevance AI platform:

import uuid

# Each document must have an '_id' field before insertion
for d in documents:
    d['_id'] = str(uuid.uuid4())

# Then insert them into the dataset to get started
client.insert_documents("quickstart_tfhub_qa", documents)

Search

To search within vectors generated by the Universal Sentence Encoder, we need to encode the query with the same vectorizer and then perform a vector search as shown below:

query = "for my baby daughter"
query_vector = encode_query(query)

results = client.services.search.vector(
    "quickstart_tfhub_qa",
    multivector_query=[
        {
            "vector": query_vector,
            "fields": ["product_title_use_qa_vector_"],
        }
    ],
    page_size=5
)
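Conceptually, a vector search ranks documents by the similarity between the query vector and each stored document vector. A minimal sketch of that idea using cosine similarity over plain Python lists (this is illustrative only, not Relevance AI's actual implementation, and the 3-d vectors are hypothetical stand-ins for real 512-d USE QA embeddings):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product normalised by vector magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vec = [1.0, 0.0, 0.0]
docs = {
    "doc_a": [0.9, 0.1, 0.0],  # points in nearly the same direction as the query
    "doc_b": [0.0, 1.0, 0.0],  # orthogonal to the query
}

# Rank documents by similarity to the query, most similar first
ranked = sorted(docs, key=lambda k: cosine_similarity(query_vec, docs[k]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```

This is why the query must be encoded with the same vectorizer as the documents: similarity scores are only meaningful when both vectors live in the same embedding space.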

Results can be seen on the Relevance AI dashboard, accessible via the link provided after the search completes, or through Relevance AI's `show_json`:

from relevanceai import show_json
show_json(
    results, text_fields=["product_title"]
)
Sample of text search results for "for my baby daughter"

Final Code

import pandas as pd
import uuid
import tensorflow as tf
import tensorflow_hub as hub
import numpy as np
import tensorflow_text  # registers the TF Text ops the model needs
from tqdm import tqdm

from relevanceai import Client
from relevanceai import show_json
from relevanceai.datasets import get_ecommerce_dataset


client = Client()

# Retrieve our sample dataset - this comes as a list of documents.
documents = get_ecommerce_dataset()

pd.DataFrame.from_dict(documents).head()

# Here we load the model and define how we encode
module = hub.load('https://tfhub.dev/google/universal-sentence-encoder-qa/3')

# First we define how we encode the queries
def encode_query(query: str):
    return module.signatures['question_encoder'](tf.constant([query]))['outputs'][0].numpy().tolist()

# We then want to define how we encode the answers
def encode_answer(answer: str):
    return module.signatures['response_encoder'](
        input=tf.constant([answer]), 
        context=tf.constant([answer]))['outputs'][0].numpy().tolist()

# Then loop through the documents and encode the product titles
for d in tqdm(documents):
    d['product_title_use_qa_vector_'] = encode_answer(d['product_title'])

# Each document must have an '_id' field before insertion
for d in documents:
    d['_id'] = str(uuid.uuid4())

# Insert the documents into a dataset called quickstart_tfhub_qa.
client.insert_documents("quickstart_tfhub_qa", documents)

query = "for my baby daughter"
query_vector = encode_query(query)
results = client.services.search.vector(
    "quickstart_tfhub_qa",
    multivector_query=[
        {
            "vector": query_vector,
            "fields": ["product_title_use_qa_vector_"]}
    ],
    page_size=5
)


show_json(
    results, text_fields=["product_title"]
)
