Dataset Basics

This section is dedicated to managing your Relevance AI space.

To run vector experiments with Relevance AI, you first need to sign up at https://cloud.relevance.ai/. Alternatively, you can follow the guide on our installation page.

Datasets

After uploading a dataset to your account in Relevance AI, you can preview your data in the dashboard https://cloud.relevance.ai under Dataset. Note that if you just started, your account will be empty!

RelevanceAI Datasets

Interacting With Datasets

By storing data on our platform, you can directly invoke vectorization models, perform an optimized nearest-neighbor search, cluster data, etc.

Some basic actions to deal with datasets are:

  • Create a dataset
  • List the available datasets in a project/account
  • Monitor a specific dataset
  • Delete a dataset

You can take these actions either in the dashboard or with the Relevance AI Python SDK. To use the Python SDK, install Relevance AI and instantiate a client, as shown in the two code snippets below:

pip install -U RelevanceAI

from relevanceai import Client

"""
Running this cell will provide you with 
the link to sign up/login page where you can find your credentials.
Once you have signed up, click on the value under `Authorization token` 
in the API tab
and paste it in the Auth token box that appears below
"""

client = Client()

Creating a dataset

To create a new, empty dataset, pass the name under which you wish to save it to the create function, as shown below. In this example, we use ecommerce-sample-dataset as the name.

client.datasets.create(dataset_id="ecommerce-sample-dataset")

See Inserting and updating documents for more details on how to insert/upload documents into a dataset.

❗️

Remember!

  • You cannot rename datasets or rename/edit existing field names. However, you can copy datasets and edit field names using the clone feature.
  • Id field: Relevance AI platform identifies unique data entries within a dataset using a field called _id (i.e. every document in the dataset must include an _id field with a unique value per document).
  • Vector fields: the name of vector fields must end in _vector_
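The two document rules above can be checked locally before uploading. The following is a minimal sketch in plain Python, not part of the Relevance AI SDK: the helper names `check_documents` and `is_numeric_list` are hypothetical, and treating any non-empty list of numbers as an intended vector is an assumption made for illustration.

```python
from typing import Any, Dict, List

VECTOR_SUFFIX = "_vector_"


def is_numeric_list(value: Any) -> bool:
    """Heuristic (an assumption): a non-empty list of numbers is meant to be a vector."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(
            isinstance(x, (int, float)) and not isinstance(x, bool) for x in value
        )
    )


def check_documents(documents: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    seen_ids = set()
    for position, doc in enumerate(documents):
        # Rule 1: every document needs an _id that is unique within the dataset.
        doc_id = doc.get("_id")
        if doc_id is None:
            problems.append(f"document {position}: missing _id")
        elif doc_id in seen_ids:
            problems.append(f"document {position}: duplicate _id {doc_id!r}")
        else:
            seen_ids.add(doc_id)
        # Rule 2: vector field names must end in _vector_.
        for field, value in doc.items():
            if is_numeric_list(value) and not field.endswith(VECTOR_SUFFIX):
                problems.append(
                    f"document {position}: field {field!r} looks like a vector "
                    f"but does not end in {VECTOR_SUFFIX}"
                )
    return problems


sample = [
    {"_id": "a", "title": "Shoes", "title_vector_": [0.1, 0.2]},
    {"_id": "a", "embedding": [1.0, 2.0]},
    {"title": "No id here"},
]
for problem in check_documents(sample):
    print(problem)
```

Running a check like this before inserting documents can surface naming problems early, which matters because field names cannot be edited after upload.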

List your datasets

You can see a list of all datasets you have uploaded to your account in the dashboard.

List of datasets in the dashboard

Alternatively, you can use the list endpoint in the Python SDK, as shown below:

client.datasets.list()

Monitoring a specific dataset

RelevanceAI's dashboard at https://cloud.relevance.ai is the most straightforward place to monitor your data.

Monitor your vector health

Alternatively, you can monitor the health of a dataset using the command below, which returns counts of missing and existing fields across the documents in the named dataset.

client.datasets.monitor.health("ecommerce-sample-dataset")

# Returns a count of total missing and existing data
Monitoring dataset health
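To make the idea of missing versus existing field counts concrete, here is a minimal local sketch in plain Python. It is not the SDK's implementation: the function name `field_health` is hypothetical, the `exists`/`missing` dictionary shape is an assumption that may differ from what the health command returns, and counting a `None` value as missing is also an assumption.

```python
from typing import Any, Dict, List


def field_health(documents: List[Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Count, per field, how many documents have a value and how many do not."""
    # Collect every field name that appears in at least one document.
    all_fields = set()
    for doc in documents:
        all_fields.update(doc.keys())

    report = {}
    for field in sorted(all_fields):
        # Assumption: an absent key and a None value both count as missing.
        exists = sum(1 for doc in documents if doc.get(field) is not None)
        report[field] = {"exists": exists, "missing": len(documents) - exists}
    return report


docs = [
    {"_id": "1", "title": "Shoes"},
    {"_id": "2"},
    {"_id": "3", "title": None},
]
for name, counts in field_health(docs).items():
    print(name, counts)
```

A field with a high missing count is often a sign that vectorization or an upload step did not run for part of the dataset.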

Deleting a dataset

You can delete an existing dataset from the dashboard using the Delete All Data button, or through the code below. In this case, we are deleting ecommerce-sample-dataset:

Dashboard view

client.datasets.delete(dataset_id="ecommerce-sample-dataset")
