After uploading a dataset to your account in Relevance AI, you can preview your data in the dashboard at https://cloud.relevance.ai under Dataset. Note that if you have just started, your account will be empty!
By storing data on our platform, you can directly invoke vectorization models, perform an optimized nearest-neighbor search, cluster data, etc.
Some basic actions to deal with datasets are:
- Create a dataset
- List the available datasets in a project/account
- Monitor a specific dataset
- Delete a dataset
You can take these actions either in the dashboard or with the Relevance AI Python SDK. For the Python SDK, you need to install Relevance AI and instantiate a client as shown in the two code snippets below:
pip install -U RelevanceAI
from relevanceai import Client

"""
Running this cell will provide you with the link to the sign up/login page where you can find your credentials.
Once you have signed up, click on the value under `Authorization token` in the API tab
and paste it in the Auth token box that appears below
"""
client = Client()
To create a new empty dataset, pass the name under which you wish to save the dataset to the create function. In this example, we have used ecommerce-sample-dataset as the name.
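A minimal sketch of this call follows. It assumes the SDK exposes `client.datasets.create` with a `dataset_id` keyword, mirroring the `client.datasets.delete(dataset_id=...)` call used later on this page; the helper name `create_empty_dataset` is hypothetical and only wraps that one call.

```python
def create_empty_dataset(client, dataset_id):
    """Create an empty dataset and return whatever the API responds with.

    Assumption: the SDK provides client.datasets.create(dataset_id=...),
    by analogy with client.datasets.delete(dataset_id=...).
    """
    return client.datasets.create(dataset_id=dataset_id)


# Usage, with `client` initialised as in the snippet above:
# create_empty_dataset(client, "ecommerce-sample-dataset")
```

Passing the client in as an argument keeps the helper free of credentials, so it can be reused across projects or accounts.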
See Inserting and updating documents for more details on how to insert/upload documents into a dataset.
- You cannot rename datasets or rename/edit existing field names. However, you can copy datasets and edit field names using the
- Id field: The Relevance AI platform identifies unique data entries within a dataset using a field called _id (i.e. every document in the dataset must include an _id field with a unique value per document).
- Vector fields: the name of vector fields must end in
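Because every document needs an `_id` that is unique within the dataset, it can help to check documents locally before uploading them. The sketch below is plain Python and does not call the SDK; the helper `check_ids` and the sample documents are hypothetical.

```python
from collections import Counter


def check_ids(documents):
    """Return (indices of documents without `_id`, `_id` values used more than once)."""
    missing = [i for i, doc in enumerate(documents) if "_id" not in doc]
    counts = Counter(doc["_id"] for doc in documents if "_id" in doc)
    duplicates = [value for value, n in counts.items() if n > 1]
    return missing, duplicates


docs = [
    {"_id": "a1", "title": "Red shoes"},
    {"_id": "a1", "title": "Blue shoes"},  # same _id as the first document
    {"title": "Green hat"},                # no _id at all
]
missing, duplicates = check_ids(docs)
print(missing, duplicates)  # [2] ['a1']
```

If both returned lists are empty, the documents satisfy the `_id` rule described above.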
You can see a list of all datasets you have uploaded to your account in the dashboard.
Alternatively, you can use the list endpoint of the Python SDK.
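A minimal sketch of listing datasets from Python is given here. It assumes the SDK exposes `client.datasets.list()` with no required arguments; the wrapper name `list_datasets` is hypothetical, and the shape of the returned value is not specified on this page, so the sketch returns it unchanged.

```python
def list_datasets(client):
    """Return the SDK's listing of datasets for the authenticated account.

    Assumption: client.datasets.list() exists and needs no arguments.
    """
    return client.datasets.list()


# Usage, with `client` initialised as in the snippet above:
# print(list_datasets(client))
```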
Relevance AI's dashboard at https://cloud.relevance.ai is the most straightforward place to monitor your data.
Alternatively, you can monitor the health of a dataset using the command below, which returns the total count of missing and existing fields across the data points in the named dataset.
client.datasets.monitor.health("ecommerce-sample-dataset") # Returns a count of total missing and existing data
You can delete an existing dataset on the dashboard using the Delete All Data button, or through the following code. In this case, we are deleting the ecommerce-sample-dataset dataset.
client.datasets.delete(dataset_id = "ecommerce-sample-dataset")