How to update an existing dataset

In various situations, one might need to make changes to an existing dataset. Common modifications are:

  • Adding new entries (i.e. rows) to an existing dataset. For instance, when a survey is run periodically (e.g. quarterly tracking projects) and the new responses need to be added to an existing dataset.
  • Adding new fields (i.e. columns) to an existing dataset. For instance, when you want to attach extra information to each entry (e.g. date, gender, address, new responses) for analysis.
  • Modifying existing values in a dataset. For instance, when some values were entered incorrectly or the values under a field have changed (e.g. updates to an address field).

Adding new items (i.e. rows) to an existing dataset

This takes only a few steps:

  1. Prepare your CSV file the same way that you did initially (i.e. for the existing dataset)
    Note: For the new data to sit properly under the headers in the existing dataset, the headers in your first CSV file and your second CSV file must be exactly the same.
  2. On the platform, select the existing dataset to which you wish to add new data
  3. Click on "View dataset" at the top of the page

Relevance AI - How to view my data and access to Upload

🚧

Headers must be typed EXACTLY the same between data batches for the new data to sit properly in an existing dataset

When updating an existing dataset with a new batch of data, headers must be exactly the same between the old and the new CSV. Otherwise, new columns will be added to your dataset.

Keep in mind that the platform is case-sensitive. For example, "Name" and "name" are treated as two different headers.

  4. Click on "Upload". Drag and drop the new CSV file and your new entries will be added to the existing dataset

Relevance AI - Upload to an existing dataset
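
Before uploading a new batch, it can help to compare its headers against the first CSV. The snippet below is a minimal sketch of such a check, not part of the platform: the function name header_mismatches and the sample data are made up for illustration. It compares headers exactly and case-sensitively, which is how the platform is described above.

```python
import csv
import io


def header_mismatches(existing_csv: str, new_csv: str) -> list[str]:
    """Return headers of the new CSV that do not appear in the existing one.

    The comparison is exact and case-sensitive, so "name" does not match "Name".
    (Hypothetical helper, written only to illustrate the header rule.)
    """
    existing_headers = next(csv.reader(io.StringIO(existing_csv)))
    new_headers = next(csv.reader(io.StringIO(new_csv)))
    return [header for header in new_headers if header not in existing_headers]


# Made-up sample batches: the second one uses "name" instead of "Name".
first_batch = "Name,Age\nAlice,30\n"
second_batch = "name,Age\nBob,25\n"

print(header_mismatches(first_batch, second_batch))
```

Any header the function returns would end up as a new column rather than under an existing one, so an empty list is what you want to see before uploading.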


Modifications

Every entry in a dataset on the Relevance AI platform has a unique identifier (_id). The _id field can already exist in a CSV (i.e. be included in the to-be-uploaded CSV file). Otherwise, the platform automatically adds the field with unique values.

The _id field is your access point for modifying existing entries in a dataset.

📘

The _id field is your access point to an individual entry in a dataset.

Either include an _id field with unique values per entry in your CSV file when uploading a dataset, or use the export functionality to access the assigned ids.
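
If you supply your own _id column, its values need to be unique per entry. A small check along these lines can catch repeated ids before uploading. This is only a sketch: duplicate_ids and the sample CSV are illustrative names and data, not something provided by the platform.

```python
import csv
import io
from collections import Counter


def duplicate_ids(csv_text: str) -> list[str]:
    """Return every _id value that occurs more than once in the CSV text.

    (Hypothetical helper; assumes the CSV has an "_id" header.)
    """
    ids = [row["_id"] for row in csv.DictReader(io.StringIO(csv_text))]
    return [entry_id for entry_id, count in Counter(ids).items() if count > 1]


# Made-up sample in which the id "2" is used twice.
sample = "_id,Col1\n1,V1\n2,V2\n2,V3\n"

print(duplicate_ids(sample))
```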

Adding new fields (columns) to an existing dataset

  1. Prepare a CSV file that includes an _id header/column and the new field(s) you wish to add to your dataset. The values under _id must match the _id values of the existing entries that you wish to update.
    In the example below, an existing dataset with 3 entries is shown. Each entry has an id and two fields (Col1 and Col2). We wish to add two new fields (Col3, Col4) to the dataset. This can be easily done by uploading a CSV similar to what is shown under "Data to update".
Existing Dataset            Data to update      

_id | Col1  | Col2        _id | Col3  | Col4
------------------       --------------------
  1 |  V1   |  V4          1  |  V7   |  V9
------------------       --------------------
  2 |  V2   |  V5          2  |       |  V10
------------------       --------------------
  3 |  V3   |  V6          3  |  V8   |  
    
 
         Resulting Dataset
         
_id | Col1  | Col2 | Col3  | Col4
--------------------------------------
  1 |  V1   |  V4  |  V7   |  V9
--------------------------------------
  2 |  V2   |  V5  |       |  V10
--------------------------------------
  3 |  V3   |  V6  |  V8   |  
  2. On the platform, select the existing dataset to which you wish to add new data
  3. Click on "View dataset" at the top of the page
  4. Click on "Upload". Drag and drop the new CSV file and your new columns will be added to the existing dataset.
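
The "Data to update" file from the example above could be produced with Python's standard csv module, as sketched below. The values are the ones from the example. The variable names are illustrative, and the sketch assumes ids are written as plain strings. Empty cells are written as empty strings.

```python
import csv
import io

# New values keyed by the _id of the existing entry they belong to.
new_fields = [
    {"_id": "1", "Col3": "V7", "Col4": "V9"},
    {"_id": "2", "Col3": "", "Col4": "V10"},
    {"_id": "3", "Col3": "V8", "Col4": ""},
]

# Write to an in-memory buffer; swap in open("update.csv", "w", newline="")
# to produce a file that can be dragged into the Upload dialog.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["_id", "Col3", "Col4"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(new_fields)

update_csv = buffer.getvalue()
print(update_csv)
```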

Modifying existing values in a dataset

  1. Prepare a CSV file that includes an _id header/column and the field(s) you wish to modify in your dataset. The values under _id must match the _id values of the existing entries that you wish to update.
    In the example below, an existing dataset with 3 entries is shown. Each entry has an id and two fields (Col1 and Col2). We wish to modify Col1 in the second entry and Col2 in the third entry.
Existing Dataset            Data to update      

_id | Col1  | Col2        _id | Col1  | Col2
------------------       --------------------
  1 |  V1   |  V4          2  |  V7   |  
------------------       --------------------
  2 |  V2   |  V5          3  |       |  V10
------------------      
  3 |  V3   |  V6        
    
 
Resulting Dataset
         
_id | Col1  | Col2
-------------------
  1 |  V1   |  V4  
-------------------
  2 |  V7   |  V5  
-------------------
  3 |  V3   |  V10  
  2. On the platform, select the existing dataset whose values you wish to modify
  3. Click on "View dataset" at the top of the page
  4. Click on "Upload". Drag and drop the new CSV file and the specified values will be updated.

Note 1: You do not have to update all entries. Only include the _id values of the entries that you wish to update (e.g. the first entry in the above example is left out of the new CSV and stays unchanged)

Note 2: If a cell is empty in your new CSV file, no modification is applied to its associated entry in the dataset (e.g. Col1 for the third entry in the above example)
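
The update rules in the example and the two notes can be simulated locally to preview what an upload should produce. The code below is a sketch of the behaviour as described on this page, not the platform's actual implementation. apply_update is a hypothetical helper, and skipping ids that are not in the existing data is an assumption made to keep the sketch small, since this page does not cover that case.

```python
import csv
import io


def apply_update(
    existing: dict[str, dict[str, str]], update_csv: str
) -> dict[str, dict[str, str]]:
    """Return a copy of `existing` with the non-empty cells of the CSV applied.

    Entries are matched on _id. Empty cells leave the current value untouched
    (Note 2), and entries missing from the CSV are not changed (Note 1).
    """
    result = {entry_id: dict(fields) for entry_id, fields in existing.items()}
    for row in csv.DictReader(io.StringIO(update_csv)):
        entry_id = row.pop("_id")
        if entry_id not in result:
            continue  # assumption: unknown ids are ignored in this sketch
        for field, value in row.items():
            if value:  # skip empty cells
                result[entry_id][field] = value
    return result


# The existing dataset and the "Data to update" CSV from the example above.
existing = {
    "1": {"Col1": "V1", "Col2": "V4"},
    "2": {"Col1": "V2", "Col2": "V5"},
    "3": {"Col1": "V3", "Col2": "V6"},
}
update_csv = "_id,Col1,Col2\n2,V7,\n3,,V10\n"

updated = apply_update(existing, update_csv)
for entry_id, fields in updated.items():
    print(entry_id, fields)
```

Comparing the printed result with the "Resulting Dataset" table is a quick way to confirm that an update CSV is laid out as intended before uploading it.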