This page explores the options available in both the data view and the cluster view. To benefit from vector analysis, we recommend vectorizing and clustering your data; both steps are required for the cluster explorer option.
In the Explorer app, the data/cluster explorer is located after the Metrics and Aggregation view.
Cluster explorer is activated if "Explore Clusters" is ticked. Otherwise, the data explorer mode is on. You can switch between the two modes anytime.
This mode shows your data as it is stored in the dataset, optionally limited to specific fields of each entry, so that you do not have to scan a huge table.
The image below illustrates three data entries. Two fields (description and repo name) are selected to be viewed. However, only one of the entries contains a value for the repo name (i.e. fields with null values will not be shown on the data explorer).
To select, remove or reorder fields in the data view, click on the "Show or hide fields" button (marked with A in the image below). A small window will open; clicking on the "Field to preview" bar opens a drop-down menu from which you can select the desired fields. Remove a selected field using the 'x' mark next to it. You can reorder the fields via drag-and-drop in the bottom list.
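The field selection described above can be pictured with a minimal Python sketch. It is only an illustration of the behaviour (selected fields shown in the chosen order, null values hidden), not the Explorer app's implementation; the function name `preview_entry` and the sample field names are hypothetical.

```python
from typing import Any


def preview_entry(entry: dict[str, Any], selected_fields: list[str]) -> dict[str, Any]:
    """Return the selected fields in the chosen order, skipping null or missing values."""
    return {
        field: entry[field]
        for field in selected_fields
        if entry.get(field) is not None
    }


# Hypothetical sample entries: only the first one has a repo name.
entries = [
    {"description": "A graph library", "repo_name": "graphlib", "stars": 120},
    {"description": "A CSV parser", "repo_name": None, "stars": 45},
    {"description": "A plotting tool", "stars": 300},
]

# The order of the list mirrors the drag-and-drop order in the bottom list.
previews = [preview_entry(e, ["description", "repo_name"]) for e in entries]
for preview in previews:
    print(preview)
```

Changing the order of the field list changes the order in which the values appear, which mirrors reordering fields by drag-and-drop.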
After vectorizing and clustering your data, you can switch to the Cluster Explorer mode and choose which clustering result the view should be based on.
Tick "Explore clusters", click on "Select cluster", select your desired clustering result from the drop-down menu under "Cluster fields" (marked with 3 in the image below) and finally apply the changes.
Note: You can also select subclustering results to be presented alongside the clusters. Click on "Select subcluster" (only shown if subclustering results are present in the dataset); the remaining steps are the same as for selecting clusters.
Scroll down and you will see the clusters listed one after another, each with its top few entries. The ordering is based on the selected sort and metric, as explained on the Search, sort and filters page.
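As a rough illustration of ordering clusters by a metric, the sketch below computes one metric per cluster and sorts the clusters by it. It assumes the metric is the average of a numeric field; the function `cluster_metric` and the field names `cluster` and `stars` are hypothetical and are not taken from the Explorer app.

```python
from collections import defaultdict
from statistics import mean


def cluster_metric(rows, cluster_field, value_field, reducer=mean):
    """Group values by cluster and reduce each group to a single metric value."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(value_field)
        if value is not None:  # ignore entries without a value
            groups[row[cluster_field]].append(value)
    return {cluster: reducer(values) for cluster, values in groups.items()}


# Hypothetical sample data.
rows = [
    {"cluster": "cluster-0", "stars": 10},
    {"cluster": "cluster-0", "stars": 30},
    {"cluster": "cluster-1", "stars": 100},
]

avg_stars = cluster_metric(rows, "cluster", "stars")
# Sort cluster ids by the metric, highest first.
ranked = sorted(avg_stars, key=avg_stars.get, reverse=True)
print(avg_stars, ranked)
```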
The following image illustrates one cluster card with a brief explanation of the available parameters. The top bar in each cluster card presents the defined metrics, calculated from the values within that cluster. The bottom part shows a few of the top entries in the cluster and their values for the selected fields.
Note: Items shown on each cluster card are selected by AI algorithms and are the best representatives of each cluster; think of them as the central points of a group.
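One common way to pick "central" items is to take the entries whose vectors lie closest to the cluster centroid. The sketch below shows that idea only; it is an assumption about what "best representatives" could mean, not a description of the algorithm the app uses, and the names `centroid` and `representatives` are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def representatives(items, k=3):
    """Return the ids of the k items whose vectors are closest to the centroid.

    `items` is a list of (item_id, vector) pairs.
    """
    center = centroid([vector for _, vector in items])
    ranked = sorted(items, key=lambda item: math.dist(item[1], center))
    return [item_id for item_id, _ in ranked[:k]]


# Hypothetical two-dimensional vectors for four entries in one cluster.
cluster = [
    ("a", [0.0, 0.0]),
    ("b", [2.0, 2.0]),
    ("c", [2.0, 3.0]),
    ("d", [8.0, 7.0]),
]

print(representatives(cluster, k=2))
```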
What if we want to group the data within each cluster based on specific fields? For instance, in a corporate dataset, we might want to see the distribution of profit per department, or the distribution of complaints or wait time per month.
In our sample dataset, we are going to set up two types of aggregate visualization. 1) A bar chart on the existing programming language field, which can help identify the most common languages in each cluster/category (e.g. which languages are used for large data analysis, which for machine learning). 2) A time series on the star rating, which can indicate the periods of success, and possibly failure, for each category.
To set up aggregation within clusters, click on "cluster aggregations", which is placed on the right-hand side, after the overall Metrics and right before the first cluster card.
When you click on it for the first time, a window similar to the one shown in the following image will open. Click on "Add visualization".
Set up the parameters as guided in the following image.
- Name your aggregation
- Select the desired field
- Select the desired number of items to show (1 to 5) in the small view
- Select the visualization type
- Click on apply changes
Our two aggregations and their corresponding views, which will be located on the right side of each cluster card, are presented below.
Bar chart view sample
We can see that in the first aggregation the top five programming languages are Scala, Python, Jupyter Notebook, Java and HTML. This is useful in many scenarios. For instance, if the cluster is about data analysis and a company is hiring in that field, a candidate with a good understanding of these languages is likely to be a better fit.
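Conceptually, the bar chart aggregation counts how often each value of a field occurs within a cluster and keeps the most frequent ones. The following is a minimal sketch of that counting, assuming each entry carries a cluster label; the function `top_values_per_cluster` and the field names `cluster` and `language` are hypothetical.

```python
from collections import Counter, defaultdict


def top_values_per_cluster(rows, cluster_field, value_field, top_n=5):
    """Count the values of `value_field` per cluster and keep the top_n most common."""
    counts = defaultdict(Counter)
    for row in rows:
        value = row.get(value_field)
        if value is not None:  # null values are not counted
            counts[row[cluster_field]][value] += 1
    return {
        cluster: counter.most_common(top_n)
        for cluster, counter in counts.items()
    }


# Hypothetical sample data.
rows = [
    {"cluster": "data-analysis", "language": "Python"},
    {"cluster": "data-analysis", "language": "Scala"},
    {"cluster": "data-analysis", "language": "Python"},
    {"cluster": "web", "language": "HTML"},
    {"cluster": "web", "language": None},
]

top = top_values_per_cluster(rows, "cluster", "language", top_n=2)
print(top)
```

The `top_n` argument plays the role of the "number of items to show" setting (1 to 5) in the small view.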
Time series view sample
In the second aggregation, we can see the setup for a time series showing the average number of stars in each cluster. The chart illustrates a rapid rise between Nov 2015 and Nov 2016, which indicates success for that category. However, there is a drastic fall from Nov 2016 onwards.
Once cluster aggregations are set up, they are shown for every cluster, on the right side of each cluster card.
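A time series aggregation of this kind can be thought of as bucketing entries by month and averaging a numeric field in each bucket. The sketch below shows this for the entries of a single cluster; the function `monthly_average`, the field names `created` and `stars`, and the sample values are hypothetical and do not come from the dataset shown in the images.

```python
from collections import defaultdict
from datetime import date


def monthly_average(rows, date_field, value_field):
    """Average `value_field` per calendar month, returned in chronological order."""
    buckets = defaultdict(list)
    for row in rows:
        day = row[date_field]
        buckets[(day.year, day.month)].append(row[value_field])
    return {
        f"{year:04d}-{month:02d}": sum(values) / len(values)
        for (year, month), values in sorted(buckets.items())
    }


# Hypothetical entries belonging to one cluster.
rows = [
    {"created": date(2016, 11, 5), "stars": 200},
    {"created": date(2015, 11, 3), "stars": 10},
    {"created": date(2015, 11, 20), "stars": 30},
]

series = monthly_average(rows, "created", "stars")
print(series)
```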
We answer these two questions together. If you click on the "view full document" button (marked with B in the images above), a new window will open, presenting the full list of fields and their values on the left-hand side. If your dataset has been vectorized, a list of the most similar items in the dataset is shown on the right-hand side; this is calculated via AI and vector analysis.
Learn about cluster comparison on the next page.
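To give an intuition for how "most similar items" can be derived from vectors, here is a minimal sketch that ranks entries by cosine similarity to a chosen entry. Cosine similarity is an assumption made for illustration; the page does not state which similarity measure the app uses, and the names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, vectors, k=3):
    """Return the k (id, score) pairs most similar to the entry `query_id`."""
    query = vectors[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in vectors.items()
        if other_id != query_id  # do not compare the entry with itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical two-dimensional vectors for four entries.
vectors = {
    "doc1": [1.0, 0.0],
    "doc2": [0.9, 0.1],
    "doc3": [0.0, 1.0],
    "doc4": [-1.0, 0.0],
}

neighbours = most_similar("doc1", vectors, k=2)
print(neighbours)
```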