IMPORTANT NOTICE: This repository has been archived. All future updates and releases will be made available in the Mu-Sigma/HVT repository. Similarly, the CRAN package muHVT has been discontinued, and all future releases will be published as the HVT package.
Zubin Dowlaty, Shubhra Prakash, Sangeet Moy Das, Shantanu Vaidya, Praditi Shah, Srinivasan Sudarsanam, Somya Shambhawi
The muHVT package is a collection of R functions for building topology-preserving maps for rich multivariate data analysis; see Figure 1 for an example of a 2D torus map generated with the package. It is aimed at big data, typically datasets with a large number of rows. The functions are organized around the following typical workflow:
- Data Compression: Vector quantization (VQ) and hierarchical vector quantization (HVQ), using means or medians. This step compresses the rows (a long data frame) according to a compression objective.
- Data Projection: Projection of the compressed cells to 1D, 2D, or 3D with Sammon's non-linear mapping algorithm. This step computes the topology-preserving map coordinates (also called an embedding) in the desired output dimension.
- Tessellation: Creation of the cells required for visualization using the Voronoi tessellation method. The package includes heatmap plots for hierarchical Voronoi tessellations (HVT). This step enables data insights, visualization, and interaction with the topology-preserving map, which is useful for semi-supervised tasks.
- Prediction: Scoring of new data sets and recording of their cell assignments using the map objects from the steps above, in a sequence of maps if required.
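The compression and prediction steps can be illustrated with a minimal, language-neutral sketch. The Python below is not the muHVT implementation and does not call any muHVT function: it is a plain k-means style quantizer with a simplified deterministic start, and the names `compress`, `score`, and `nearest_cell` are hypothetical. It only shows the idea that many rows are replaced by a few cell centroids (computed with means or medians) and that new rows are scored by assigning them to the nearest centroid.

```python
import statistics

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_cell(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def compress(rows, n_cells, center=statistics.mean, n_iter=20):
    """Toy vector quantization: returns one centroid per cell."""
    # Assumption: evenly spaced rows serve as the initial centroids
    # (a deterministic simplification, not what muHVT does).
    step = len(rows) // n_cells
    centroids = [list(rows[i * step]) for i in range(n_cells)]
    for _ in range(n_iter):
        members = [[] for _ in range(n_cells)]
        for row in rows:
            members[nearest_cell(row, centroids)].append(row)
        for i, cell_rows in enumerate(members):
            if cell_rows:  # keep the old centroid if a cell is empty
                centroids[i] = [center(col) for col in zip(*cell_rows)]
    return centroids

def score(new_rows, centroids):
    """Record the cell assignment of each new row."""
    return [nearest_cell(row, centroids) for row in new_rows]

# Six rows in two well-separated groups, compressed to two cells.
rows = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0),
        (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
centroids = compress(rows, n_cells=2)
assignments = score([(0.2, 0.2), (4.8, 4.9)], centroids)
print(assignments)  # [0, 1]
```

Passing `center=statistics.median` switches the sketch from mean-based to median-based centroids, mirroring the "means or medians" choice described in the compression step.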
The muHVT package allows creation of visually stunning tessellations, showcasing the power of topology-preserving maps. Below is an image depicting a captivating tessellation of a torus; see the vignette for more details.
Figure 1: The Voronoi tessellation for layer 1 with 900 cells, with the heat map overlaid for variable z.
07th June, 2023
In this version of the muHVT package, the following new feature has been introduced:
This package provides functionality to predict cells with layers based on a sequence of maps using predictLayerHVT().
06th December, 2022
This package provides functionality to predict based on a sequence of maps.
The creation of a predictive set of maps involves three steps:
- Compress: Compress the dataset with the HVT() function, using a percentage compression rate and a quantization threshold (Map A).
- Remove novelty cells: Manually identify the novelty cells and remove them from the dataset using the removeNovelty() function (Map B).
- Compress the dataset without novelty: Compress the novelty-free dataset again with the HVT() function, using n_cells, depth, and a quantization threshold (Map C).
Let us try to understand the steps with the help of the diagram below -
Figure 2: Flow diagram for predicting based on a sequence of maps using predictLayerHVT()
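To make the three steps more concrete, here is a small conceptual sketch in Python. It does not use HVT(), removeNovelty(), or predictLayerHVT(), and it should not be read as their implementation. The helper `build_map` is a hypothetical toy quantizer with a fixed number of cells, standing in for compression driven by a compression rate and quantization threshold. The routing rule in `predict_layer` (score on Map A first, then use Map B for rows that land in a novelty cell and Map C otherwise) is an assumption made for illustration.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def build_map(rows, n_cells, n_iter=10):
    """Hypothetical toy quantizer (plain k-means), NOT the HVT() function."""
    step = len(rows) // n_cells
    centroids = [list(rows[i * step]) for i in range(n_cells)]
    for _ in range(n_iter):
        members = [[] for _ in range(n_cells)]
        for row in rows:
            members[nearest(row, centroids)].append(row)
        for i, cell in enumerate(members):
            if cell:
                centroids[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return centroids

# Toy data: two dense groups plus a small group of far-away outliers.
rows = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.2),
        (4.0, 4.0), (4.2, 4.1), (4.1, 4.2),
        (20.0, 20.0), (21.0, 21.0), (20.0, 21.0)]

# Step 1 (Map A): compress the full dataset.
map_a = build_map(rows, n_cells=3)

# Step 2 (Map B): novelty cells are chosen by hand, here cell 2 of Map A.
novelty_cells = {2}
novelty_rows = [r for r in rows if nearest(r, map_a) in novelty_cells]
clean_rows = [r for r in rows if nearest(r, map_a) not in novelty_cells]
map_b = build_map(novelty_rows, n_cells=1)

# Step 3 (Map C): compress the dataset again without the novelty rows.
map_c = build_map(clean_rows, n_cells=2)

def predict_layer(point):
    """Assumed routing: Map A decides whether Map B or Map C scores the row."""
    if nearest(point, map_a) in novelty_cells:
        return ("B", nearest(point, map_b))
    return ("C", nearest(point, map_c))

print(predict_layer((19.5, 20.5)))  # ('B', 0)
print(predict_layer((3.9, 4.3)))    # ('C', 1)
```

In this sketch the outliers get a map of their own, so the second compression in step 3 can spend all of its cells on the bulk of the data.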
Following are the links to the vignettes for the muHVT package:
muHVT Vignette: Contains descriptions of the functions used for vector quantization and construction of hierarchical Voronoi tessellations for data analysis.
muHVT Model Diagnostics Vignette: Contains descriptions of the functions used to perform model diagnostics and validation for the muHVT model.
muHVT : Predicting Cells with Layers using predictLayerHVT: Contains descriptions of the functions used for predicting cells with layers based on a sequence of maps using predictLayerHVT().