Quantico

Shiny App for Echarts Viz, ML, Forecasting, Data Wrangling, Feature Engineering, Inference, and Code Generation

License: AGPL-3.0 · Version: 1.0.0

Videos

  • Plotting: PlottingVideo.mp4
  • EDA: EDA.mp4
  • App Themes: AppThemes.mp4

Installation

Note: if you're only looking to update Quantico, you only need to reinstall the Quantico package in Step 4 below.

If you are setting up R for the first time, run Steps 1-3.

Step 1: Install the "R-release" version of Rtools and place it on your C:\ drive: https://cran.r-project.org/bin/windows/Rtools/

Step 2: Install R: https://cran.r-project.org/bin/windows/base/

Step 3: Install RStudio Desktop: https://posit.co/download/rstudio-desktop/

Step 4: Install package dependencies:

options(install.packages.compile.from.source = "always")

# CRAN Packages
install.packages("devtools")
install.packages("data.table")
install.packages("collapse")
install.packages("bit64")
install.packages("doParallel")
install.packages("foreach")
install.packages("lubridate")
install.packages("timeDate")
install.packages("combinat")
install.packages("DBI")
install.packages("e1071")
install.packages("fBasics")
install.packages("itertools")
install.packages("MLmetrics")
install.packages("nortest")
install.packages("pROC")
install.packages("RColorBrewer")
install.packages("RPostgres")
install.packages("Rfast")
install.packages("stringr")
install.packages("xgboost")
install.packages("lightgbm")
install.packages("regmedint")
install.packages("RCurl")
install.packages("jsonlite")
install.packages("h2o")
install.packages("AzureStor")
install.packages("gitlink")
install.packages("arrow")
install.packages("reactable")
install.packages("DT")
install.packages("shiny")
install.packages("shinydashboard")
install.packages("shinyWidgets")
install.packages("shiny.fluent")
install.packages("shinyjs")
install.packages("shinyjqui")
install.packages("shinyAce")
install.packages("shinybusy")
install.packages("gyro")
install.packages("arrangements")
install.packages("echarts4r")
install.packages("tidytext")
install.packages("tibble")
install.packages("stopwords")
install.packages("SentimentAnalysis")
install.packages("quanteda")
install.packages("quanteda.textstats")
install.packages("datamods")
install.packages("phosphoricons")
install.packages("correlation")

# GitHub Packages
devtools::install_url('https://github.com/catboost/catboost/releases/download/v1.2/catboost-R-Windows-1.2.tgz', INSTALL_opts = c("--no-multiarch", "--no-test-load"))
devtools::install_github("AdrianAntico/prettydoc", upgrade = FALSE, dependencies = FALSE, force = TRUE)
devtools::install_github("AdrianAntico/AutoNLP", upgrade = FALSE, dependencies = FALSE, force = TRUE)
devtools::install_github("AdrianAntico/AutoPlots", upgrade = FALSE, dependencies = FALSE, force = TRUE)
devtools::install_github("AdrianAntico/Rodeo", upgrade = FALSE, dependencies = FALSE, force = TRUE)
devtools::install_github("AdrianAntico/AutoQuant", upgrade = FALSE, dependencies = FALSE, force = TRUE)
devtools::install_github("AdrianAntico/esquisse", upgrade = FALSE, dependencies = FALSE, force = TRUE)
devtools::install_github("AdrianAntico/Quantico", upgrade = FALSE, dependencies = FALSE, force = TRUE)


Background

Quantico is a Shiny app for data science, analytics, and business intelligence. The app is deliberately non-reactive in the places where big data could otherwise cause a poor user experience. All data operations utilize data.table for fast processing and low memory utilization. Visualizations are based on the echarts4r library, and the machine learning algorithms currently include CatBoost, XGBoost, LightGBM, and some of the H2O models. Time series models are based on the forecast package, while panel forecast models are ML-backed and can utilize CatBoost, XGBoost, or LightGBM. Data can currently be accessed via PostGRE or local files, and session saving and restoration is available. There are 15 different colored app themes along with a selection of background images if a user wants to zone out for a bit.


Goals

The fundamental goal of Quantico is to make life easier. While there are several GUIs available in the R ecosystem, I haven't found one that really serves my needs. I want to be able to explore data quickly and produce results that can be shared across an organization, as an example. Some of these tasks can take anywhere from an hour to a full day in a typical coding environment (more or less, depending on one's skills) while they can be produced within minutes with Quantico. Another aspect is handling big data. The data.table package is utilized and can process big data quickly while keeping your memory footprint small, thus enabling larger datasets to be managed within the app for a given device. Lastly, I would like to be able to transition from an in-app experience to a coding environment with ease, which is handled nicely by the code generation part of the app. If I need to take something to the next level that the app doesn't support, I can grab the code and pick up where I left off in my favorite IDE.


User Experience

The primary goals of the app design are to make it easy and fast to use, and to create a look and feel that is fun. The sidebar is predominantly intended for setting up inputs and running various tasks (aside from the settings options) while the main panel is for displaying output. With this design, I am able to maximize the space available for viewing output.

Note: For the best viewing experience I recommend using Chrome and having the zoom level set to 75%


App Capabilities

Tasks

  • Data Management
  • Session Saving & Restoration
  • Code Generation
  • Visualization
  • Data Viewer
  • Data Wrangling
  • Feature Engineering
  • Unsupervised Learning
  • Machine Learning
  • Statistical Inference
  • Forecasting

In-App Output:

  1. Multi-Plot Visualization
  2. Multi-Data Viewer
  3. Exploratory Data Analysis
  4. Statistical Inference
  5. Machine Learning
  6. Forecasting

Export Output:

  1. Multi-Plot Visualization
  2. Exploratory Data Analysis
  3. Machine Learning
  4. Forecasting

Quickstart

In your RStudio session, run the function Quantico::runQuantico() to kick off a Quantico session

Easy start

# Optionally, you can change up the WorkingDirectory argument for your desired file path location
# Note: For the best user experience I recommend using Chrome and having the zoom level set to 75%
Quantico::runQuantico(WorkingDirectory = getwd())

If you have a PostGRE installation you can add in the PostGRE parameters (or just pass them in while in session)

# Optionally, you can change up the WorkingDirectory argument for your desired file path location (don't forget to use "/" instead of "\" in your path)
# Note: For the best user experience I recommend using Chrome and having the zoom level set to 75%
Quantico::runQuantico(
  MaxTabs = 2L,
  WorkingDirectory = getwd(),
  PostGRE_DBNames = NULL, # list of database names you want connected
  PostGRE_Host = 'localhost',
  PostGRE_Port = 54321,
  PostGRE_User = '...',
  PostGRE_Password = '...')

Documentation

The documentation is located in the Home Tab, under the Documentation tab. There is a sidebar full of hyperlinks to speed up navigation: simply click the topic of choice (and perhaps again if there are sub-categories) and the app will navigate to that location.


Data Management

On the side bar, under Load / Save, you have a few options:

  1. Local
  2. Sessions
  3. PostGRE

Local

With the local modal you can load and save:

  1. csv data
  2. parquet data
  3. machine learning models
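
For reference, loading and saving these formats from a coding environment is straightforward; below is a minimal sketch using the data.table and arrow packages (the file paths are hypothetical):

# csv data: fread returns a data.table directly
library(data.table)
dt_csv <- data.table::fread("C:/Data/MyData.csv")

# parquet data: read with arrow, then convert to a data.table
library(arrow)
dt_parquet <- arrow::read_parquet("C:/Data/MyData.parquet")
setDT(dt_parquet)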

Sessions

You can save your session state and reload it at a later time. For example, you may have a pre-configured plot output setup that you don't want to recreate every time you run the app; this would be similar to having saved reports. Further, all output panels will re-populate with what was set up at the time of the last save.

PostGRE

  1. Query data
  2. Create tables
  3. Create databases
  4. Remove tables
  5. Remove databases
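
If you'd rather hit the same database from code, here is a minimal sketch using the DBI and RPostgres packages that the app depends on (host, port, and credentials below are placeholders):

library(DBI)
library(RPostgres)

con <- DBI::dbConnect(
  RPostgres::Postgres(),
  dbname   = "mydb",
  host     = "localhost",
  port     = 5432,
  user     = "...",
  password = "...")

dt <- DBI::dbGetQuery(con, "SELECT * FROM mytable LIMIT 100")  # query data
DBI::dbWriteTable(con, "newtable", dt, overwrite = TRUE)       # create a table
DBI::dbRemoveTable(con, "newtable")                            # remove a table
DBI::dbDisconnect(con)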

Code Generation

The Code Generation tab returns the code that was used to execute the various tasks and generate output. You can select from a variety of code themes as well. This can be really helpful to those who are looking to kickstart a project and then move to a coding environment later. Some output can simply be generated much more quickly in the app, so this should be a time saver even for the most seasoned programmers.


Visualization

Plotting Basics

Plotting is a vitally important aspect of this software. It's important that you know how to utilize the functionality as intended. One of the goals is to make plotting as easy as possible. You don't have to pre-aggregate your data for plotting purposes since those steps will be carried out for you (although you can if you prefer). Just pass in your data and utilize the inputs to tell the software what you want.

Plot Types

| Distribution | Aggregate | Time Series | Relationship | Model Evaluation |
| --- | --- | --- | --- | --- |
| Histogram | Barplot | Line | Correlogram | Residuals |
| Density | Stacked Barplot | Area | Parallel | Residuals Scatter |
| Boxplot | 3D Barplot | Step | Scatter | Partial Dependence Line |
| Word Cloud | Heatmap | River | 3D Scatter | Partial Dependence Heatmap |
| Probability Plot | Radar | Autocorrelation | Copula | Calibration Line |
| | Piechart | Partial Autocorr | 3D Copula | Calibration Boxplot |
| | Donut | | | Variable Importance |
| | Rosetype | | | Shapley Importance |
| | | | | ROC Plot |
| | | | | Confusion Matrix |
| | | | | Gains |
| | | | | Lift |

Faceting

For the plots that enable faceting, you only have to select the number of columns and rows and the app will take care of the rest. If your group variable contains more levels than the facet grid has cells and you didn't subset the group levels to match, the levels with the most records will be displayed first. Ties go to ABC order.

Aggregation Methods

Since the software will automatically aggregate your data (for some of the plot types) you can specify how you'd like your data aggregated. Below is a list of options:

  1. count: Counts of values by group. Here, you need to select any of the numeric YVars available in your data just so it doesn't create an error for a missing YVar
  2. proportion: Proportion of total by group. The same YVar note as for count applies
  3. mean
  4. meanabs (absolute values are taken first, then the measure)
  5. median
  6. medianabs (absolute values are taken first, then the measure)
  7. sum
  8. sumabs (absolute values are taken first, then the measure)
  9. sd (standard deviation)
  10. sdabs (absolute values are taken first, then the measure)
  11. skewness
  12. skewnessabs (absolute values are taken first, then the measure)
  13. kurtosis
  14. kurtosisabs (absolute values are taken first, then the measure)
  15. CoeffVar (coefficient of variation)
  16. CoeffVarabs (absolute values are taken first, then the measure)
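
To make the methods concrete, here is a rough data.table sketch of what a few of them amount to (the app's internal code may differ; column names are hypothetical):

library(data.table)
dt <- data.table(Group = c("a", "a", "b"), YVar = c(-1, 2, 3))

dt[, .N, by = Group]                        # count
dt[, .(YVar = mean(YVar)), by = Group]      # mean
dt[, .(YVar = sum(abs(YVar))), by = Group]  # sumabs: absolute values first, then the measure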

Datetime Aggregation

If you have a date X-Variable, you can choose to display your plot at a higher datetime grain. For example, if you have daily data and you are looking to build a barplot time series, you can switch the default date aggregate parameter from "as-is" to "month" to display a monthly aggregated time series.
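
Roughly speaking, the "month" aggregate floors each date to its month before aggregating. A hedged sketch of that idea with data.table and lubridate (column names are hypothetical):

library(data.table)
library(lubridate)

dt <- data.table(
  Date = seq(as.Date("2024-01-01"), by = "day", length.out = 90),
  YVar = rnorm(90))

dt[, Month := lubridate::floor_date(Date, unit = "month")]  # daily -> monthly grain
monthly <- dt[, .(YVar = sum(YVar)), by = Month]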

Variable Transformation Methods

For numeric variables you can choose to have them transformed automatically

  1. Asinh: inverse hyperbolic sine
  2. Log: natural logarithm
  3. LogPlus1: natural log of x, shifted by the absolute value of the minimum value when the minimum is negative
  4. Sqrt: square root
  5. Asin: inverse sine
  6. Logit
  7. BoxCox
  8. YeoJohnson
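
In base R terms, several of these transformations look roughly like the following (x is a hypothetical numeric column; BoxCox and YeoJohnson require estimated parameters and dedicated packages):

x <- c(0.5, 2, 10)

asinh(x)  # Asinh: inverse hyperbolic sine
log(x)    # Log: natural logarithm
sqrt(x)   # Sqrt: square root

# LogPlus1 (one common formulation): shift so the log argument stays positive
shift <- if (min(x) <= 0) abs(min(x)) + 1 else 0
log(x + shift)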

Plot Inputs

In the plotting panel you simply click on the top buttons (e.g. Plot 1, Plot 2, ...) and select a plot type from the dropdown menu. Then you click the button below to fill out the necessary parameters for your plot. Lastly, drop the newly created box in the dragula pane and move it to the bottom row in order for it to display.

When you click the button below the plot type dropdown, a modal will appear with up to five tabs for inputs and selections:

  1. Data Selection Tab
  2. Axis Variables Tab
  3. Grouping Variables Tab (in most cases but not all)
  4. Filter Variables Tab
  5. Formatting Tab
Data Selection

The Data Selection tab is where you'll choose your dataset and the number of records to display. For plots that require data aggregation, the display record count won't typically matter, but for non-aggregated plots the displayed records are randomly sampled from your data right before the plot build occurs, not before any data preparation steps.

Axis Variables

Axis variables: The Axis Variables tab is where you'll define your axis variables and any transformations you'd like applied. The modals are designed to only supply inputs that are actually used for the given plot type. For example, histogram plots only require variables to be defined across a single dimension (you can select more than one variable, however), whereas with line plots you'll need to define an X-Axis variable (a date variable) and Y-Axis variables.

Transformations: Automatic transformations can be selected and generated for numeric variables during the data preparation process while the software builds the plots.

Group Variables

The Group Variables tab is where you'll optionally define up to 3 group variables and a faceting selection (if applicable). Since multiple group variables are allowed, the plotting engine concatenates them and displays the combined levels. For each group variable you can select the levels you wish to have displayed. For faceting, you simply select the number of rows and columns desired to form the grid of your choice.

Filter Variables

The Filter Variables tab is where you can optionally define filters for your data before the plot is displayed. You can select up to 4 filter variables; for each one you'll define the logical operation you want conducted and the associated values based on that operation.

Formatting

The Formatting tab is where you can rename the plot title and axis titles. You can also select to have data values shown on the plots.

Plotting Report Export

You can save your plotting setup to an html file. Just click the Save button after you've set up your plots. While you can set up a grid of output in the app, the plots will be stacked on top of each other in the html file due to limited space. The only exception is faceted plots, which are themselves a grid within a grid.


Tables Viewer

The Tables Viewer output tab allows you to view multiple tables stacked on top of each other. You can alter the number of records displayed, the total records brought into the table, whether they are randomly sampled or not, and a few other formatting options. This can be useful for inspecting data after running some of the various tasks, when you want to view new or altered data.

Exploratory Data Analysis

The Exploratory Data Analysis Report can display a variety of data insights, by a group variable if desired, including:

  1. Data dictionary information
  2. Univariate statistics
  3. Univariate box plots
  4. Univariate bar plots
  5. Correlogram
  6. Trend line plots

EDA Collapsed Output View

EDA Expanded Output View

EDA Report Export

The EDA Report can be generated by clicking the Save button on the EDA Output Panel either before or after generating the EDA info in app.


Data Wrangling

Data Wrangling Basics

Data wrangling is a vitally important aspect of this software. It's important that you know how to utilize the functionality as intended. Below are all of the available methods, organized by category.

Data Wrangling Methods:

| Category | Methods |
| --- | --- |
| Shrink | Aggregate, Subset Rows, Subset Columns, Sampling |
| Grow | Join, Union |
| Dataset | Partition Data, Sort Data, Remove Data, Model Data Prep |
| Pivot | Melt Data, Cast Data |
| Columns | Type Casting, Time Trend, Rename Columns, Concatenate Columns |
| Misc | Meta Programming, Time Series Fill, Time Series Roll Fill |
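
As an example, the Pivot methods correspond to data.table's melt and dcast. A minimal sketch (column names are hypothetical):

library(data.table)
wide <- data.table(ID = 1:3, Sales = c(10, 20, 30), Costs = c(5, 8, 12))

# Melt Data: wide -> long
long <- data.table::melt(
  wide, id.vars = "ID",
  variable.name = "Metric", value.name = "Value")

# Cast Data: long -> wide
back <- data.table::dcast(long, ID ~ Metric, value.var = "Value")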

Feature Engineering

Feature Engineering Basics

Feature Engineering is a vitally important aspect of this software. It's important that you know how to utilize the functionality as intended. Below are all of the available methods, organized by category.

Feature Engineering Methods:

| Category | Methods |
| --- | --- |
| Numeric | Percent Rank, Standardize, Transformations, Interaction |
| Categorical | Character Encoding, Partial Dummies |
| Calendar | Calendar Variables, Holiday Variables |
| Windowing | Rolling Numeric, Differencing, Rolling Categorical |
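
As an illustration, the Windowing methods map onto data.table's shift and frollmean. A hedged sketch (column names are hypothetical):

library(data.table)
dt <- data.table(
  Date = seq(as.Date("2024-01-01"), by = "day", length.out = 10),
  YVar = cumsum(rnorm(10)))
setorder(dt, Date)  # windowing assumes time-sorted data

dt[, Lag1      := shift(YVar, 1L)]         # lag, the building block of Rolling Numeric
dt[, Diff1     := YVar - shift(YVar, 1L)]  # Differencing
dt[, RollMean3 := frollmean(YVar, 3L)]     # 3-period rolling mean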

Unsupervised Learning

Unsupervised Learning Basics

Unsupervised Learning is a vitally important aspect of this software. It's important that you know how to utilize the functionality as intended. Below are all of the available methods, organized by category.

Unsupervised Learning Methods:

| Category | Methods |
| --- | --- |
| Text | Word2Vec, Text Summary, Sentiment, Readability, Lexical Diversity |
| Other | Clustering, Anomaly Detection, Dimensionality Reduction |
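
For a flavor of the Clustering method outside the app, here is a minimal base-R sketch (the app's own implementation may differ):

# standardize numeric features, then run k-means with 3 clusters
dat <- scale(iris[, 1:4])
km  <- kmeans(dat, centers = 3L, nstart = 25L)
table(km$cluster, iris$Species)  # compare clusters to the known labels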

Inference

Inference Basics

Inference is a vitally important aspect of this software. It's important that you know how to utilize the functionality as intended. Below are all of the available methods.

Inference Methods:

  1. Normality Testing
  2. Correlation Testing
  3. One-Sample T-Test
  4. Two-Sample T-Test
  5. F-Test
  6. Chi-Square Test
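
Each method has a base-R counterpart, which gives a sense of what's being computed (the app's reports layer summary statistics and visuals on top; normality testing in particular has several available tests):

set.seed(42)
x <- rnorm(100); y <- rnorm(100)

shapiro.test(x)    # Normality Testing (one of several tests)
cor.test(x, y)     # Correlation Testing
t.test(x, mu = 0)  # One-Sample T-Test
t.test(x, y)       # Two-Sample T-Test
var.test(x, y)     # F-Test for equality of variances
chisq.test(table(sample(letters[1:2], 100, TRUE),
                 sample(letters[1:3], 100, TRUE)))  # Chi-Square Test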

Inference Reporting

The Inference Reports are dependent upon the inference method chosen. They all return summary statistics and visuals to help assess effects and assumptions.

Reports are available for each method: Normality, Correlation, One-Sample T-Test, Two-Sample T-Test, F-Test, and Chi-Square Test.


Machine Learning

ML is a vitally important aspect of this software. It's important that you know how to utilize the functionality as intended. The documentation in-app contains information on each of the ML Algo types.

Currently available algorithms include:

  1. CatBoost
  2. XGBoost
  3. LightGBM
  4. H2O-DRF
  5. H2O-GBM
  6. H2O-GLM
  7. H2O-HGLM
  8. Causal Mediation

Some of the built-in features include:

  • Automatic transformations and back-transformations if the user requests them
  • Data partitioning into train, validation, and test data sets if the user only supplies a training data set
  • Categorical variable encoding and back-transformation if the user supplies categorical variables as features
  • Computation of model metrics for evaluation
  • Data conversion to the structure appropriate for the given algorithm selected
  • Multi-arm bandit grid tuning
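
As a rough illustration of the partitioning step, here is what a random train / validation / test split looks like in plain data.table (a sketch; the app's internal logic may differ):

library(data.table)
dt <- data.table(x = rnorm(1000), y = rnorm(1000))

set.seed(42)
idx <- sample(c("train", "validate", "test"), nrow(dt),
              replace = TRUE, prob = c(0.7, 0.2, 0.1))
train    <- dt[idx == "train"]
validate <- dt[idx == "validate"]
test     <- dt[idx == "test"]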

Machine Learning Reports

The ML Evaluation Report can be generated by clicking the Save button on the ML Output Panel either before or after generating the ML info in app.


Forecasting

Forecasting is a vitally important aspect of this software. It's important that you know how to utilize the functionality as intended. The documentation in-app contains information on each of the Forecasting Algo types.

Currently available algorithms can be split into Single Series and Panel Series:

Single Series Forecasting

  1. TBATS
  2. SARIMA
  3. ETS
  4. ARFIMA
  5. NNET
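
These models come from the forecast package that the app builds on. A minimal sketch using the built-in AirPassengers series:

library(forecast)

fit_tbats <- tbats(AirPassengers)       # TBATS
fit_arima <- auto.arima(AirPassengers)  # SARIMA via auto.arima
fit_ets   <- ets(AirPassengers)         # ETS

fc <- forecast(fit_tbats, h = 12)  # 12-period-ahead forecast
plot(fc)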

Single Series Run Modes

  1. Grid Tuning
  2. Forecasting

Panel Series Forecasting

  1. CatBoost
  2. XGBoost
  3. LightGBM

Panel Forecasting Run Modes

There are various Run modes to train, backtest, and forecast:

Training Options
  1. Train Model: This is equivalent to building an ML model
  2. Retrain Existing Model: This is for retraining a model that's already been built before. Perhaps you simply want an updated model but not a new forecast at the moment
Backtesting Options
  1. Backtest: This task will train a new model (if an FC ArgsList is not supplied) and generate an N-period-ahead forecast that will be evaluated against Validation Data supplied by the user. If you don't have a Validation dataset, go to Data Wrangling and subset rows based on a time variable: the subset data will be your Training Data and your original dataset will be the Validation Data
  2. Backtest Cross Evaluation: Once you have a good model designed you can mock production by running this procedure. Here, you'll set the data refresh rate and the model update rate. Performance measures are returned in a data.table once the procedure is finished.
  3. Feature Engineering Test: This task will loop through various builds, starting from the most simple up to a moderately sophisticated model. An evaluation table is generated that you can view in the Tables tab when the procedure is complete. Evaluation metrics are based on the Backtest method. The features tested are listed below, in order; if a feature is beneficial it will remain in the models trained thereafter:

  • LogPlus1 vs None: tests whether a target variable transformation is beneficial
  • Series Difference vs None: tests whether differencing your series is useful
  • Calendar Variables vs None: tests whether Calendar Variables are useful
  • Holiday Variables vs None: tests whether Holiday Variables are useful
  • Credibility vs Target Encoding: tests whether a target encoding is better than a credibility encoding
  • Time Weights vs None: tests whether Time Weighting is useful
  • Anomaly Detection vs None: tests whether Anomaly Detection is useful
  • Time Trend Variable vs None: tests whether a Time Trend Variable is useful
  • Lag 1 vs None: tests whether utilizing lags is useful

Forecasting Options
  1. Forecast: if you have a trained model you can call it to generate a forecast for you
  2. Retrain + Forecast: if you have a model you can refresh it and have it generate a forecast for you

Forecast Reports

The FC Evaluation Report can be generated by clicking the Save button on the FC Output Panel either before or after generating the FC info in app.