Pinned Repositories
correlation-approximation
Spark implementation of the Google Correlate algorithm to quickly find highly correlated vectors in huge datasets
Datawake
Browser add-on and web server to support collection and analysis of web browsing data.
distributed-graph-analytics
Distributed Graph Analytics (DGA) is a compendium of graph analytics written for Bulk-Synchronous-Parallel (BSP) processing frameworks such as Giraph and GraphX. The analytics included are High Betweenness Set Extraction, Weakly Connected Components, PageRank, Leaf Compression, and Louvain Modularity.
distributed-louvain-modularity
Community Detection and Compression Analytic for Big Graph Data
graphene
mitie-trainer
Model Training tool for MITIE
newman
Quickly analyze and explore email with advanced analytics and visualization.
pst-extraction
PST extraction and analytic pipeline
spark-distributed-louvain-modularity
Spark/GraphX implementation of the distributed Louvain modularity algorithm
zephyr
Zephyr is a platform-agnostic big data ETL API with bindings for Hadoop MapReduce, Storm, and other big data frameworks.
Sotera Defense (now Jacobs) Repositories
Sotera/mitie-trainer
Model Training tool for MITIE
Sotera/distributed-louvain-modularity
Community Detection and Compression Analytic for Big Graph Data
Sotera/Datawake-Legacy
This project is superseded by the current Datawake project but is maintained here for existing users. Browser extension and backend services aimed at enhancing Internet search with domain-specific knowledge, collaboration, and analysis.
Sotera/high-betweenness-set-extraction
Approximate Betweenness Centrality computation for big graph data.
Sotera/page-rank
Sotera/xdata-vm
Vagrant-Ubuntu VM serving as a platform for XDATA performer software integration
Sotera/leaf-compression
Sotera/xdata-nba
Tools to mine NBA data
Sotera/graphene-enron
Sotera/graphene-walker
Sotera/hive-common-udf
A collection of common Apache Hive UDFs
Sotera/Bitcoin_Updater
Stores the Bitcoin blockchain and historical Bitcoin market data in a MySQL database and keeps them up to date.
Sotera/darpa_open_catalog
Meta information for the DARPA open catalog project.
Sotera/twitter-cacher
Twitter Scraper
Sotera/xdata_meta
Meta information about the XDATA project
Sotera/zephyr-contrib
Useful classes for functions outside the scope of Zephyr's ETL, but still used in many scenarios (generally with extensive dependencies that probably shouldn't be in the core API).
Sotera/zephyr-sample-project
A sample project (or, rather, sample projects) showing various ways of using Zephyr; generally a good starting point for your own Zephyr implementations.
Sotera/Dec2014Demos
Demonstration material for December 2014
Sotera/thunderdome-java
Java Client for xlang/thunderdome