Future Data Systems

We are a CS research group building data-intensive systems

Stanford, CA

Pinned Repositories

ARES
Language: Python 339 10 1538
ASAP
ASAP: Prioritizing Attention via Time Series Smoothing
Language: Jupyter Notebook 189 21 631
ColBERT
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
Language: Python 2.6k 42 251346
dawn-bench-entries
DAWNBench: An End-to-End Deep Learning Benchmark and Competition
Language: Python 256 34 1974
FAST
End-to-end earthquake detection pipeline via efficient time series similarity search
Language: Jupyter Notebook 145 30 2056
FrugalGPT
FrugalGPT: better quality and lower cost for LLM applications
Language: Python 141 12 313
index-baselines
Simple baselines for "Learned Indexes"
Language: HTML 156 17 126
macrobase
MacroBase: A Search Engine for Fast Data
Language: Java 658 55 77128
noscope
Accelerating network inference over video
Language: Python 434 45 51121
sparser
Sparser: Raw Filtering for Faster Analytics over Raw Data
Language: C 429 40 554

Future Data Systems's Repositories

stanford-futuredata/ColBERT
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
Language: Python 2.6k 42 251346
stanford-futuredata/macrobase
MacroBase: A Search Engine for Fast Data
Language: Java 658 55 77128
stanford-futuredata/ARES
Language: Python 339 10 1538
stanford-futuredata/FAST
End-to-end earthquake detection pipeline via efficient time series similarity search
Language: Jupyter Notebook 145 30 2056
stanford-futuredata/FrugalGPT
FrugalGPT: better quality and lower cost for LLM applications
Language: Python 141 12 313
stanford-futuredata/gavel
Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020
Language: Jupyter Notebook 121 11 2431
stanford-futuredata/stk
Language: Python 69 3 315
stanford-futuredata/sinkhorn-label-allocation
Sinkhorn Label Allocation is a label assignment method for semi-supervised self-training algorithms. The SLA algorithm is described in full in this ICML 2021 paper: https://arxiv.org/abs/2102.08622.
Language: Python 53 8 13
stanford-futuredata/Willump
Willump Is a Low-Latency Useful Machine learning Platform.
Language: Python 43 12 28
stanford-futuredata/Baleen
Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (NeurIPS'21)
Language: Python 40 13 55
stanford-futuredata/Uniserve
A runtime implementation of data-parallel actors.
Language: Java 37 9 26
stanford-futuredata/Megatron-LM
Ongoing research training transformer models at scale
Language: Python 31 2 08
stanford-futuredata/blazeit
It's BlazeIt because it's blazing fast
Language: C++ 30 13 610
stanford-futuredata/POP
Code for "Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP", which appeared at SOSP 2021
Language: Python 24 7 14
stanford-futuredata/omg
Language: Python 20 9 03
stanford-futuredata/loa
Public code for LOA
Language: Python 18 7 22
stanford-futuredata/tasti
Semantic Indexes for Machine Learning-based Queries over Unstructured Data (SIGMOD 2022)
Language: Python 13 8 35
stanford-futuredata/cs245-as1
Student files for CS245 Programming Assignment 1: In-memory data layout
Language: Java 12 9 050
stanford-futuredata/cs245-as2-public
Language: Scala 8 9 028
stanford-futuredata/InQuest
Accelerating Aggregation Queries on Unstructured Streams of Data
Language: Python 7 8 11
stanford-futuredata/SparseJointShift
Model Performance Estimation and Explanation When Labels and a Few Features Shift
Language: Python 7 9 01
stanford-futuredata/sketchstore
Algorithms for compressing and merging large collections of sketches
Language: Jupyter Notebook 5 9 02
stanford-futuredata/smol
Language: C++ 5 8 13
stanford-futuredata/supg
Language: Python 5 8 24
stanford-futuredata/parallel-lb-simulator
Language: Java 4 7 02
stanford-futuredata/abae
Accelerating Approximate Aggregation Queries with Expensive Predicates (VLDB 21)
Language: Python 2 8 11
stanford-futuredata/ezmode
An iterative algorithm for selecting rare events in large, unlabeled datasets
Language: Python 1 7 0
stanford-futuredata/pop-ncflow
Code for POP (SOSP 2021) and NCFlow (NSDI 2021)
Language: Jupyter Notebook 1 2 04
stanford-futuredata/redisgeo-bench
Simple benchmark for Redis geosets for top-k queries.
Language: Rust 9 0
stanford-futuredata/teavar
Language: Julia 1 0