markjin1990
Software Engineer at TikTok/ByteDance. Ph.D. from University of Michigan advised by Prof. Mike Cafarella and H. V. Jagadish
TikTok Inc., San Jose, CA
markjin1990's Stars
GreptimeTeam/greptimedb
An open-source, cloud-native, unified time series database for metrics, logs and events with SQL/PromQL supported. Available on GreptimeCloud.
DefTruth/Awesome-LLM-Inference
A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
lancedb/lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and Pyarrow, with more integrations coming.
mitdbg/palimpzest
A Declarative System for Optimizing AI Workloads
facebookincubator/nimble
New file format for storage of large columnar datasets.
apache/datafusion-comet
Apache DataFusion Comet Spark Accelerator
TimelyDataflow/timely-dataflow
A modular implementation of timely dataflow in Rust
jorgecarleitao/arrow2
Transmute-free Rust library to work with the Arrow format
apache/datafusion
Apache DataFusion SQL Query Engine
ByConity/ByConity
ByConity is an open source cloud data warehouse
duckdb/duckdb
DuckDB is an analytical in-process SQL database management system
sfu-dis/corobase
Coroutine-Oriented Main-Memory Database Engine (VLDB 2021)
StarRocks/starrocks
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
hydro-project/hydroflow
Hydro's low-level dataflow runtime
heavyai/heavydb
HeavyDB (formerly OmniSciDB)
varadaio/presto-workload-analyzer
The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them
intel/BDTK
A modular acceleration toolkit for big data analytic engines
datafuselabs/databend
Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
yoshinorim/quickstack
A tool to take call stack traces with minimal overheads
sfu-db/connector-x
Fastest library to load data from DB to DataFrames in Rust and Python
tum-db/user-defined-operators
Implementation and artifacts for "User-Defined Operators: Efficiently Integrating Custom Algorithms into Modern Databases"
apache/hawq
Apache HAWQ
pola-rs/polars
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
apache/incubator-gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
vesoft-inc/nebula
A distributed, fast open-source graph database featuring horizontal scalability and high availability
alshedivat/al-folio
A beautiful, simple, clean, and responsive Jekyll theme for academics
brianfrankcooper/YCSB
Yahoo! Cloud Serving Benchmark
brendangregg/FlameGraph
Stack trace visualizer
substrait-io/substrait
A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
apache/gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.