thedatajanitor's Stars
google-deepmind/uncertain_ground_truth
Dermatology differential diagnosis (DDx) dataset, JAX implementations of Monte Carlo conformal prediction, plausibility regions and statistical annotation aggregation from our recent work on uncertain ground truth (TMLR'23 and arXiv preprint).
facebookresearch/exca
Exca - Execution and caching tool for python
tursodatabase/limbo
Limbo is a work-in-progress, in-process OLTP database management system, compatible with SQLite.
pydantic/pydantic-ai
Agent Framework / shim to use Pydantic with LLMs
DS4SD/docling
Get your documents ready for gen AI
ucbepic/docetl
A system for agentic LLM-powered data processing and ETL
agora-protocol/paper-demo
anthropics/courses
Anthropic's educational courses
i-am-bee/bee-agent-framework
Framework for building scalable agentic applications.
spiraldb/vortex
An extensible, state-of-the-art columnar file format
openai/swarm
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by the OpenAI Solution team.
NirDiamant/Prompt_Engineering
This repository offers a comprehensive collection of tutorials and implementations for Prompt Engineering techniques, ranging from fundamental concepts to advanced strategies. It serves as an essential resource for mastering the art of effectively communicating with and leveraging large language models in AI applications.
argilla-io/argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Mirascope/mirascope
LLM abstractions that aren't obstructions
keras47/claude_sonnet_3.5
Python superprompt for Claude Sonnet 3.5
AdieLaine/multi-agent-reasoning
The Multi-Agent Reasoning framework creates an interactive chatbot where AI agents collaborate via structured reasoning and Swarm Integration for optimal answers. Simulating a team that discusses, debates, and refines responses, it enables complex problem-solving and precise results. Now with Prompt Caching to reduce latency and costs.
xjdr-alt/entropix
Entropy-based sampling and parallel CoT decoding
ShengranHu/ADAS
Automated Design of Agentic Systems
GAIR-NLP/ProX
Official repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"
MinorJerry/WebVoyager
Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"
ozekik/awesome-ontology
A curated list of ontology things
microsoft/sammo
A library for prompt engineering and optimization (SAMMO = Structure-aware Multi-Objective Metaprompt Optimization)
MadcowD/ell
A language model programming library.
PatrickJS/awesome-cursorrules
📄 A curated list of awesome .cursorrules files
PrefectHQ/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
PrefectHQ/ControlFlow
🦾 Take control of your AI agents
MilesCranmer/PySR
High-Performance Symbolic Regression in Python and Julia
Ucas-HaoranWei/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
truefoundry/cognita
RAG (Retrieval-Augmented Generation) framework by TrueFoundry for building modular, open-source applications for production