schen149's Stars
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
qdrant/qdrant
Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
google-deepmind/pysc2
StarCraft II Learning Environment
mit-han-lab/streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
allenai/OLMo
Modeling, training, eval, and inference code for OLMo
srush/Tensor-Puzzles
Solve puzzles. Improve your pytorch.
google/BIG-bench
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
castorini/pyserini
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
allenai/natural-instructions
Expanding natural instructions
jxmorris12/vec2text
Utilities for decoding deep representations (like sentence embeddings) back to text
EdinburghNLP/awesome-hallucination-detection
List of papers on hallucination detection in LLMs.
mlfoundations/task_vectors
Editing Models with Task Arithmetic
luyug/GradCache
Run Effective Large Batch Contrastive Learning Beyond GPU/TPU Memory Constraint
swj0419/detect-pretrain-code
This repository provides an original implementation of "Detecting Pretraining Data from Large Language Models" by *Weijia Shi, *Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, Luke Zettlemoyer.
GFNOrg/gfn-lm-tuning
chentong0/factoid-wiki
Dense X Retrieval: What Retrieval Granularity Should We Use?
chaitanyamalaviya/ExpertQA
[Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers
ludwigwinkler/JaxLightning
Running Jax in PyTorch Lightning
schen149/sub-sentence-encoder
The official code repo for "Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations".
danieldeutsch/repro
Repro is a library for easily running code from published papers via Docker.
ryokamoi/wice
This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023.
shadowkiller33/Contrast-Instruction
google-research-datasets/PropSegmEnt
PropSegmEnt is an annotated dataset for segmenting English text into propositions, and recognizing proposition-level entailment relations - whether a different, related document entails each proposition, contradicts it, or neither. It consists of clusters of closely related documents from the news and Wikipedia domains.
CogComp/MultiOpEd
MULTIOPED: A Corpus of Multi-Perspective News Editorials.
naimenz/inverse-scaling-eval-pipeline
Basic pipeline for running different sized GPT models and plotting the results
TRUMANCFY/MixGR
JHU-CLSP/Cost-Effective-Experiment
Scripts and docs that help us run cost-effective experiments with OpenAI APIs
CogComp/transformer-lm-demo
A simple demo of transformer language models, mostly for our internal use: http://dickens.seas.upenn.edu:4001
schen149/PropSegmEnt
PropSegmEnt is an annotated dataset for segmenting English text into propositions, and recognizing proposition-level entailment relations - whether a different, related document entails each proposition, contradicts it, or neither. It consists of clusters of closely related documents from the news and Wikipedia domains.
stevenysw/causal_gfl