Pinned Repositories
ALCE
[EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627
DensePhrases
[ACL 2021] Learning Dense Representations of Phrases at Scale; [EMNLP 2021] Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.org/abs/2012.12624
LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
LM-BFF
[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723
MeZO
[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
PURE
[NAACL 2021] A Frustratingly Easy Approach for Entity and Relation Extraction https://arxiv.org/abs/2010.12812
SimCSE
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
tree-of-thought-llm
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Princeton Natural Language Processing's Repositories
princeton-nlp/tree-of-thought-llm
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
princeton-nlp/SimCSE
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
princeton-nlp/ALCE
[EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627
princeton-nlp/LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
princeton-nlp/WebShop
[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
princeton-nlp/AutoCompressors
[EMNLP 2023] Adapting Language Models to Compress Long Contexts
princeton-nlp/ProLong
Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"
princeton-nlp/QuRating
[ICML 2024] Selecting High-Quality Data for Training Language Models
princeton-nlp/HELMET
The HELMET Benchmark
princeton-nlp/CEPE
[ACL 2024] Long-Context Language Modeling with Parallel Encodings
princeton-nlp/LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
princeton-nlp/MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
princeton-nlp/USACO
Can Language Models Solve Olympiad Programming?
princeton-nlp/CharXiv
[NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
princeton-nlp/NLProofS
[EMNLP 2022] Generating Natural Language Proofs with Verifier-Guided Search https://arxiv.org/abs/2205.12443
princeton-nlp/LitSearch
[EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search
princeton-nlp/c-sts
[EMNLP 2023] C-STS: Conditional Semantic Textual Similarity
princeton-nlp/ShortcutGrammar
[EMNLP 2022] Finding Dataset Shortcuts with Grammar Induction https://arxiv.org/abs/2210.11560
princeton-nlp/Edge-Pruning
[NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
princeton-nlp/LM-Science-Tutor
princeton-nlp/benign-data-breaks-safety
princeton-nlp/PTP
Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073
princeton-nlp/unintentional-unalignment
[ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
princeton-nlp/ELIZA-Transformer
[NAACL 2025] Representing Rule-based Chatbots with Transformers
princeton-nlp/CopyCat
princeton-nlp/Heuristic-Core
[ACL 2024] The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models - https://arxiv.org/abs/2403.03942
princeton-nlp/il-scaling-in-games
Official code repo of "Scaling Laws for Imitation Learning in Single-Agent Games"
princeton-nlp/continual-factoid-memorization
Continual Memorization of Factoids in Large Language Models
princeton-nlp/impersona