yining610's Stars
karpathy/llm.c
LLM training in simple, raw C/CUDA
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
pytorch/torchtune
PyTorch native finetuning library
seatgeek/thefuzz
Fuzzy String Matching in Python
Zjh-819/LLMDataHub
A quick guide, especially for trending instruction-finetuning datasets
pytorch/torchtitan
A native PyTorch Library for large model training
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)
noahshinn/reflexion
[NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning
openai/human-eval
Code for the paper "Evaluating Large Language Models Trained on Code"
microsoft/CodeXGLUE
CodeXGLUE: a benchmark dataset for code understanding and generation
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
karpathy/arxiv-sanity-lite
arxiv-sanity lite: tag arxiv papers of interest and get recommendations of similar papers in a nice UI, using SVMs over tf-idf feature vectors of paper abstracts.
ezelikman/quiet-star
Code for Quiet-STaR
acl-org/aclpubcheck
Tools for checking ACL paper submissions
google-deepmind/long-form-factuality
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
jwkirchenbauer/lm-watermarking
shmsw25/FActScore
A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
shunzh/Code-AI-Tree-Search
MichSchli/AVeriTeC
ryokamoi/wice
This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023.
JHU-CLSP/RATIONALYST
Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044
liujch1998/rainier
JHU-CLSP/NeoCoder
Official implementation of our paper "Benchmarking Language Model Creativity: A Case Study on Code Generation"
zipJiang/RORA
RORA: Robust Free-Text Rationale Evaluation
zipJiang/Core
Core: Robust Factual Precision Scoring with Informative Sub-Claim Identification
arash-shahmansoori/decompose_net
Decompose and Conquer: Introducing the First Open Source Large Language Model with Multi-Modal Task Decomposition Capabilities
yining610/NLP-Reading-List
Yining's reading list
JHU-CLSP/slack_lm
Connect our internal LLM to Slack
zipJiang/adversarial-factuality
Factuality evaluation that is robust to trickery