tongyx361
Senior undergraduate @ DCST, Tsinghua University. Research intern @hkust-nlp (previously: @THUDM). Interested in LLM & AI for Education/Research/Software Eng.
Tsinghua University, Beijing, China
tongyx361's Stars
twitter/the-algorithm
Source code for Twitter's Recommendation Algorithm
xai-org/grok-1
Grok open release
dair-ai/ml-visuals
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
mozillazg/python-pinyin
Convert Chinese characters to pinyin (pypinyin)
xlang-ai/OpenAgents
[COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild
openai/transformer-debugger
pyutils/line_profiler
Line-by-line profiling for Python
openai/simple-evals
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
SakanaAI/evolutionary-model-merge
Official repository of Evolutionary Optimization of Model Merging Recipes
openai/following-instructions-human-feedback
lxneng/xpinyin
Translate Chinese hanzi to pinyin (拼音) with Python
ruixiangcui/AGIEval
sylinrl/TruthfulQA
TruthfulQA: Measuring How Models Imitate Human Falsehoods
noxdafox/pebble
Multi threading and processing eye-candy.
FloridSleeves/LLMDebugger
LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step
meta-math/MetaMath
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
huggingface/datablations
Scaling Data-Constrained Language Models
OpenBMB/Eurus
web-arena-x/visualwebarena
VisualWebArena is a benchmark for multimodal agents.
lm-sys/arena-hard
Arena-Hard benchmark
princeton-nlp/LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
OpenBMB/OlympiadBench
[ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems.
tongyx361/Awesome-LLM4Math
Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied by carefully written, concise descriptions to help readers get the gist as quickly as possible.
qtli/GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
tongyx361/Awesome-LLM-Research
Curation of resources for LLM research, screened by @tongyx361 to ensure high quality and accompanied by carefully written, concise descriptions to help readers get the gist as quickly as possible.
liyucheng09/LatestEval
Latest Evaluation Toolkit (LatestEval). Assessing language models with the latest, uncontaminated materials.
midas-research/mathify
An extensive mathematics dataset called MathQuest sourced from the 11th and 12th standard Mathematics NCERT textbooks.
nii-yamagishilab/mla
A Multi-Level Attention Model for Evidence-Based Fact Checking
zhaochenyang20/data_mining
Data mining