Yangjinluan's Stars
hkust-nlp/simpleRL-reason
Simple RL training for reasoning
NovaSky-AI/SkyThought
Sky-T1: Train your own O1 preview model within $450
RUCAIBox/Slow_Thinking_with_LLMs
A series of technical reports on Slow Thinking with LLMs
Zhen-Tan-dmml/LLM4Annotation
bruno686/Awesome-RL-based-LLM-Reasoning
Awesome RL-based LLM Reasoning
pengr/LLM-Synthetic-Data
A fine-grained reading list on LLM synthetic data, updated in real time. 🔥
RLHFlow/Self-rewarding-reasoning-LLM
Recipes for training self-rewarding reasoning LLMs.
horseee/CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
NineAbyss/S2R
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
thu-ml/STAIR
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
pgasawa/BARE
Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation
tanganke/peta
Code for paper "Parameter Efficient Multi-task Model Fusion with Partial Linearization"
uservan/ThinkPO
AI45Lab/DEAN
tanganke/opcm
tanganke/pareto_set_learning
Code for paper "Towards Efficient Pareto Set Approximation via Weight-Ensembling Mixture of Experts"
PKU-Alignment/llms-resist-alignment
Repo for paper "Language Models Resist Alignment"
tanganke/point_cloud_viewer
Simple OpenGL program to visualize point clouds.
zhaoy777/AFICE
Aligning Large Language Models for Faithful Integrity Against Opposing Argument
tanganke/introduction-to-factorio
An introductory book on Factorio.
tanganke/MathematicaCppProgramming
Examples and tutorials for calling C/C++ from Wolfram Language (Mathematica)
tanganke/pytorch_classification
tanganke/pyutils2
Personal Python toolkit: https://anke-pyutils.readthedocs.io/en/latest/
tanganke/Awesome-Model-Merging
:couple: A curated list of Model Merging methods.
tanganke/Awesome-Model-Merging-Methods-Theories-Applications
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666.
tanganke/oh-my-bash
A delightful community-driven framework for managing your bash configuration, and an auto-update tool that makes it easy to keep up with the latest updates from the community.
tanganke/pytorch_optimizer
Collections of optimizers, LR schedulers, and loss functions in PyTorch
tanganke/wildcat-beamer-template
Modified LaTeX template
Yangjinluan/DAM
Code for "Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace" (ICLR 2025)
Yangjinluan/HEI
Code for "Leveraging Invariant Principle for Heterophilic Graph Structure Distribution Shifts" (WWW 2025)