Pinned Repositories
ProX
Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"
TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
amber-train
Pre-training code for Amber 7B LLM
Awesome-DataCentric-LLM
Trending projects & awesome papers about data-centric LLM studies.
awesome-llm-powered-agent
Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...
CS385Projects
Independent Projects for SJTU CS385
tacube
[EMNLP 2022] TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data
Lemur
[ICLR 2024] Lemur: Open Foundation Models for Language Agents
symbolic-instruction-tuning
The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning".
OpenAgents
[COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild
koalazf99's Repositories
koalazf99/Awesome-DataCentric-LLM
Trending projects & awesome papers about data-centric LLM studies.
koalazf99/tacube
[EMNLP 2022] TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data
koalazf99/CS385Projects
Independent Projects for SJTU CS385
koalazf99/amber-train
Pre-training code for Amber 7B LLM
koalazf99/awesome-llm-powered-agent
Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...
koalazf99/CodeQwen1.5
CodeQwen1.5 is the code version of Qwen, the large language model series developed by Qwen team, Alibaba Cloud.
koalazf99/cs2916
koalazf99/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
koalazf99/datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
koalazf99/dbt-test
koalazf99/koalazf99.github.io
Personal Page
koalazf99/LLM-Agent-Survey
koalazf99/openai-cookbook
Examples and guides for using the OpenAI API
koalazf99/code-llm-contamination
koalazf99/dspy
DSPy: The framework for programming—not prompting—foundation models
koalazf99/ebooks
A collection of classic e-books on history, politics, psychology, philosophy, mathematics, and computer science (about 100,000 books)
koalazf99/k2-train
koalazf99/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
koalazf99/llm-swarm
Manage scalable open LLM inference endpoints in Slurm clusters
koalazf99/magpie
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
koalazf99/mink
This repository provides an original implementation of "Detecting Pretraining Data from Large Language Models" by *Weijia Shi, *Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer.
koalazf99/open-interpreter
OpenAI's Code Interpreter in your terminal, running locally
koalazf99/prismatic-vlms
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
koalazf99/sailcraft
Data Toolkit for Sailor Language Models
koalazf99/temp-open-instruct
temp-fork
koalazf99/TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.