mihara-bot's Stars
facebookresearch/faiss
A library for efficient similarity search and clustering of dense vectors.
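The core operation faiss accelerates is nearest-neighbor search over dense vectors. As a point of reference, brute-force k-nearest-neighbor search under L2 distance can be sketched in plain NumPy (illustrative only — this is not faiss's API, which provides indexes like flat, IVF, and HNSW with far better scaling):

```python
import numpy as np

def l2_search(database: np.ndarray, queries: np.ndarray, k: int):
    """Brute-force k-nearest-neighbor search under squared L2 distance.

    database: (n, d) array of stored vectors
    queries:  (m, d) array of query vectors
    Returns (distances, indices), each of shape (m, k).
    """
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * queries @ database.T
        + (database ** 2).sum(axis=1)
    )
    idx = np.argsort(d2, axis=1)[:, :k]  # indices of the k closest database vectors
    return np.take_along_axis(d2, idx, axis=1), idx

rng = np.random.default_rng(0)
xb = rng.standard_normal((1000, 64)).astype("float32")  # database
xq = xb[:5] + 1e-6                                      # queries near known vectors
dist, ind = l2_search(xb, xq, k=3)
```

Libraries like faiss exist because this O(n·m·d) scan becomes the bottleneck at scale; they trade a little recall for large speedups via quantization and graph- or cluster-based indexes.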
open-mmlab/mmdetection
OpenMMLab Detection Toolbox and Benchmark
meta-llama/llama3
The official Meta Llama 3 GitHub site
THUDM/GLM-4
GLM-4 series: open multilingual, multimodal chat LMs
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
open-compass/opencompass
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, Llama 2, Qwen, GLM, Claude, etc.) over 100+ datasets.
attardi/wikiextractor
A tool for extracting plain text from Wikipedia dumps
huggingface/datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
huggingface/nanotron
Minimalistic large language model 3D-parallelism training
huggingface/lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
allenai/dolma
Data and tools for generating and inspecting OLMo pre-training data.
conversationai/perspectiveapi
Perspective is an API that uses machine learning models to score the perceived impact a comment might have on a conversation. See https://developers.perspectiveapi.com for more information.
CarperAI/OpenELM
Evolution Through Large Models
Shark-NLP/OpenICL
OpenICL is an open-source framework to facilitate research, development, and prototyping of in-context learning.
hkust-nlp/deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR 2024]
subhadarship/kmeans_pytorch
kmeans using PyTorch
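The algorithm this repo ports to GPU tensors is standard Lloyd's k-means: alternate assigning points to their nearest center with recomputing each center as the mean of its cluster. A minimal NumPy sketch of that loop (illustrative, not the kmeans_pytorch API):

```python
import numpy as np

def kmeans(x: np.ndarray, k: int, iters: int = 50, seed: int = 0):
    """Plain Lloyd's algorithm on a (n, d) array; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()  # init from data
    for _ in range(iters):
        # assignment step: nearest center for every point
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # update step: move each center to the mean of its assigned points
        for j in range(k):
            pts = x[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels, centers

# two well-separated blobs: k-means should recover them exactly
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
labels, centers = kmeans(x, k=2)
```

The PyTorch version follows the same structure but runs the distance and reduction steps as batched tensor ops on the GPU.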
princeton-nlp/LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
NUS-HPC-AI-Lab/InfoBatch
Lossless Training Speed Up by Unbiased Dynamic Data Pruning
sangmichaelxie/doremi
PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
eth-sri/language-model-arithmetic
Controlled Text Generation via Language Model Arithmetic
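The idea behind language-model arithmetic is to compose the next-token distributions of several models as a weighted combination in log space, so a negative weight steers generation away from what one model prefers. A toy NumPy sketch with a made-up three-word vocabulary (the distributions and weights here are invented for illustration, not taken from the paper's experiments):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def combine(logprob_sets, weights):
    """Weighted sum of per-model log-probabilities, renormalized over the vocab.

    logprob_sets: list of (vocab,) arrays of log p_i(token | context)
    weights: per-model coefficients; a negative weight steers *away* from that model
    """
    combined = sum(w * lp for w, lp in zip(weights, logprob_sets))
    return softmax(combined)

vocab = ["good", "bad", "okay"]
lp_base  = np.log(np.array([0.5, 0.3, 0.2]))   # base model's next-token distribution
lp_attr  = np.log(np.array([0.1, 0.8, 0.1]))   # model scoring an unwanted attribute
# subtracting half of the attribute model's log-probs suppresses "bad"
p = combine([lp_base, lp_attr], weights=[1.0, -0.5])
```

In the actual method this combination is applied at every decoding step over the full vocabulary, which is what makes the composition controllable without retraining any model.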
princeton-nlp/QuRating
[ICML 2024] Selecting High-Quality Data for Training Language Models
SALT-NLP/demonstrated-feedback
ahalterman/mordecai3
Full text geoparsing/toponym resolution with event geolocation
locuslab/scaling_laws_data_filtering
skzhang1/IDEAL
IDEAL: Influence-Driven Selective Annotations Empower In-Context Learners in Large Language Models
adymaharana/d2pruning
cohere-ai/human-feedback-paper
Code and data from the paper 'Human Feedback is not Gold Standard'
lucy3/whos_filtered
daeveraert/gradient-information-optimization
Implementation of Gradient Information Optimization (GIO) for effective and scalable training data selection
xlhex/acl2024_xicl