matiasED's Stars
relari-ai/continuous-eval
Data-Driven Evaluation for LLM-Powered Applications
foundation-model-stack/fms-extras
foundation-model-stack/fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and the SDPA implementation of Flash Attention v2.
abetlen/llama-cpp-python
Python bindings for llama.cpp
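A minimal sketch of the high-level API, assuming a local GGUF model file (the path below is a placeholder):

```python
# Load a local GGUF model and run a completion via the llama.cpp bindings.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf", n_ctx=2048)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```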
AI-Commandos/LLaMa2lang
Convenience scripts to finetune (chat-)LLaMa3 and other models for any language
instructor-ai/instructor
Structured outputs for LLMs
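A minimal sketch, assuming a recent instructor version and a configured OpenAI API key; the model name is a placeholder, and the client is patched so responses are parsed into a Pydantic model:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

# instructor wraps the OpenAI client so `response_model` is validated with Pydantic.
client = instructor.from_openai(OpenAI())
user = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    response_model=UserInfo,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
print(user.name, user.age)
```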
intel/neural-speed
An innovative library for efficient LLM inference via low-bit quantization
intel-analytics/ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
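A minimal sketch of 4-bit loading on an Intel XPU, assuming ipex-llm is installed with XPU support; the model id is a placeholder:

```python
# ipex-llm mirrors the transformers AutoModel interface and quantizes on load.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
model = model.to("xpu")  # run on an Intel GPU (iGPU, Arc, Flex, Max)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is speculative decoding?", return_tensors="pt").to("xpu")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```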
superagent-ai/superagent-py
Superagent Python SDK
agiresearch/Formal-LLM
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents
Vahe1994/AQLM
Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization" (https://arxiv.org/pdf/2401.06118.pdf) and "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression" (https://arxiv.org/abs/2405.14852)
IST-DASLab/peft-rosa
A fork of the PEFT library, supporting Robust Adaptation (RoSA)
AnswerDotAI/RAGatouille
Easily use and train state-of-the-art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease of use, backed by research.
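A minimal sketch of indexing and searching with a pretrained ColBERT checkpoint, assuming the documented RAGPretrainedModel interface; the documents are placeholders:

```python
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
RAG.index(
    collection=[
        "ColBERT is a late-interaction retrieval model.",
        "RAGatouille wraps ColBERT for easy use in RAG pipelines.",
    ],
    index_name="demo_index",
)
print(RAG.search("What is ColBERT?", k=2))
```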
yuchenlin/LLM-Blender
[ACL 2023] LLM-Blender is an ensembling framework that attains consistently superior performance by leveraging the diverse strengths of multiple open-source LLMs: it cuts out weaknesses through pairwise ranking and integrates strengths through generative fusion.
Mihaiii/llm_steer
Steer LLM outputs toward a certain topic or subject and enhance response capabilities via activation engineering (adding steering vectors)
apoorvumang/prompt-lookup-decoding
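Prompt lookup decoding replaces the draft model in speculative decoding with plain string matching: if the last n-gram of the sequence also appears earlier in the input, the tokens that followed it there are proposed as draft tokens (Hugging Face transformers later exposed this via the prompt_lookup_num_tokens argument of generate()). A minimal sketch of the idea; names are illustrative, not the repo's API:

```python
def find_candidate_tokens(input_ids, max_ngram_size=3, num_pred_tokens=10):
    """Return up to num_pred_tokens draft tokens found by n-gram lookup, else []."""
    for ngram_size in range(max_ngram_size, 0, -1):
        ngram = input_ids[-ngram_size:]
        # scan earlier positions for the same n-gram (excluding the trailing one)
        for start in range(len(input_ids) - ngram_size - 1, -1, -1):
            if input_ids[start:start + ngram_size] == ngram:
                follow = input_ids[start + ngram_size:start + ngram_size + num_pred_tokens]
                if follow:
                    return follow  # these become the speculative draft to verify
    return []

# The trailing bigram [7, 8] reappears earlier, so its continuation is proposed.
print(find_candidate_tokens([5, 7, 8, 9, 10, 3, 7, 8], num_pred_tokens=2))  # -> [9, 10]
```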
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
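A minimal sketch of the high-level pipeline API; the model name is a placeholder:

```python
from transformers import pipeline

# Download a model and run text generation in two lines.
generator = pipeline("text-generation", model="gpt2")
print(generator("The key idea behind speculative decoding is", max_new_tokens=30)[0]["generated_text"])
```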
konstmish/prodigy
The Prodigy optimizer and its variants for training neural networks.
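A minimal sketch, assuming the prodigyopt package; per the paper's recommendation the learning rate is left at 1.0 so the optimizer can estimate the step size itself:

```python
import torch
from prodigyopt import Prodigy

model = torch.nn.Linear(10, 1)
# lr=1.0 lets Prodigy adapt the effective step size automatically.
optimizer = Prodigy(model.parameters(), lr=1.0, weight_decay=0.01)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```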
FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
microsoft/LLMLingua
[EMNLP'23, ACL'24] Speeds up LLM inference and enhances LLMs' perception of key information by compressing the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
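A minimal sketch, assuming the llmlingua package; argument names and the default compressor model may vary by version, and the document and question below are placeholders:

```python
from llmlingua import PromptCompressor

# Downloads a small compressor model on first use.
compressor = PromptCompressor()
result = compressor.compress_prompt(
    context=["<a long retrieved document goes here>"],
    question="What does the document say about KV-cache compression?",
    target_token=200,
)
print(result["compressed_prompt"])
```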
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups at batch sizes of up to 16-32 tokens.
neuralmagic/sparseml
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama for WhatsApp & Messenger.
fw-ai/llama-cuda-graph-example
Example of applying CUDA graphs to LLaMA-v2
facebookresearch/Pearl
A Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta.
predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
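A minimal sketch, assuming a running LoRAX server and its Python client; the endpoint and adapter id are placeholders:

```python
from lorax import Client

# Point the client at a LoRAX deployment and pick an adapter per request.
client = Client("http://127.0.0.1:8080")
resp = client.generate(
    "Summarize: LoRAX serves many LoRA adapters on one base model.",
    adapter_id="some-org/some-lora-adapter",  # placeholder adapter
    max_new_tokens=64,
)
print(resp.generated_text)
```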
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
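A minimal sketch of the memory-efficient attention op, assuming a CUDA GPU:

```python
import torch
from xformers.ops import memory_efficient_attention

B, M, H, K = 2, 128, 8, 64  # batch, sequence length, heads, head dim
q = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)

out = memory_efficient_attention(q, k, v)  # output has the same shape as q
print(out.shape)
```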
tensorchord/envd
🏕️ Reproducible development environment
pjlab-sys4nlp/llama-moe
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)