metterian's Stars
prometheus-eval/prometheus
[ICLR 2024 & NeurIPS 2023 WS] An Evaluator LM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specifically designed for fine-grained evaluation on a customized score rubric, Prometheus is a good alternative to human evaluation and GPT-4 evaluation.
microsoft/LMOps
General technology for enabling AI capabilities w/ LLMs and MLLMs
facebookresearch/cc_net
Tools to download and cleanup Common Crawl data
togethercomputer/RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
jwkangmarco/LLM-narratives
AfricanLlama/ALMA
This repo aims to extend ALMA with support for African languages.
Llama2D/llama-recipes
Examples and recipes for Llama 2 model
konstantinjdobler/focus
[EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"
state-spaces/mamba
Mamba SSM architecture
databricks/megablocks
john-hewitt/backpacks-flash-attn
The original Backpack Language Model implementation, a fork of FlashAttention
clap-lab/cogtok
Analyzing Cognitive Plausibility of Subword Tokenization
aws-samples/aws-ml-jp
A collection of notebooks and teaching materials for learning how to build, train, and deploy machine learning models with SageMaker
utilForever/awesome-cafe
☕ A list of cafes in South Korea that are good for mogakko (meeting up to code, each on their own work)
EleutherAI/dps
Data processing system for polyglot
kakaobrain/kortok
The code and models for "An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks" (AACL-IJCNLP 2020)
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
lovit/huggingface_konlpy
Training Transformers of Huggingface with KoNLPy
princeton-nlp/MultilingualAnalysis
Repository for the paper titled: "When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer"
princeton-nlp/align-mlm
MrBananaHuman/Ko-En_dictionary
MrBananaHuman/PangyoCorpora
google-research/deduplicate-text-datasets
cwnu-airlab/NLTKo
ray-project/llm-numbers
Numbers every LLM developer should know
WeareSoft/tech-interview
📢🙍 Tech interview
trailerAI/KoTAN
KoTAN: Korean Translation and Augmentation with fine-tuned NLLB
dremdeveloper/codingtest_python
Becoming a Coding Test Passer (Python)
declare-lab/flan-alpaca
This repository contains code for extending the Stanford Alpaca synthetic instruction tuning to existing instruction-tuned models such as Flan-T5.
boost-devs/ai-tech-interview
👩‍💻👨‍💻 AI engineer technical interview study group (⭐️ 1k+)