metterian

NLP Engineer 42dot | Korea University

42dot, Seoul

metterian's Stars

prometheus-eval/prometheus
[ICLR 2024 & NeurIPS 2023 WS] An evaluator LM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specifically designed for fine-grained evaluation on a customized score rubric, Prometheus is a good alternative to human evaluation and GPT-4 evaluation.
Language:Python29317
microsoft/LMOps
General technology for enabling AI capabilities w/ LLMs and MLLMs
Language: Python, 3.8k stars, 284 forks
facebookresearch/cc_net
Tools to download and cleanup Common Crawl data
Language: Python, 977 stars, 143 forks
togethercomputer/RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Language: Python, 4.6k stars, 350 forks
jwkangmarco/LLM-narratives
23
AfricanLlama/ALMA
This repo intends to extend ALMA with support for African languages.
Language:Python2
Llama2D/llama-recipes
Examples and recipes for Llama 2 model
Language:Python1
konstantinjdobler/focus
[EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"
Language:Python313
state-spaces/mamba
Mamba SSM architecture
Language: Python, 13.6k stars, 1.2k forks
databricks/megablocks
Language: Python, 1.2k stars, 176 forks
john-hewitt/backpacks-flash-attn
The original Backpack Language Model implementation, a fork of FlashAttention
Language:Python666
clap-lab/cogtok
Analyzing Cognitive Plausibility of Subword Tokenization
Language:Python71
aws-samples/aws-ml-jp
A collection of notebooks and learning materials for learning how to build, train, and deploy machine learning models with SageMaker
Language:Jupyter Notebook15741
utilForever/awesome-cafe
☕ A list of cafes in Korea that are good for mogakko (meeting up to code together)
1.3k stars, 83 forks
EleutherAI/dps
Data processing system for polyglot
Language:Python9126
kakaobrain/kortok
The code and models for "An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks" (AACL-IJCNLP 2020)
Language:Python11610
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
Language: Python, 10.9k stars, 2.4k forks
lovit/huggingface_konlpy
Training Transformers of Huggingface with KoNLPy
Language:Jupyter Notebook689
princeton-nlp/MultilingualAnalysis
Repository for the paper titled: "When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer"
Language:Python13
princeton-nlp/align-mlm
Language:Python13
MrBananaHuman/Ko-En_dictionary
3
MrBananaHuman/PangyoCorpora
332
google-research/deduplicate-text-datasets
Language: Rust, 1.1k stars, 112 forks
cwnu-airlab/NLTKo
Language:Python131
ray-project/llm-numbers
Numbers every LLM developer should know
4.1k stars, 141 forks
WeareSoft/tech-interview
📢🙍 tech interview
4.5k stars, 753 forks
trailerAI/KoTAN
KoTAN: Korean Translation and Augmentation with fine-tuned NLLB
Language:Python241
dremdeveloper/codingtest_python
Becoming a Coding Test Passer (Python)
Language:Python12741
declare-lab/flan-alpaca
This repository contains code for extending the Stanford Alpaca synthetic instruction tuning to existing instruction-tuned models such as Flan-T5.
Language:Python34937
boost-devs/ai-tech-interview
👩‍💻👨‍💻 AI engineer technical interview study (⭐️ 1k+)
1.9k stars, 459 forks
