hynky1999

ML Engineer @huggingface

hynky1999's Stars

romkatv/powerlevel10k
A Zsh theme
Language:Shell45.8k 183 2.5k2.2k
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Language:Python33.5k 204 1.2k3.8k
Pythagora-io/gpt-pilot
The first real AI developer
Language:Python30.2k 267 5173k
EthicalML/awesome-production-machine-learning
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
17.4k 403 762.2k
rougier/numpy-100
100 numpy exercises (with solutions)
Language:Python12k 205 855.7k
cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Language:Python9.4k 89 365731
EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
Language:Python6.6k 37 1.1k1.7k
argilla-io/argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
Language:Python3.8k 29 2.1k359
github/scripts-to-rule-them-all
Set of boilerplate scripts describing the normalized script pattern that GitHub uses in its projects.
Language:Shell3.2k 384 0249
MooreThreads/Moore-AnimateAnyone
Character Animation (AnimateAnyone, Face Reenactment)
Language:Python3.1k 37 150241
huggingface/datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Language:Python2k 44 125138
mlfoundations/dclm
DataComp for Language Models
Language:HTML1.1k 38 54100
google-research/deduplicate-text-datasets
Language:Rust1.1k 13 41108
srush/Triton-Puzzles
Puzzles for learning Triton
Language:Jupyter Notebook1k 10 1065
stanford-oval/WikiChat
WikiChat is an improved RAG. It stops the hallucination of large language models by retrieving data from a corpus.
Language:Python999 16 2295
huggingface/lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Language:Python690 29 12879
firebase/genkit
An open source framework for building AI-powered apps with familiar code-centric patterns. Genkit makes it easy to integrate, test, and deploy sophisticated AI features to Firebase or Google Cloud.
Language:TypeScript665 25 26688
mlabonne/llm-autoeval
Automatically evaluate your LLMs in Google Colab
Language:Python527 7 2082
ml6team/fondant
Production-ready data processing made easy and shareable
Language:Python339 6 31726
kimtth/awesome-azure-openai-llm
"Awesome-LLM: a curated list of Azure OpenAI & Large Language Models" 🔎References to Azure OpenAI, 🦙Large Language Models, and related 🌌 services and 🎋libraries.
Language:Jupyter Notebook316 10 145
jondurbin/bagel
A bagel, with everything.
Language:Python307 11 1231
allenai/wimbd
What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
Language:Python181 6 1019
alon-albalak/data-selection-survey
A Survey on Data Selection for Language Models
149 4 17
bitextor/bicleaner
Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
Language:Python148 14 5222
kokes/od
Czech open data
Language:Python132 15 20116
Azure-Samples/openai-end-to-end-baseline
Language:Bicep101 16 645
bitextor/warc2text
Extracts plain text, language identification and more metadata from WARC records
Language:C++20 9 255
versotym/corpusCzechVerse
This repo contains 1305 books of poetry from the Corpus of Czech Verse. Annotated poetic meters, rhymes, tokenized, lemmatized, POS-tagged.
10 1 04
bitextor/monocleaner
Language:Python6 9 31
hplt-project/warc2text-runner
Scripts for parallelized extraction of plain text from WARC archives, aiming at a common and reproducible extraction approach.
Language:HTML3 4 90
