hynky1999's Stars
romkatv/powerlevel10k
A Zsh theme
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Pythagora-io/gpt-pilot
The first real AI developer
EthicalML/awesome-production-machine-learning
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
rougier/numpy-100
100 numpy exercises (with solutions)
cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
argilla-io/argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
github/scripts-to-rule-them-all
Set of boilerplate scripts describing the normalized script pattern that GitHub uses in its projects.
MooreThreads/Moore-AnimateAnyone
Character Animation (AnimateAnyone, Face Reenactment)
huggingface/datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
mlfoundations/dclm
DataComp for Language Models
google-research/deduplicate-text-datasets
srush/Triton-Puzzles
Puzzles for learning Triton
stanford-oval/WikiChat
WikiChat is an improved RAG. It stops large language models from hallucinating by retrieving data from a corpus.
huggingface/lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
firebase/genkit
An open source framework for building AI-powered apps with familiar code-centric patterns. Genkit makes it easy to integrate, test, and deploy sophisticated AI features to Firebase or Google Cloud.
mlabonne/llm-autoeval
Automatically evaluate your LLMs in Google Colab
ml6team/fondant
Production-ready data processing made easy and shareable
kimtth/awesome-azure-openai-llm
"Awesome-LLM: a curated list of Azure OpenAI & Large Language Models" 🔎References to Azure OpenAI, 🦙Large Language Models, and related 🌌 services and 🎋libraries.
jondurbin/bagel
A bagel, with everything.
allenai/wimbd
What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
alon-albalak/data-selection-survey
A Survey on Data Selection for Language Models
bitextor/bicleaner
Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
kokes/od
Czech open data
Azure-Samples/openai-end-to-end-baseline
bitextor/warc2text
Extracts plain text, language identification and more metadata from WARC records
versotym/corpusCzechVerse
This repo contains 1305 books of poetry from the Corpus of Czech Verse. Annotated poetic meters, rhymes, tokenized, lemmatized, POS-tagged.
bitextor/monocleaner
hplt-project/warc2text-runner
Scripts for parallelized extraction of plain text from WARC archives, aiming at a common and reproducible extraction approach.