Pinned Repositories
auto_dataset_card
Wouldn't it be nice to generate parts of our dataset card automagically?
awesome-synthetic-datasets
awesome synthetic (text) datasets
blog
data-for-fine-tuning-llms
fastai4GLAMS
A study group for v4 of the fastai introduction to deep learning course with a focus on applications in GLAM settings
flyswot
Command Line Interface for running 🤗 Transformers Image Classification locally
haiku-dpo
Using open source LLMs to build synthetic datasets for direct preference optimization
huggingface-tldr
Experimental tl;dr summaries for datasets on the Hugging Face Hub!
LLM-pubmed-query-generation-evaluation
LLM PubMed Query Generation Evaluation
data-is-better-together
Let's build better datasets, together!
davanstrien's Repositories
davanstrien/awesome-synthetic-datasets
awesome synthetic (text) datasets
davanstrien/data-for-fine-tuning-llms
davanstrien/haiku-dpo
Using open source LLMs to build synthetic datasets for direct preference optimization
davanstrien/flyswot
Command Line Interface for running 🤗 Transformers Image Classification locally
davanstrien/blog
davanstrien/huggingface-tldr
Experimental tl;dr summaries for datasets on the Hugging Face Hub!
davanstrien/LLM-pubmed-query-generation-evaluation
LLM PubMed Query Generation Evaluation
davanstrien/Python-introduction-for-digital-collections
Workshop materials on Python as part of a series of Library Carpentry workshops at the British Library
davanstrien/cosmo-chat
davanstrien/distilabel
⚗️ AI Feedback framework for scalable LLM alignment
davanstrien/machine-learning-librarian
davanstrien/magpie-ollama-datagen
davanstrien/sync-hf-test
davanstrien/trl
Train transformer language models with reinforcement learning.
davanstrien/api-inference-community
davanstrien/argilla
Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.
davanstrien/arxiv.py
Python wrapper for the arXiv API
davanstrien/BERTopic
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
davanstrien/camera
davanstrien/colpali
The code used to train and run inference with the ColPali architecture.
davanstrien/Computer-Vision-for-the-Humanities-workshop
Computer Vision for the Humanities workshop
davanstrien/Fin-Fact
A Benchmark Dataset for Multimodal Scientific Fact Checking
davanstrien/gahd
GAHD: A German Adversarial Hate speech Dataset
davanstrien/ml-kge
Multilingual Knowledge Graph Enhancement (EMNLP 2023)
davanstrien/monitor-prompts-hf
A Gradio app to monitor the annotation effort of users working on the prompt dataset in the Argilla HF Space
davanstrien/outlines
Structured Text Generation
davanstrien/polars
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
davanstrien/polars-plugins-tutorial
How you (yes, you!) can write a Polars Plugin
davanstrien/polars-url
A Polars plugin to parse/extract fields from URLs
davanstrien/web-languages
Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ for the code.