Pinned Repositories
auto_dataset_card
Wouldn't it be nice to generate parts of our dataset card automagically?
awesome-synthetic-datasets
awesome synthetic (text) datasets
blog
data-for-fine-tuning-llms
fastai4GLAMS
A study group for v4 of the fastai introduction to deep learning course with a focus on applications in GLAM settings
flyswot
Command Line Interface for running 🤗 Transformers Image Classification locally
haiku-dpo
Using open source LLMs to build synthetic datasets for direct preference optimization
huggingface-tldr
Experimental tl;dr summaries for datasets on the Hugging Face Hub!
Python-introduction-for-digital-collections
Workshop materials on Python as part of a series of Library Carpentry workshops at the British Library
data-is-better-together
Let's build better datasets, together!
davanstrien's Repositories
davanstrien/awesome-synthetic-datasets
awesome synthetic (text) datasets
davanstrien/haiku-dpo
Using open source LLMs to build synthetic datasets for direct preference optimization
davanstrien/flyswot
Command Line Interface for running 🤗 Transformers Image Classification locally
davanstrien/huggingface-tldr
Experimental tl;dr summaries for datasets on the Hugging Face Hub!
davanstrien/blog
davanstrien/auto_dataset_card
Wouldn't it be nice to generate parts of our dataset card automagically?
davanstrien/Python-introduction-for-digital-collections
Workshop materials on Python as part of a series of Library Carpentry workshops at the British Library
davanstrien/hugit-cli
Push ImageFolder-style image datasets to the 🤗 Hub from the command line
davanstrien/LLM-pubmed-query-generation-evaluation
LLM PubMed Query Generation Evaluation
davanstrien/sync-hf-test
davanstrien/api-inference-community
davanstrien/arch-hf-embedder
davanstrien/argilla
Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.
davanstrien/arxiv.py
Python wrapper for the arXiv API
davanstrien/awesome-list
Awesome AI in Libraries
davanstrien/BERTopic
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
davanstrien/camera
davanstrien/Computer-Vision-for-the-Humanities-workshop
Computer Vision for the Humanities workshop
davanstrien/cosmo-chat
davanstrien/data-preparation
Code used for sourcing and cleaning the BigScience ROOTS corpus
davanstrien/distilabel
⚗️ AI Feedback framework for scalable LLM alignment
davanstrien/Fin-Fact
A Benchmark Dataset for Multimodal Scientific Fact Checking
davanstrien/gahd
GAHD: A German Adversarial Hate speech Dataset
davanstrien/huggingface_hub
All the open source things related to the Hugging Face Hub.
davanstrien/iiif2annos
OCR IIIF images in a manifest and generate annotations
davanstrien/jekyll
static site version of Programming Historian
davanstrien/kraken
OCR engine for all the languages
davanstrien/ml-kge
Multilingual Knowledge Graph Enhancement (EMNLP 2023)
davanstrien/monitor-prompts-hf
A Gradio app to monitor annotation effort done by users using the Argilla HF Space over the prompt dataset
davanstrien/Website-Classification
Trying to classify web archives using metadata...