Pinned Repositories
auto_dataset_card
Wouldn't it be nice to generate parts of our dataset card automagically?
awesome-synthetic-datasets
awesome synthetic (text) datasets
blog
data-for-fine-tuning-llms
fastai4GLAMS
A study group for v4 of the fastai introduction to deep learning course with a focus on applications in GLAM settings
flyswot
Command Line Interface for running 🤗 Transformers Image Classification locally
haiku-dpo
Using open source LLMs to build synthetic datasets for direct preference optimization
huggingface-tldr
Experimental tl;dr summaries for datasets on the Hugging Face Hub!
LLM-pubmed-query-generation-evaluation
LLM PubMed Query Generation Evaluation
data-is-better-together
Let's build better datasets, together!
davanstrien's Repositories
davanstrien/auto_dataset_card
Wouldn't it be nice to generate parts of our dataset card automagically?
davanstrien/ImageIN
Find illustrations in historic books using computer vision
davanstrien/hugit-cli
Push ImageFolder-style image datasets to the 🤗 Hub from the command line
davanstrien/machine-learning-librarians-archivists
Introduction to Machine Learning for Librarians & Archivists
davanstrien/altoxml2dataset
Convert ALTO XML datasets into a format compatible with Hugging Face datasets
davanstrien/Computer-Vision-for-the-Humanities-an-introduction-to-deep-learning-for-image-classification
This repository contains alternative setup instructions for a forthcoming Programming Historian lesson 'Computer Vision for the Humanities: an introduction to deep learning for image classification'.
davanstrien/hmd_newspaper_dl-1
Bulk download British Library Heritage Made Digital Newspapers 📰
davanstrien/Hub
🔌 A central repository collecting pre-trained adapter modules
davanstrien/piffle
Python library for generating and parsing IIIF Image API URLs
davanstrien/sphinx-book-theme
A clean book theme for scientific explanations and documentation with Sphinx
davanstrien/annotations
Annotations stored in a git repo
davanstrien/arch-hf-embedder
davanstrien/blog-1
Public repo for HF blog posts
davanstrien/data-preparation
Code used for sourcing and cleaning the BigScience ROOTS corpus
davanstrien/digitised-books-ocr-and-metadata
British Library Digitised Books c. 1510 - c. 1900: JSONL (OCR derived text & metadata)
davanstrien/flyswot-gym
🦾 training flyswot models (fairly) easily
davanstrien/github_nb_report
Living with Machines GitHub Stats report
davanstrien/historical_texts
BigScience working group on language models for historical texts
davanstrien/huggingface_hub
All the open source things related to the Hugging Face Hub.
davanstrien/jekyll
Static site version of Programming Historian
davanstrien/kedro
A Python framework for creating reproducible, maintainable and modular data science code.
davanstrien/lam
Libraries, Archives and Museums (LAM)
davanstrien/logseq
This is simply to publish my templates to the world :)
davanstrien/meta
A community dedicated to supporting tools for technical and scientific communication and interactive computing
davanstrien/metadata-working-group
A collection of Jupyter notebooks used for interactive learning sessions in the AI4LAM Metadata working group meetings. Jupyter Book available at https://ai4lam.github.io/metadata-working-group/
davanstrien/optimum
🏎️ Accelerate training and inference of 🤗 Transformers with easy to use hardware optimization tools
davanstrien/quarto-actions
davanstrien/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
davanstrien/website
Source for https://fullstackdeeplearning.com
davanstrien/Website-Classification
Trying to classify web archives using metadata...