WHaverals's Stars
kohler/hotcrp
HotCRP conference review software
EnzoFleur/style_embedding_evaluation
This repository supports the evaluation of author embeddings along a writing-style axis.
arvindrajan92/DTrOCR
A PyTorch implementation of DTrOCR: Decoder-only Transformer for Optical Character Recognition
timothymiller/cloudflare-ddns
🎉🌩️ Dynamic DNS (DDNS) service based on Cloudflare! Access your home network remotely via a custom domain name without a static IP!
ant-louis/belgpt2
🇧🇪 BelGPT-2: the 1st GPT model pretrained in French.
ParisNeo/lollms-webui
Lord of Large Language Models Web User Interface
booknlp/booknlp
BookNLP, a natural language processing pipeline for books
SCUT-DLVCLab/GPT-4V_OCR
Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
confident-ai/deepeval
The LLM Evaluation Framework
forTEXT/katkit_toolbox
Implementation of KatKit as presented at DH2024
melaniewalsh/responsible-datasets-in-context
A repository of datasets paired with rich documentation, data essays, and teaching resources
Yuliang-Liu/MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
eeditiones/eebo
Early English Books Demo for TEI Publisher 6
ljvmiranda921/prodigy-pdf-custom-recipe
Custom recipe and utilities for document processing
explosion/vscode-prodigy
🧬 A VS Code extension for annotating data with Prodigy
jzhang512/post-ocr-correction
Code and prompt templates for the "Post-OCR Correction with OpenAI’s GPT Models on Challenging English Prosody Texts" short-paper submission to DocEng 2024.
OpenITI/acdc_train
Automatic Collation for Diversifying Corpora
ReviewNB/treon
Easy to use test framework for Jupyter Notebooks
Mozilla-Ocho/llamafile
Distribute and run LLMs with a single file.
avjves/textreuse-blast
Software to detect text reuse with BLAST.
Instruction-Tuning-with-GPT-4/GPT-4-LLM
Instruction Tuning with GPT-4
ollama/ollama
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
princeton-vl/infinigen
Infinite Photorealistic Worlds using Procedural Generation
philschmid/document-ai-transformers
leondz/hatespeechdata
Catalog of abusive language data (PLoS 2020)
GateNLP/broad_twitter_corpus
The Broad Twitter Corpus, an NER dataset in English stratified for time, location, social media genre, socioeconomic factors (COLING 2016)
leondz/garak
the LLM vulnerability scanner
golang/go
The Go programming language
explosion/thinc-apple-ops
🍏 Make Thinc faster on macOS by calling into Apple's native Accelerate library
jsvine/pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.