danieljohnevans's Stars
LibraryOfCongress/newspaper-navigator
Dicklesworthstone/llama2_aided_tesseract
Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections, complete with options for text validation and hallucination filtering.
ropensci/textreuse
Detect text reuse and document similarity
BirkbeckCTP/janeway
A web-based platform for publishing journals, preprints, conference proceedings, and books
inpho/topic-explorer
System for building, visualizing, and working with LDA topic models
dasmiq/passim
Detect and align similar passages
dell-research-harvard/effocr
A model(ing framework) for sample efficient OCR
PonteIneptique/YALTAi
You Actually Look Twice At it
glenrobson/iiif2annos
OCR the IIIF images in a manifest and generate annotations
Living-with-machines/nnanno
nnanno is a collection of tools that sample, annotate and apply computer vision to the Newspaper Navigator dataset
catseye/Guten-gutter
Strips boilerplate from Project Gutenberg text files
MARXdown/MARXdown.github.io
Initial build of digital edition of Capital Volume 1 using Ed. and hypothes.is
htrc/HTRC-WorksetToolkit
Python SDK for Data API and Solr API access
cmu-lib/archive_plugin
Janeway plugin for managing journal archiving.
hadro/info654fa20
Data and documents for Pratt INFO654-04 Information Technologies class, Fall 2020
BirkbeckCTP/pandoc_plugin
Plugin for Janeway for automatic galley generation
AmerAntiquarian/Printers-File
ReadingTimeMachine/ocr_post_correction
DeepSouthDH/deepsouthdh.github.io
jawalsh/acsproj
kevincoakley/ansible-role-docker
sgotzler/megaText
Mapping the Television Mega-Text
SmithPapers/smithpapers.github.io
digitalhogarth/digitalhogarth.github.io
HumanitiesAnalytics/datasets
jwusteman/sru-search
SRU Search Web Component
kodingkoning/BookLabWoodType
sgotzler/demo_site_PH
demo site for the lesson on deploying digital reading editions
sgotzler/liteMegaText
Repo for the Megatext dB