Pinned Repositories
foiarchive-search
Streamlit GUI for FOIArchive search
talk-writing-recursive-queries
Materials for a Recursive SQL presentation originally given at PostgresConf US '19
optimal-data-loads
Materials for PGCONF NYC 2022 presentation: Tips and techniques to optimize the 'L' component of your PostgreSQL ETL and ELT processes and make them easy to maintain. Includes real-world examples with timings.
pdf2mbox
A command-line utility and Python package for converting PDF emails to MBOX format
archiving-digital-records
Course materials from the 2023 & 2024 Archiving Digital Records track of the Archives as Data Summer Institute.
xmpdf
A Python module for extracting emails from a PDF.
add-pdfmeta
Example that shows how to add metadata to a PDF
piir-poc-dp
PII proof of concept with DataProfiler
advent-of-sql-2024
My solutions to the 2024 Advent of SQL
article-hugo-website
A no-code, no-software, no-cost solution for publishing sophisticated websites managed by non-technical people.
benjlis's Repositories
benjlis/foiarchive-search
Streamlit GUI for FOIArchive search
benjlis/advent-of-sql-2024
My solutions to the 2024 Advent of SQL
benjlis/hl-redirect
benjlis/GitHubActionsTutorial-USRSE24
Content for US-RSE'24 Tutorial "GitHub Actions for Scientific Data Workflows"
benjlis/subore
Subject and body search via regular expression across FOIArchive corpora.
benjlis/sdpt
SQL data profiling toolkit
benjlis/covid19-gui
GUI for History Lab's COVID-19 collection
benjlis/eabcc-presentation
Slides and materials for the talk "Creating Email Archives from PDFs – The COVID-19 Corpus" delivered at the EABCC Email Archiving Symposium in June '23
benjlis/muckrock-client
Simple Python client for MuckRock API
benjlis/add-pdfmeta
Example that shows how to add metadata to a PDF
benjlis/nbdev-hello-world
Hello World with nbdev
benjlis/test-eval
Schema and code for the History Lab test evaluation framework
benjlis/lsal
Label Studio Annotation Load
benjlis/pdb-gui
A prototype query interface to the FOIArchive's PDB corpus.
benjlis/piir-poc-dp
PII proof of concept with DataProfiler
benjlis/ddmd
Generates a Markdown table description based on SQL data dictionary information
benjlis/hldoc
New data structure for representing History Lab docs
benjlis/optimal-data-loads
Materials for PGCONF NYC 2022 presentation: Tips and techniques to optimize the 'L' component of your PostgreSQL ETL and ELT processes and make them easy to maintain. Includes real-world examples with timings.
benjlis/tqs
Code for generating a text quality score for a text file, intended to measure OCRed text quality. It's a fork of martinamaximovich/improvingOCR.
benjlis/csv2pg
Utility for loading a CSV file into PostgreSQL.
benjlis/pim
Personal information manager tools
benjlis/pg-pandas-profiling
A Python package that executes pandas profiling on the results of a SQL query run against a PostgreSQL database.
benjlis/foiarchive-search-prototype
Ideas for a new FOIArchive search interface
benjlis/article-hugo-website
A no-code, no-software, no-cost solution for publishing sophisticated websites managed by non-technical people.
benjlis/pyt
Explore PyTesseract and related packages
benjlis/data-quality
Scripts for improving data quality
benjlis/pdf2db-em
Extracts email messages from a PDF and stores them in a FOIArchive email database schema.
benjlis/danalysis
A PostgreSQL extension that performs basic descriptive statistical analysis of the columns in a table.
benjlis/explain-explain
Notes and materials for a talk describing EXPLAIN PLAN
benjlis/pomodoro
Timer to support pomodoros.