IRE Resources Work

This repository contains code and data for working with IRE tipsheets, in particular extracting metadata from those files to build an improved version of the Resources section of ire.org. Currently, the work involves converting electronic PDFs to text and then running those text files through multiple LLMs to extract the metadata. Those LLMs include:

  • OpenAI
  • Claude
  • Claude 3 Sonnet
  • Mistral 7B
  • llama2
  • Palm (Google)
  • Gemini (Google)
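
As a rough illustration of the text-to-LLM step described above, the sketch below builds an extraction prompt from a tipsheet's text and sends it to a model through the Python llm library. It is not the code in this repository: the field names in `FIELDS`, the helper names `build_prompt` and `extract_metadata`, and the character-based truncation are all assumptions made for the example.

```python
# Hypothetical metadata fields; the real ones are defined by this project's parser.
FIELDS = ("title", "authors", "topics")


def build_prompt(tipsheet_text: str, max_chars: int = 12000) -> str:
    """Return an extraction prompt containing at most max_chars of the tipsheet."""
    # Crude character-based truncation (an assumption for this sketch);
    # token-based truncation would be more precise.
    excerpt = tipsheet_text[:max_chars]
    return (
        "Extract the following fields from this tipsheet and reply with a "
        f"single JSON object with keys {', '.join(FIELDS)}.\n\n{excerpt}"
    )


def extract_metadata(tipsheet_text: str, model_id: str) -> str:
    """Send the prompt to the model named model_id and return its raw reply."""
    import llm  # imported lazily so build_prompt works without the library

    model = llm.get_model(model_id)
    response = model.prompt(build_prompt(tipsheet_text))
    return response.text()
```

Calling `extract_metadata` requires the llm library to be installed and the chosen model to be configured (for hosted models, with an API key). The reply is returned as raw text because a model may not return valid JSON.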

The extraction is done in ire_parser.py, which uses the Python llm and ttok libraries. A separate script, ire_validator.py, evaluates the results for JSON validity and data structure, producing results.csv (also available as a Google Sheet).
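
To make the two checks concrete, here is a minimal sketch of what validating one model reply for JSON validity and data structure could look like. It is an illustration only, not the logic of ire_validator.py: the `REQUIRED_FIELDS` schema and the `validate_output` helper are hypothetical.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"title": str, "authors": list, "topics": list}


def validate_output(raw: str) -> dict:
    """Check one model reply for JSON validity and for the expected structure."""
    result = {"valid_json": False, "valid_structure": False}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Not parseable as JSON, so the structure cannot be checked either.
        return result
    result["valid_json"] = True
    # Structure is valid only for a JSON object whose required fields
    # are all present with the expected types.
    result["valid_structure"] = isinstance(data, dict) and all(
        isinstance(data.get(name), kind) for name, kind in REQUIRED_FIELDS.items()
    )
    return result
```

Each result dictionary could become one row of a CSV file, for example via `csv.DictWriter`, with extra columns for the model and the source file.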