Email-Thon is a simple library for doing "all" things IR with .eml and .msg files and live mailbox collection. The project is under active development, so assume anything may be broken.
Currently we only support reading .eml (text-based) message exports. The immediate task is to expand this to include .msg files. Each message will be represented as a dataclass with various attributes for processing.
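A minimal sketch of reading a single .eml export into a dataclass, using only Python's standard `email` package. The `ParsedEmail` class, its fields, and the `parse_eml_bytes`/`parse_eml_file` helpers are hypothetical names chosen for illustration, not Email-Thon's actual API.

```python
from dataclasses import dataclass, field
from email import policy
from email.parser import BytesParser
from pathlib import Path


@dataclass
class ParsedEmail:
    """Hypothetical container for the fields of one parsed .eml message."""
    path: str
    sender: str
    recipients: str
    subject: str
    date: str
    body: str
    attachments: list[str] = field(default_factory=list)


def parse_eml_bytes(raw: bytes, path: str = "<memory>") -> ParsedEmail:
    # policy.default yields EmailMessage objects, which provide
    # get_body() and iter_attachments()
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # Prefer the plain-text body; fall back to HTML if no text/plain part exists
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    return ParsedEmail(
        path=path,
        sender=str(msg.get("From", "")),
        recipients=str(msg.get("To", "")),
        subject=str(msg.get("Subject", "")),
        date=str(msg.get("Date", "")),
        body=body,
        # iter_attachments() skips the main body parts of a multipart message
        attachments=[part.get_filename() or "" for part in msg.iter_attachments()],
    )


def parse_eml_file(path: Path) -> ParsedEmail:
    # Read as bytes so the parser handles charsets and transfer encodings itself
    return parse_eml_bytes(path.read_bytes(), str(path))
```

Parsing from bytes rather than text keeps the charset and Content-Transfer-Encoding handling inside the `email` package instead of guessing an encoding up front.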
To Complete
- Add docstrings
- Add logging module
- JSON representation for email data
- Add support for .msg files
- Expand save to folder_name options
- Add a metadata file within each folder summarizing the items parsed
- Migrate to a dataclass to represent a parsed email
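For the JSON representation item, one possible approach, assuming the parsed email ends up as a dataclass: `dataclasses.asdict` followed by `json.dumps`. The `ParsedEmail` fields and the `to_json`/`from_json` helpers are hypothetical and only illustrate the idea.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ParsedEmail:
    # Hypothetical field set; the library's final dataclass may differ
    sender: str
    recipients: list[str]
    subject: str
    date: str
    attachments: list[str] = field(default_factory=list)


def to_json(email_obj: ParsedEmail, indent: int = 2) -> str:
    # asdict() recursively converts the dataclass (and nested lists) to plain dicts/lists
    return json.dumps(asdict(email_obj), indent=indent, ensure_ascii=False)


def from_json(text: str) -> ParsedEmail:
    # Rebuild the dataclass from a JSON document produced by to_json()
    return ParsedEmail(**json.loads(text))
```

Keeping every field JSON-native (strings and lists) means the round trip needs no custom encoder; a `datetime` field, for example, would need converting to an ISO string first.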
Forward Looking
- Create CLI version
- Docker Ready
- Report generator (PDF|HTML) based on the export folder
- Add support for connecting to mailboxes (protocol-based support|Outlook|Gmail)
- Create a module for AI interaction (context, name recognition)
Considerations
- Should we capture duplicate messages?
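One way to approach the duplicate question, sketched under the assumption that a message's Message-ID header is a good enough identity key, with a SHA-256 of the raw bytes as a fallback when the header is missing. The helper names are hypothetical, and note that a shared Message-ID does not guarantee byte-identical content.

```python
import hashlib
from email import policy
from email.parser import BytesParser


def message_fingerprint(raw: bytes) -> str:
    # Hypothetical dedup key: prefer the Message-ID header; otherwise
    # fall back to a SHA-256 digest of the raw message bytes
    msg = BytesParser(policy=policy.default).parsebytes(raw, headersonly=True)
    message_id = msg.get("Message-ID")
    if message_id:
        return "mid:" + str(message_id).strip()
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def find_duplicates(raw_messages: list[bytes]) -> list[int]:
    # Return the indexes of messages whose fingerprint was already seen
    seen: set[str] = set()
    duplicates: list[int] = []
    for index, raw in enumerate(raw_messages):
        key = message_fingerprint(raw)
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates
```

Reporting duplicates by index, rather than silently dropping them, leaves the capture decision open: an investigation may still want to know that the same message appeared in several places.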
Complete
- Create simple lib with modular expansions
- Class implementation
- Create ReadMe
- Read all messages in directory
- IOC Extractor (URLs|Attachments)
- Attachment Extractor
- Extract only the message fields (from, to, subject, date) within the body to assist with context
- Export to folders tracked by subject
- class to represent a parsed email
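A minimal sketch of URL and attachment IOC extraction with the standard library. The regular expression is a deliberately simple assumption (it will miss defanged or obfuscated URLs), and `extract_iocs`/`AttachmentIOC` are hypothetical names rather than Email-Thon's existing extractor. Attachments are identified by a `Content-Disposition: attachment` header and hashed with SHA-256.

```python
import hashlib
import re
from dataclasses import dataclass
from email import policy
from email.parser import BytesParser

# Deliberately simple URL pattern (an assumption): http/https/ftp up to
# whitespace or a common delimiter character
URL_RE = re.compile(r"""(?:https?|ftp)://[^\s<>"')\]]+""", re.IGNORECASE)


@dataclass
class AttachmentIOC:
    filename: str
    content_type: str
    size: int
    sha256: str


def extract_iocs(raw: bytes) -> tuple[list[str], list[AttachmentIOC]]:
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    urls: list[str] = []
    attachments: list[AttachmentIOC] = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        if part.is_attachment():
            # get_payload(decode=True) undoes base64/quoted-printable encoding
            data = part.get_payload(decode=True) or b""
            attachments.append(AttachmentIOC(
                filename=part.get_filename() or "",
                content_type=part.get_content_type(),
                size=len(data),
                sha256=hashlib.sha256(data).hexdigest(),
            ))
        elif part.get_content_maintype() == "text":
            # Scan text/plain and text/html bodies, keeping first-seen order
            for match in URL_RE.findall(part.get_content()):
                url = match.rstrip(".,;:!?")  # drop trailing sentence punctuation
                if url not in urls:
                    urls.append(url)
    return urls, attachments
```

Hashing the decoded payload (not the base64 text) gives digests that can be compared against file-hash threat intelligence. Inline parts without an attachment disposition, such as embedded images, are skipped by this sketch.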