/invoice-compress

A simple script to download my invoice PDF files from email.

Primary language: TypeScript

Only tested on macOS so far.

Prepare

You should have brew installed. If not, you can download it from here.

Install isync

brew install isync

Install mu

brew install mu

Install gpg

Use gpg to encrypt your email password.

brew install gnupg gnupg2

Create gpg password

  • create a file and write your email password in it, for example: echo my-password > ~/pass
  • use gpg to encrypt the file with a passphrase: gpg -c -o ~/pass.gpg ~/pass
  • then point the PassCmd in .mbsyncrc at ~/pass.gpg, see below

Config isync

  • create a folder, mkdir ~/Maildir
  • create a file, ~/.mbsyncrc

Fill in .mbsyncrc with your email config, for example:

# your email IMAP config, setup your account
IMAPAccount spike
Host outlook.office365.com
Port 993
User <your-email-address>
# you need gpg to decrypt your password if you encrypted it
PassCmd "gpg -q --for-your-eyes-only --no-tty -d ~/pass.gpg"
# if you do not encrypt your pass, you can use this:
# Pass <your email password>
SSLType IMAPS
SSLVersions TLSv1.2

# Remote Store
# set `Far` (remote) store
IMAPStore remote
Account spike

# set `Near` (local) store
MaildirStore local
# where to save your email
# if the folder does not exist, you need to create it
# keep the trailing slash: mbsync prepends Path to mailbox names as-is
Path ~/Maildir/spike/
# where to save your inbox
Inbox ~/Maildir/spike/INBOX
SubFolders Verbatim

# Channels
Channel spike
# refer to IMAPStore name
Far :remote:
# refer to MaildirStore name
Near :local:
Patterns *
Expunge None
CopyArrivalDate yes
Sync All
Create Near
SyncState *

For more information, see here.

Download your email

mbsync -a

This may take a long time if you have lots of emails.

Use mu to index your email

If your emails contain CJK characters, you may need to set this first:

export XAPIAN_CJK_NGRAM=1

Then run:

  • mu init
  • mu index

By default, mu will try to index the emails under ~/Maildir.
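
The download and index steps can also be driven from a script. The following is only a sketch under assumptions: `Step` and `buildMailSteps` are hypothetical names that do not come from this repo, and the function only describes the commands rather than running them. `mu init` is included only on the first run, since it creates the database.

```typescript
// One external command plus the extra environment variables it needs.
interface Step {
  cmd: string;
  args: string[];
  env: Record<string, string>;
}

// Hypothetical helper (not part of this repo): lists the commands in order.
// `maildir` should be an absolute path, because `~` is not expanded
// when a command is started without a shell.
function buildMailSteps(maildir: string, hasCjk: boolean, firstRun: boolean): Step[] {
  // XAPIAN_CJK_NGRAM=1 is only relevant for the mu steps.
  const muEnv: Record<string, string> = hasCjk ? { XAPIAN_CJK_NGRAM: "1" } : {};
  const steps: Step[] = [{ cmd: "mbsync", args: ["-a"], env: {} }];
  if (firstRun) {
    // Point mu at the directory that mbsync writes to.
    steps.push({ cmd: "mu", args: ["init", `--maildir=${maildir}`], env: muEnv });
  }
  steps.push({ cmd: "mu", args: ["index"], env: muEnv });
  return steps;
}
```

Each step could then be passed to something like Node's `child_process.spawnSync(step.cmd, step.args, { env: { ...process.env, ...step.env } })`.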

Get started

git clone https://github.com/Spike-Leung/invoice-compress.git
cd invoice-compress
PUPPETEER_PRODUCT=firefox yarn
yarn extract

Specify date range

By default, the script extracts all invoices from the current month.

But you can also specify a time range to extract. For example:

yarn extract 20220401...20220502
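
To illustrate the argument format, here is a minimal sketch of how such a range could be parsed. `DateRange`, `formatYmd` and `parseDateRange` are hypothetical names, not the script's actual code. It also assumes that "current month" means the first to the last calendar day of the month, which the script may handle differently.

```typescript
// Dates are kept as YYYYMMDD strings, the same shape as the CLI argument.
interface DateRange {
  start: string;
  end: string;
}

// Format a Date as YYYYMMDD using local time.
function formatYmd(d: Date): string {
  const m = d.getMonth() + 1;
  const day = d.getDate();
  return `${d.getFullYear()}${m < 10 ? "0" : ""}${m}${day < 10 ? "0" : ""}${day}`;
}

// Hypothetical parser for "YYYYMMDD...YYYYMMDD".
// Without an argument it falls back to the current calendar month (assumption).
function parseDateRange(arg?: string, now: Date = new Date()): DateRange {
  if (arg === undefined) {
    const first = new Date(now.getFullYear(), now.getMonth(), 1);
    // Day 0 of the next month is the last day of this month.
    const last = new Date(now.getFullYear(), now.getMonth() + 1, 0);
    return { start: formatYmd(first), end: formatYmd(last) };
  }
  const match = /^(\d{8})\.\.\.(\d{8})$/.exec(arg);
  if (match === null) {
    throw new Error(`expected YYYYMMDD...YYYYMMDD, got "${arg}"`);
  }
  const start = match[1] as string;
  const end = match[2] as string;
  // YYYYMMDD strings sort the same way as the dates they represent.
  if (start > end) {
    throw new Error(`range start ${start} is after range end ${end}`);
  }
  return { start, end };
}
```

For example, `parseDateRange("20220401...20220502")` yields `{ start: "20220401", end: "20220502" }`.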

How it works

  • use isync to download all emails
  • use mu to index, find, and extract PDFs from emails (in the script, mu finds emails whose subject contains “发票”, Chinese for "invoice"; you can fork this repo to customize the search)
  • use mu to find invoice emails that have no PDF attached but contain download links
  • use mailParser to parse the email, and jsdom to query the download links
  • then use the Node http and https modules to download the PDFs from the links
  • if some links cannot be downloaded directly, write an adapter that uses puppeteer or something else to find the download links
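
The find and link-extraction steps above can be sketched roughly as follows. All function names are hypothetical and this is not the repo's implementation. In particular, a regular expression stands in for jsdom so the example stays dependency-free, and the "direct PDF" check based on the `.pdf` suffix is an assumption.

```typescript
// Arguments for `mu find`: match on subject and date, print only message paths.
function buildMuFindArgs(subject: string, start: string, end: string): string[] {
  // `date:A..B` is mu's range syntax; the field code `l` is the message path.
  return ["find", `subject:${subject}`, `date:${start}..${end}`, "--fields=l"];
}

// Collect http(s) links from anchor tags in an HTML email body.
// A regex is a simplification here; the script itself uses jsdom.
function extractLinks(html: string): string[] {
  const links: string[] = [];
  const anchor = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/gi;
  let m: RegExpExecArray | null;
  while ((m = anchor.exec(html)) !== null) {
    // Undo the most common HTML escaping inside attribute values.
    const href = (m[1] as string).replace(/&amp;/g, "&");
    if (/^https?:\/\//i.test(href)) {
      links.push(href);
    }
  }
  return links;
}

// Assumption: a link whose path ends in .pdf can be downloaded directly;
// anything else would need an adapter (for example one based on puppeteer).
function isDirectPdf(link: string): boolean {
  return /\.pdf(?:[?#]|$)/i.test(link);
}

// Split links into those to download directly and those needing an adapter.
function partitionLinks(links: string[]): { direct: string[]; needAdapter: string[] } {
  const direct: string[] = [];
  const needAdapter: string[] = [];
  for (const link of links) {
    (isDirectPdf(link) ? direct : needAdapter).push(link);
  }
  return { direct, needAdapter };
}
```

The array from `buildMuFindArgs` is meant to be passed to the `mu` binary without a shell, which avoids quoting problems with the non-ASCII subject.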

Todo

  • [ ] calculate invoice sum