/plaid2text

Python Scripts to export Plaid transactions and transform them into Ledger or Beancount syntax formatted files.

Primary LanguagePythonGNU General Public License v3.0GPL-3.0

Plaid2Text Documentation

Table of Contents

Synopsis

The purpose of this python script is to bring those of us who chose to use command line accounting, some of the benefits of more conventional accounting programs, namely the ability to pull our transaction information from supported institutions via automated means and format them into our preferred text syntax. Currently, this program supports Ledger and Beancount syntax exports.

To download transactions, we use Plaid. This program will help you setup your accounts and download transactions from the Plaid API. I have tried to make this as simple as possible to setup.

Also, once downloaded, your transactions will be stored in a Mongo database. It is actually the transactions pulled from the database that we run though our syntax renderers. This is required to help keep track of which transaction we have already processed (as well as have records to reconstruct our files should the need arise).

The main inspiration for the workings of the export part of the script came from the excellent icsv2ledger. I borrowed heavily from Quentin’s excellent script in making this tool, and in some places I have shamelessly stolen his code altogether.

Contributions

Feedback and contributions are encouraged. I hope that this will pick up some traction in the community, and in the end we can all have a rock-solid program to help us with our accounting.

Requirements

  • Python => 3.5
    • PyMango => 0.1.1
    • prompt_toolkit => 0.57
    • plaid-python-legacy => 1.3.0
  • ledger-cli => 3 (if using ledger syntax)
  • Beancount => 2.0 (if using beancount syntax)
  • MongoDB => 3.2.3

I have only tested this on Linux. I have no desire to run this on Windows, but feel free to give it a shot, it may work. The same goes for Mac, although sometime in the future I may test on OSX.

Note: python dependencies must be installed prior to running the script. All can be installed from pip or of course whatever means you wish to use.

Installation

This program is setup as a python package, and can be installed using your preferred tools, but this document will only cover using pip.

First, clone the git repo: git clone git@github.com:madhat2r/plaid2text.git

or if preferring HTTPS: git clone https://github.com/madhat2r/plaid2text.git

Then change into folder: cd plaid2text

Then use pip to install: pip install .

If you plan on making modifications to the program, then you may want to install it as editable. (this is my preferred way) pip install -e .

Plaid

Plaid is an API used in building web-based financial applications. It provides access to our transactions at a number of institutions. To get started with this program, you must sign up for a plaid account. Once you have done that Plaid will issue you some developer keys to use with their API.

The keys we are interested in are:

client_id
this is your developer ID
secret
this is your authentication token

Once you have obtained your keys then use plaid2text to create your configuration file and save your keys into it. You can do this by simply invoking plaid2text without arguments. plaid2text will prompt you for your keys and store them in your config file.

A note about Plaid. Plaid is a paid service, but developers have access to the developer API without paying. The developer API has all the features of the production API. I have been using this for a few months now on my 6 accounts and everything is still working just fine. I did contact them about what the cost would be (and told them my use case), and was informed that a paid version comes to 0.25 USD per account, per month. That is still a heck-of-a-better deal than Quickbooks online in my opinion. I can get my 6 accounts for 1.50 USD per month, but like I mentioned, I have yet run into any caps on my developer account, so that may be all I ever use.

Plaid2Text

In order to use plaid2text you must have already followed the instructions in the Plaid section. Once you have your initial config in place, then let’s get started in creating your first account.

Creating a Plaid Account

In order to get transactions from Plaid, you must create an account. In order to create an account, you must authenticate yourself to your institution via your username and password, and also most institutions require some form of multi factor authentication, usually in the form of security questions, or codes sent to registered phone/email for the account. Using plaid2text this is easy to accomplish.

First issue the following command:

plaid2text account_nickname --create-account

As an example let’s say I am setting up an account for my personal checking, and the institution I use is Chase. I might use this command to create it.

plaid2text chase_checking --create-account

This would allow me to setup and account with the nickname chase_checking. You will use the nickname for downloading and parsing transactions, so make it something meaningful for you.

Once you have issued the above command plaid2text will present you with a list of institutions from which to choose.

What bank is this account going to be for? 
1   Bank of America
2   Chase
3   Wells Fargo
4   Citi
5   US Bank
6   USAA
7   American Express
8   Navy Federal Credit Union
9   SunTrust

Enter Number [q to quit]:

In our example we will choose 2 and hit enter. This will take us to the authorization screen, and will prompt for your credentials and follow up with the multi factor authentication (if required). If you enter everything correctly you will have setup your first account in Plaid. The program will then prompt you for some default config options. Config options are covered in the proceeding sections.

Note: wait at least 15 minutes before the first download of your transactions, this give Plaid time to collect the information from your institution. Plaid says it will have them within 240 seconds, but I think it’s better to give it time.

Also, different institutions keep your historical data for different lengths of time. You can check the availability here: Data Availability

Arguments Summary

plaid_account: (mandantory) this is the nickname you assigned when creating account
outfile: output filename or stdout in your chosen snytax (ledger,beancount)

Note: the outfile will be overwritten each time this is run so be careful that you do not erase your current journal file, or any other file of importance.

Options Summary

A lot of these options also have an equivalent setting in the config file (~/.config/plaid2text/config). Where this happens, the config file settings will be underscored versions of the command line long options: --mappings-file would become mappings_file.

Also, note that when there are both config setting and command line options, the command line options take precedence over config file settings.

--accounts-file FILE  file which holds a list of account names (LEDGER ONLY)
                      (default : ~/.config/plaid2text/accounts)
--all-transactions    pull all transactions even those who have been
                      previously marked as processed (default: False)
--clear-screen, -C    clear screen for every transaction (default: False)
--cleared-character {*,!}
                      character to clear a transaction (default: *)
--create-account      Create a new Plaid account using the plaid-account
                      argument as the new nickname (Example: chase_savings)
--currency STR        the currency of amounts (default: USD )
--default-expense STR
                      expense account used as default destination (default:
                      Expenses:Unknown)
--download-transactions, -d
                      download transactions into Mongo for given plaid
                      account
--from-date STR       specify a the starting date for transactions to be
                      pulled; use in conjunction with --to-date to specify
                      rangeDate format: YYYY-MM-DD
--headers-file FILE   file which contains contents to be written to the top
                      of the output file (default: ~/.config/plaid2text/headers)
--journal-file FILE, -j FILE
                      journal file where to read payees/accounts Tip: you
                      can use includes to pull in your other journal files
                      (default journal file: ~/.config/plaid2text/journal)
--mapping-file FILE   file which holds the mappings (default: ~/.config/plaid2text/mapping)
--mongo-db STR        The name of the Mongo database (default: plaid2text)
--mongo-db-uri STR    The URI for your MongoDB in the MongoDB URI format
                        (default: mongodb://localhost:27017)
--no-mark-processed, -n
                      Do not mark pulled transactions. When given, the
                      pulled transactions will still be listed as new
                      transactions upon the next run. (default: False)
--output-date-format STR
                      date format for output file (default: YYYY/MM/DD)
--output-format {beancount,ledger}, -o {beancount,ledger}
                      what format to use for the output file. (default
                      format: beancount)
--posting-account STR, -a STR
                      posting account used as source (default: Assets:Bank:Checking)
--quiet, -q           do not prompt if account can be deduced from mappings
                      (default: False)
--tags, -t            prompt for transaction tags (default: False)
--template-file FILE  file which holds the template (default: ~/.config/plaid2text/template)
--to-date STR         specify the ending date for transactions to be pulled;
                      use in conjunction with --from-date to specify
                      rangeDate format: YYYY-MM-DD
-h, --help            show this help message and exit

Options

--accounts-file is a file that you can store predefined account definitions for Ledger in the form of account Expenses:Unknown. This file is parsed for the account names and all lines that do not start with account will be ignored.

This is LEDGER specific setting.

--all-transactions will pull all transactions regardless if they are marked as already pulled. By default only transactions that have not been pulled to text are returned.

--clear-screen, -C clears the screen before every transaction prompt. Default is False.

--cleared-character {*,!} is the character mark a transactions as cleared or not. Default is *

--create-account is used to create a new account. See creating account section above for more.

--currency STR is the currency used for transactions. Default is USD.

--default-expense STR is the default account for which to post transactions to. Default Expenses:Unknown

--download-transactions, -d fetches new transactions from Plaid into Mongo for given account.

Use: plaid2text acct_nickname -d

--from-date STR specify a the starting date for transactions to be pulled.

Use in conjunction with --to-date to specify range

Date format: YYYY-MM-DD or YYYY/MM/DD

--headers-file FILE file which contains contents to be written to the top of the output file. For example, I store my beancount files as OrgMode files, so I have my headers file setup to insert instructions at the top for Emacs, to help ease my editing of them once they are exported to text. And also I include my main beancount file which has all my accounts listed, this also allows for easy running of bean-check to verify the newly exported file.

;; -*- mode: org; mode: beancount; -*-  
include "/path/to/somewhere/main.beancount"

Default: ~/.config/plaid2text/headers

--journal-file FILE, -j FILE journal file where to read payees/accounts. This could be your main ledger file or your main beancount file.

Tip: you can use includes to pull in your other journal files

Default journal file: ~/.config/plaid2text/journal

--mapping-file FILE file which holds the mappings for matching transactions to accounts/payees as well as some default tags, if you want.

You can have a separate mappings file per account.

default: ~/.config/plaid2text/mapping

--mongo-db STR name of the Mongo database that stores downloaded transactions.

Default: plaid2text

--mongo-db-uri STR The URI for your MongoDB in the MongoDB URI format

Default: mongodb://localhost:27017

--no-mark-processed, -n will not mark pulled transactions as pulled. When passed, the pulled transactions will still be listed as new transactions upon the next run.

Default: False

--output-date-format STR date format for output file

Default: YYYY/MM/DD

--output-format {beancount,ledger}, -o {beancount,ledger} what syntax to use for the output file.

Default output format: beancount

--posting-account STR, -a STR posting account used as source

Default: Assets:Bank:Checking

--quiet, -q do not prompt if account can be deduced from mappings

Default: False

--tags, -t causes the program to prompt for transaction tags

Default: False

--template-file FILE file which holds the text template used in the output file for formatting transactions.

Default: ~/.config/plaid2text/template

--to-date STR specify the ending date for transactions to be pulled.

use in conjunction with --from-date to specify range

Date format: YYYY-MM-DD or YYYY/MM/DD

Example Uses

The following will set up a new account with nickname chase_checking

plaid2text chase_checking --create-account

The following will download all new transactions for the account chase_checking.

NOTE: when downloading for the first time, be sure to wait at least 15min after setting up the account. This gives Plaid time to pull your transactions from the institution.

plaid2text chase_checking --downlad-transactions

The following will pull all new transactions for account chase_checking and output them to /tmp/onetime.ldg Ledger syntax after prompting you for the correct information for every transaction and marking all pulled transaction in the database as pulled.

plaid2text chase_checking /tmp/onetime.ldg --output-format ledger

The following will pull all transactions starting from the given date for the chase_checking account and will not mark them as pulled in the database, and will output beancount syntax to stdout.

plaid2text chase_checking --all-transactions --from-date 2015/04/15 --no-mark-processed

Processing a Transaction

When you start processing transactions, you will be presented with several prompts related to the current transaction. These prompts will be to get the associated account, the payee, and optionally tags. If you have a mappings file, provided a journal file, or have already processed a few transactions, then TAB completion is available at all these prompts.

During your first run, when your mappings file has not yet been established, you will have to manually (via prompts) establish the correct accounts and payees. But once you have done so, your mappings file will have the correct information for transactions in the future, and given that most of us are creatures of habit and make purchases from the same places, you will only occasionally have to account for new entries.

Now let’s walk through a transaction for you can get an idea of what to expect. Keeping with out sample account chase_checking, we will pull the latest transactions, and also prompt for tags (--tags) and suppress prompting for known transactions via our mappings (--quiet), we will also be using beancount output syntax (--output-format).

plaid2text chase_checking /tmp/onetime.bnc --quiet --tags --output-format beancount

When the above command is run, you will be presented with a prompt for the first non-matched transaction. The first prompt is for the payee. You will notice that the default answers are in [], so if you just hit enter, that will be the value. When looking at the transaction prompt, you will see it starts with a date followed by the name that Plaid assigns this transaction, in this case Plaid got it correct, this will not always be the case. The next area shows the amount of the transaction.

img/netflix_payee.png

Following the payee prompt is the “Account” prompt. Enter the correct associated account, then hit enter.

img/netflix_account.png

Then we are prompted for tags (because we passed --tags). Tags work a little differently, you will be prompted over and over for tags until you just hit enter without typing another value. If you make a mistake in entering your tag, you may prefix the tag with - (minus) to remove it. For instance say you accidentally typed mvoie and hit enter, when the prompt comes back you see your mistake in the default area and want to correct it. So now you type -mvoie and hit enter, and you will notice that the tag has been removed.

img/netflix_tags.png

When you hit enter the final time on tags, the program will move on to the next transaction needing your input.

Again, all of these prompts use TAB completion, and the more information you give the program, via your config files, the better the completion becomes.

Configuration Files

Main Configuration File

This is an example config file that has an account setup that is nicknamed chase_checking. You will notice some settings that are obfuscated with xxx, these are created when setting up accounts, and are not entered manually.

[DEFAULT]
posting_account = Assets:Bank:Checking
default_expense = Expenses:Unknown
encoding = utf-8
currency = USD
mongo_db = plaid2text
mongo_db_uri: mongodb://localhost:27017
quiet = False
tags = False
output_date_format = %%Y/%%m/%%d
clear_screen = False
cleared_character = *
output_format = beancount

[PLAID]
client_id = xxxxxxxa66710877xxxxxxxx
secret = xxxxxxxxx8c9a0cd27xxxxxxxxxxxx

[chase_checking]
access_token = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxdf0fbb4cc4f5323f7ccd6
account = xxxxxxxxxxxxxPzJ3nAkFxxxxxxxxxxxxxxxx
currency = USD
posting_account = Assets:Bank:Chase:Checking
mapping_file = ~/.config/plaid2text/chase_checking/mapping_bc
headers_file = ~/.config/plaid2text/chase_checking/headers_bc
accounts_file = ~/somewhere/main.beancount
journal_file = ~/somewhere/beancount/main.beancount
template_file = ~/.config/plaid2text/chase_checking/template_bc

Template File

The template file is what transforms your transactions into the desired text based accounting syntax. You have access to all the fields that plaid returns to use in your templates. But be aware that not all fields are returned with every transaction, and you might have to modify the source to handle this, should you choose to use them in your template. Below is a list of all fields available. The A column indicates if field is always available.

FieldTypesA
_accountStringy
_idStringy
amountNumbery
nameStringy
dateDatey
metaObjecty
meta.locationObjecty
pendingBooleany
scoreObjecty
score.locationObjecty
score.nameNumbery
typeObjecty
type.primaryStringy
meta.location.stateStringn
score.location.stateNumbern
categoryArrayn
category_idStringn
meta.location.cityStringn
score.location.cityNumbern
meta.location.coordinatesObjectn
meta.location.coordinates.latNumbern
meta.location.coordinates.lonNumbern
score.location.addressNumbern
score.location.zipNumbern
meta.location.addressStringn
meta.location.zipStringn
meta.location.store_numberStringn
meta.payment_processorStringn
meta.ppd_idStringn
_pendingTransactionStringn
meta.reference_numberStringn
meta.payeeStringn
meta.payment_methodStringn

In addition to the above fields plaid2text also provides the following:

Fieldtype
posting_accountString
associated_accountsString
payeeString
tagsString

Brief Field Description (only of main use fields)

_account
the Plaid account ID
_id
the Plaid transaction ID, Also the MongoDB _id
name
the Plaid name for the transaction. (i.e. Best Buy)
amount
the amount of debit/credit. This is a signed number.
date
the date the transaction occurred
posting_account
the account transaction are posted to
associated_account
the expense or other account attributed to the transaction
payee
the payee for the transaction
tags
the given tags for the transaction in a string
beancount
format: ‘#tag1 #tag2 #etc’
ledger
format: ‘:tag1:tag2:etc:’

NOTE: The tags field is prefixed with a space, when tags are present, this allows us to loose the trailing space that would otherwise exist in situations where there were no tags, and the configured template supports them.

Example of trailing space template:

{transaction_date} {cleared_character} "{payee}" "" {tags}

Using the above template would result in a trailing space when no tags are present.

Example proper template:

{transaction_date} {cleared_character} "{payee}" ""{tags}

This template will add prefix the tags with a space only if they are present, otherwise it returns an empty string. This line will not have a trailing space.

Defaults

beancount

{transaction_date} {cleared_character} "{payee}" ""{tags}
    plaid_name: "{name}"
    plaid_id: "{_id}"
    {associated_account:<60}   {amount} {currency}
    {posting_account}

ledger

{transaction_date} {cleared_character} {payee}{tags}
    ; plaid_name: {name}
    ; _id: {_id}
    {associated_account:<60}   {currency} {amount}
    {posting_account:<60}

Headers File

The headers file is used to add some text to the top of the output file. This can be anything you like. I use mine for adding some header info for Emacs to read for it sets the correct mode for me when I edit the file.

I also use the include directive to pull in my main file, to aide in running bean-check.

Mappings File

The mappings file is simply a CSV formatted file, that contains four fields. When exporting transactions, this file will try to establish the proper accounts and payees for each transaction based on the fields in the file. It also handles adding some default tags.

This file is created for you, if you do not have one defined in the settings. Also, it is appended to every time you are exporting transactions with the new matches, that way next time you export you will not have to enter the information again if you use --quiet.

Fields

  1. text to match against the Plaid name field. This can be either plain text or a regex. If the field starts and ends with / it is assumed to be a regex. Note: all the regexes will be matched case insensitive.
  2. the name you wish to use for the payee
  3. the associated expense or other account (i.e. Expenses:Unknown)
  4. tags to be used for this transaction. This should be in the form of a string. For ledger the format would be: :tag1:tag2:etc: and for beancount: #tag1 #tag2 #etc

Important Point

The matching algorithm will always use the latest match when processing entries. So if for example you have a regex setup that matches //best buy// at the top of the mappings file and another that has //buy// later in the file, the last match wins.

Sample Mappings File

Some of the listings will contain ledger formatted tags while other will be beancount, you of course will only have the type that you need, do not mix them.

/Amazon/,"Amazon",Expenses:Unknown:Amazon, #sort-out
/PAYPAL INST XFER/,"PayPal",Expenses:Unknown:PayPal, :sort-out:
/.*NETFLIX.*/,"Netflix",Expenses:Bills:Subscriptions:Netflix
/.*DROPBOX.*/,"Dropbox",Expenses:Bills:Subscriptions:Dropbox
/Amazon Video/,"Amazon Video",Expenses:Entertainment:Movies
The Doughnut Palace,"The Doughnut Palace",Expenses:Food:FastFood
54th Street,"54th Street",Expenses:Food:Restaurant
BJ'S RESTAURANTS,"BJ's Restaurant",Expenses:Food:Restaurant

Also notice the sorting of the entries so that Amazon Video gets categorized properly. If it were above the Amazon entry, it would use the setting from there instead, as the last entry always wins.

Workflow

In this section I will just describe my basic workflow to demonstrate how I use this tool. Going forwards assumes you have already established your plaid setup as well as at least one account. I will continue to demonstrate with the example account chase_checking to keep things consistent.

Download New Transactions

When I get ready to work on my books, I start by downloading the newest transactions for the account I am working on.

plaid2text chase_checking -d

This will download all the newest transactions from my accounts into the MongoDB.

You can of course setup a cron job to do this nightly, but I find it fits into my workflow just doing it manually.

Export New Transactions

I export new transaction (all the ones that haven’t previously been pulled), into a temporary file, where I can do some manual checking and editing.

plaid2text chase_checking /tmp/onetime.beancount --quiet

Using the --quiet switch, the program will only prompt me for information on the transactions that it cannot deduce based on the mappings file. You can of course leave that switch off if you want to be able to change the defaults from the mapping file.

Also, if you want to do a test run, without marking the transactions as pulled use the --no-mark-pulled switch.

IMPORTANT I want to stress that the outfile is OVERWRITTEN or created every time this command is run. So be careful. :)

Copy Transactions

When I am satisfied that all is well with my temp file. I copy the new entries into my actual journal file.

DISCLAIMER

This should be considered *beta* version code. I have released it hoping that it will be of benefit to others in a similar situation as me. This version of the code is really hacked together and in need of serious refactoring, and will most likely contain bugs. I have had this working for myself for a few weeks, and have found it stable and usable. But I do caution you, to use at your own risk.

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You can obtain a copy of the license here: GNU General Public License