This is a very quick and dirty gem that swallows downloaded PDF files from HSBC (UK) and parses them into a Dry::Struct containing details of the statement + its transactions.
It exists solely because HSBC doesn’t seem to offer any way of exporting old statements as anything other than PDFs, which makes it a pain in the backside to import anything into any kind of finance package. You probably shouldn’t use it (see warnings below).
Using bundler on the command line:
$ bundle add hsbc_pdf_statement_parser
$ bundle
This gem exposes one method: `parse`, which takes the path to a statement PDF and returns a `Dry::Struct` representation.
require 'hsbc_pdf_statement_parser'
parsed = HsbcPdfStatementParser.parse( 'path/to/statement.pdf' )
parsed.transactions.each do |tx|
  printf(
    "[%s] {%-3s} %-40s %7.02f | %7.02f\n",
    tx.date,
    tx.type,
    tx.details.lines.first.strip,
    tx.change,
    tx.balance
  )
end
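Since the point of the gem is getting statements into finance packages, here is a minimal sketch of turning parsed transactions into CSV. It assumes only the transaction properties documented in this README (`date`, `type`, `details`, `change`, `balance`); `SampleTx`, `transactions_to_csv` and the sample data are hypothetical stand-ins so the snippet runs without a real statement. With the gem you would pass `parsed.transactions` instead.

```ruby
require 'csv'
require 'date'
require 'bigdecimal'
require 'bigdecimal/util'

# Hypothetical stand-in for a parsed transaction (not the gem's own type).
SampleTx = Struct.new(:date, :type, :details, :change, :balance, keyword_init: true)

# Build a CSV string with one row per transaction.
# Multi-line details are collapsed onto a single line.
def transactions_to_csv(transactions)
  CSV.generate do |csv|
    csv << %w[date type details change balance]
    transactions.each do |tx|
      csv << [
        tx.date.iso8601,
        tx.type,
        tx.details.lines.map(&:strip).join(' '),
        format('%.2f', tx.change),
        format('%.2f', tx.balance)
      ]
    end
  end
end

# Made-up sample data, purely for illustration.
sample = [
  SampleTx.new(date: Date.new(2021, 3, 1), type: 'VIS', details: "EXAMPLE SHOP\nLONDON",
               change: '-9.99'.to_d, balance: '490.01'.to_d),
  SampleTx.new(date: Date.new(2021, 3, 2), type: 'CR', details: 'EXAMPLE EMPLOYER',
               change: '1200.00'.to_d, balance: '1690.01'.to_d)
]

csv_text = transactions_to_csv(sample)
puts csv_text
```

The column names and order are arbitrary here; adjust them to whatever your finance package expects.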
- `account_holder`: the name of the account holder (String)
- `sortcode`: the sortcode shown on the statement (String)
- `account_number`: the account number shown on the statement (String)
- `sheets`: the sheets used in the statement (Range[Int])
- `date_range`: the date range shown on the first page of the statement (Range[Date])
- `opening_balance`: the opening balance of the statement (Decimal)
- `closing_balance`: the closing balance of the statement (Decimal)
- `payments_in`: the total of all transactions into the account (Decimal)
- `payments_out`: the total of all transactions out of the account (Decimal)
Note: `payments_in` and `payments_out` are those shown on the first page of the statement; they are not calculated from, or checked against, the parsed transactions.
Also note that sheet numbers are not guaranteed to be unique. I’m not sure why this is the case, but I have a few statements where the last sheet of one statement and the first sheet of another have the same sheet number.
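Because `payments_in` and `payments_out` are taken from the first page and not checked against the transactions, you may want to reconcile them yourself. The following is a rough sketch of one way to do that. It relies only on the documented `paid_in`, `paid_out`, `payments_in`, `payments_out` and `transactions` properties; `StubTx`, `StubStatement` and `totals_match?` are hypothetical names, and the stub structs stand in for a real parsed statement.

```ruby
require 'bigdecimal'
require 'bigdecimal/util'

# Hypothetical stand-ins for the parsed structs, so the sketch runs on its own.
StubTx = Struct.new(:paid_in, :paid_out, keyword_init: true)
StubStatement = Struct.new(:payments_in, :payments_out, :transactions, keyword_init: true)

# Sum the per-transaction amounts (treating nil as zero) and compare them
# with the totals printed on the first page of the statement.
def totals_match?(statement)
  zero = BigDecimal('0')
  total_in  = statement.transactions.sum(zero) { |tx| tx.paid_in  || zero }
  total_out = statement.transactions.sum(zero) { |tx| tx.paid_out || zero }
  total_in == statement.payments_in && total_out == statement.payments_out
end

# Made-up figures, purely for illustration.
statement = StubStatement.new(
  payments_in: '1500.00'.to_d,
  payments_out: '42.50'.to_d,
  transactions: [
    StubTx.new(paid_in: '1500.00'.to_d, paid_out: nil),
    StubTx.new(paid_in: nil, paid_out: '30.00'.to_d),
    StubTx.new(paid_in: nil, paid_out: '12.50'.to_d)
  ]
)

puts totals_match?(statement)
```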
- `date`: the date of the transaction (Date)
- `type`: a string representation of the type of the transaction (e.g. `DD` for a direct debit, `VIS` for VISA, etc.) (String)
- `details`: a text description of the transaction, which may span multiple lines (String)
- `paid_in`: the amount paid in, if appropriate (Decimal, nullable)
- `paid_out`: the amount paid out, if appropriate (Decimal, nullable)
- `balance`: the balance of the account after the transaction (Decimal)
- `change`: the calculated change to the balance of the account: negative for debits, positive for credits (Decimal)
Note: unlike in V1, `balance` is now always present and is calculated as a running total based on the opening balance of the statement. Where the statement shows a running balance after transactions (this seems to be once a day), it is checked against the running total, and the parser will raise an error if any discrepancy is found.
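To illustrate the idea, here is a sketch of how `change` and a running `balance` could be derived from `paid_in` / `paid_out` and checked against any printed balance. This is not the gem's actual implementation: `RawRow`, `printed_balance`, `with_running_balance` and the choice of `ArgumentError` are all assumptions made for the example.

```ruby
require 'bigdecimal'
require 'bigdecimal/util'

# Hypothetical raw row: printed_balance is nil unless the statement
# shows a running balance on that line.
RawRow = Struct.new(:paid_in, :paid_out, :printed_balance, keyword_init: true)

# Walk the rows in order, deriving the signed change and a running balance.
# Raises if a printed balance disagrees with the running total.
def with_running_balance(opening_balance, rows)
  zero = BigDecimal('0')
  balance = opening_balance
  rows.map do |row|
    change = (row.paid_in || zero) - (row.paid_out || zero) # credits positive, debits negative
    balance += change
    if row.printed_balance && row.printed_balance != balance
      raise ArgumentError,
            "balance mismatch: statement shows #{row.printed_balance.to_s('F')}, " \
            "running total is #{balance.to_s('F')}"
    end
    { change: change, balance: balance }
  end
end

# Made-up rows: only the second one carries a printed balance.
rows = [
  RawRow.new(paid_in: nil, paid_out: '20.00'.to_d, printed_balance: nil),
  RawRow.new(paid_in: '50.00'.to_d, paid_out: nil, printed_balance: '130.00'.to_d)
]

results = with_running_balance('100.00'.to_d, rows)
results.each { |r| puts format('%8.2f | %8.2f', r[:change], r[:balance]) }
```

Using decimal arithmetic throughout avoids the rounding drift that floats would introduce into a running total.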
This gem has been thrown together for my own needs, and is offered to the world just in case someone else might want to play around with it. It seems to work pretty well with statements from my Advance account here in the UK, and may also work with other flavours of accounts from elsewhere in the world, but comes with absolutely zero guarantees or assurances.
That is to say: it seems to work OK for mucking around, but I’d recommend not using it for anything mission-critical, or in a situation that might lead you or others into making any kind of financial decisions. Any dumb financial decisions made are entirely on you =)
I have plenty of tests; sadly, the only way of properly testing this code is by parsing real bank statements, and I’m not about to commit any of those to GitHub. Sorry!
For various reasons I’ve not worried too much about maintaining backward compatibility with V1.x, though migration should be relatively minimal:
- invocation: `HsbcPdfStatementParser::Parser.new(…)` becomes `HsbcPdfStatementParser.parse(…)`
- when using parsed transactions, `in` and `out` are now `paid_in` and `paid_out` respectively
- any use of `fetch(…)` on parsed transactions will need to be replaced with bare method calls (hash accessors, i.e. `tx[:date]`, will continue to work)
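The accessor change in the last point can be illustrated with a small stand-in. This sketch uses a plain Ruby `Struct` (`StandInTx`, a hypothetical name) instead of the gem's `Dry::Struct`; it only mirrors the behaviour described above, where method calls and `[]` work but `fetch` is not available.

```ruby
require 'date'
require 'bigdecimal'
require 'bigdecimal/util'

# Hypothetical stand-in for a parsed transaction; the real type is a Dry::Struct.
StandInTx = Struct.new(:date, :paid_in, :paid_out, keyword_init: true)

tx = StandInTx.new(date: Date.new(2021, 3, 1), paid_in: nil, paid_out: '12.50'.to_d)

puts tx.date                 # method call: what replaces fetch(:date)
puts tx[:date]               # hash-style accessor: still works
puts tx.paid_out.to_s('F')   # renamed from `out` in V1
puts tx.respond_to?(:fetch)  # the stand-in has no fetch, like the situation described above
```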
### Nonbreaking changes
Aside from the new properties added to the main Statement type, the biggest difference is that each transaction’s `balance` property is now always specified, whereas it was only specified once a day in V1.x.
Share and enjoy