beancount/beangulp

how to extend precanned csv import extract method to add csv source to __source__ metadata

blais opened this issue · 3 comments

blais commented

Original report by Jeff Mondoux.


I have a custom importer for my bank's CSV statements. This importer inherits from the beancount CSV importer, and I override its extract() method as follows:

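def extract(self, file, existing_entries=None):
    mapped_account = self.file_account(file)  # not used further in this snippet
    # Let the generic CSV importer build the entries, then tag each one.
    entries = super().extract(file, existing_entries)
    for entry in entries:
        # Placeholder value; the goal is to store the raw CSV line here.
        entry.meta['__source__'] = 'source'
    return entries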

What I can't figure out with my limited Python abilities is how to add the raw CSV line to the __source__ metadata field so that it can be displayed by the Fava import GUI. I want to avoid rolling my own CSV importer entirely, as the generic CSV importer provided by beancount does what I need for the most part. I know I can reread the CSV file a second time to append the data, but is this the best or only way?

My current workaround (which involves having my own copy of the generic CSV importer) is patching the signature of the categorize() method so it accepts two parameters: transaction (as it does right now, with the generic information already filled in) and row (the current row from the CSV file, so you can parse extra columns yourself there).

I will try to create a pull request sometime this week!

Hey folks - I'm considering extending this so that the CSV importer passes both the transaction's row and the CSV file's header, on each categorize() call. It seems a shame for the CSV importer to be doing all the discovery work about the file's structure, and not passing part of the benefit over to the Categorizer! And, as well, not having this info reduces the utility of more generic, cross-account Categorizers (or so I'm finding).

Before I work on this:

  • have I missed something obvious that removes the need for this change?
  • how firm are we on the row object being passed to categorize() being a list; or could I change its prototype to something more stringly-indexable? Or do I need to subclass list to ensure other folks' existing Categorizers don't break? I guess this could only affect Categorizers written or updated since beancount/beancount#483 was merged, 2 months ago. If it were my call, I /think/ I'd not be that concerned with the backwards compat, here. But that's just my 2 cents :-)
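
One way to picture the backwards-compatible option raised in the second point is a list subclass that also accepts column names as keys. This is only a sketch of the idea, not beancount's or beangulp's API: the class name Row and its constructor arguments are hypothetical, and it assumes the header is available as a sequence of column names.

```python
class Row(list):
    """A CSV row that behaves like a plain list but also accepts column names.

    Hypothetical helper for illustration; not part of beancount or beangulp.
    """

    def __init__(self, values, header):
        super().__init__(values)
        # Map each column name to its position in the row.
        self._index = {name: i for i, name in enumerate(header)}

    def __getitem__(self, key):
        # String keys are resolved through the header; integer and slice
        # keys fall through to normal list behaviour, so existing
        # Categorizers that index by position keep working.
        if isinstance(key, str):
            return super().__getitem__(self._index[key])
        return super().__getitem__(key)
```

A Categorizer written against the current list-based row would keep using row[1], while a newer one could use row["Description"]. An unknown column name raises KeyError.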

The CSV importer stores the origin file path and line number in the metadata 'filename' and 'lineno' fields (which are not serialized when the entries are printed). You can simply post-process the entries to move the information from these metadata entries to where fava expects them.
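
The post-processing described above could look roughly like the following. It is a minimal sketch under stated assumptions: the helper name attach_source_lines is made up for illustration, 'lineno' is assumed to be the 1-based physical line number of the row in the file (if the importer stores a row index instead, an offset for header or skipped lines would be needed), and the file is assumed to be UTF-8 unless another encoding is passed.

```python
def attach_source_lines(entries, key="__source__", encoding="utf-8"):
    """Copy the raw text of each entry's originating line into entry.meta[key].

    Hypothetical helper: relies only on the 'filename' and 'lineno'
    metadata fields and assumes 'lineno' is a 1-based line number.
    """
    cache = {}  # filename -> list of lines, so each file is read only once
    for entry in entries:
        meta = getattr(entry, "meta", None)
        if not meta:
            continue
        filename = meta.get("filename")
        lineno = meta.get("lineno")
        if not filename or not isinstance(lineno, int):
            continue
        if filename not in cache:
            with open(filename, encoding=encoding) as f:
                cache[filename] = [line.rstrip("\r\n") for line in f]
        lines = cache[filename]
        # Skip entries whose line number falls outside the file.
        if 1 <= lineno <= len(lines):
            meta[key] = lines[lineno - 1]
    return entries
```

In the extract() override from the original report, the loop that sets the placeholder value could then be replaced by returning attach_source_lines(entries). This still reads the file once more, but only as plain text, without parsing the CSV a second time.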