redstreet/beancount_reds_importers

Vanguard importer: consider using same timestamp to infer globbing together transactions.

Closed this issue · 2 comments

I tend to see Vanguard transaction events where two dividends accumulate and sweep into a money market account.

For example, say you have two dividend transactions:

<INCOME><INVTRAN>
<FITID>123888456<DTTRADE>20210301160000.000[-5:EST]<DTSETTLE>20210301160000.000[-5:EST]
<MEMO>DIVIDEND PAYMENTDIVIDEND PAYMENT</INVTRAN>
<SECID><UNIQUEID>92206C300<UNIQUEIDTYPE>CUSIP</SECID>
<INCOMETYPE>DIV<TOTAL>50.00<SUBACCTSEC>CASH<SUBACCTFUND>CASH</INCOME>
<INCOME><INVTRAN>
<FITID>123777456<DTTRADE>20210301160000.000[-5:EST]<DTSETTLE>20210301160000.000[-5:EST]
<MEMO>DIVIDEND PAYMENTDIVIDEND PAYMENT</INVTRAN>
<SECID><UNIQUEID>92206C821<UNIQUEIDTYPE>CUSIP</SECID>
<INCOMETYPE>DIV<TOTAL>100.00<SUBACCTSEC>CASH<SUBACCTFUND>CASH</INCOME>

These pool together and result in this purchase of a money-market asset:

<BUYTYPE>BUY</BUYMF><BUYMF><INVBUY><INVTRAN>
<FITID>123999456<DTTRADE>20210301160000.000[-5:EST]<DTSETTLE>20210301160000.000[-5:EST]
<MEMO>MONEY FUND PURCHASE</INVTRAN>
<SECID><UNIQUEID>922906300<UNIQUEIDTYPE>CUSIP</SECID>
<UNITS>150.00<UNITPRICE>1.0<TOTAL>-150.00<SUBACCTSEC>CASH<SUBACCTFUND>

Currently it results in:

2021-03-01 * "MONEY FUND PURCHASE" "[VMFXX] Vanguard Federal Money Market Fund"
  file_account: "Assets:Vanguard:Brokerage"
  Assets:Vanguard:Brokerage:VMFXX   150.00 VMFXX {1.0 USD}
  Assets:Vanguard:Brokerage:USD    -150.00 USD            

2021-03-01 * "DIVIDEND PAYMENTDIVIDEND PAYMENT" "[VSBSX] Vanguard Short-Term Treasury Index - Admiral Shares"
  Assets:Vanguard:Brokerage:USD               50.00 USD
  Income:Vanguard:Brokerage:VSBSX:Dividends  -50.00 USD

2021-03-01 * "DIVIDEND PAYMENTDIVIDEND PAYMENT" "[VLGSX] Vanguard Long-Term Treasury Index - Admiral Shares"
  Assets:Vanguard:Brokerage:USD               100.00 USD
  Income:Vanguard:Brokerage:VLGSX:Dividends  -100.00 USD

This is functional, but it feels rather odd to have three transactions, and it requires an imaginary Assets:Vanguard:Brokerage:USD account for everything to work. Could the importer be a bit smarter and glob these events together, given that they all share the same timestamp (20210301160000.000)? The FITIDs are also similar-ish (the first three and last three digits are usually the same across these events, as shown above). Ideally, the generated result would look like:

2021-03-01 * "MONEY FUND PURCHASE & DIVIDEND PAYMENTDIVIDEND PAYMENT & DIVIDEND PAYMENTDIVIDEND PAYMENT" "[VMFXX] Vanguard Federal Money Market Fund & [VSBSX] Vanguard Short-Term Treasury Index - Admiral Shares & [VLGSX] Vanguard Long-Term Treasury Index - Admiral Shares"
  file_account: "Assets:Vanguard:Brokerage"
  Assets:Vanguard:Brokerage:VMFXX   150.00 VMFXX {1.0 USD}
  Income:Vanguard:Brokerage:VSBSX:Dividends  -50.00 USD
  Income:Vanguard:Brokerage:VLGSX:Dividends  -100.00 USD
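
The grouping idea could be sketched roughly as follows. This is only an illustration under stated assumptions: `Posting`, `Txn`, and `merge_by_timestamp` are hypothetical names, not part of the importer or of Beancount, the timestamp is assumed to be available as the raw DTTRADE string, all cash legs are assumed to be in one currency, and cost basis is left out for brevity. Transactions are merged only when their cash legs net to exactly zero; anything else is passed through untouched.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Posting:  # hypothetical stand-in for a Beancount posting
    account: str
    amount: Decimal
    currency: str


@dataclass
class Txn:  # hypothetical stand-in for an extracted transaction
    timestamp: str  # raw DTTRADE value, assumed to be kept by the importer
    narration: str
    postings: list


def merge_by_timestamp(txns, cash_account):
    """Merge transactions that share a timestamp and whose cash legs cancel."""
    groups = defaultdict(list)
    for txn in txns:
        groups[txn.timestamp].append(txn)

    merged = []
    for timestamp, group in groups.items():
        cash_total = sum(
            (p.amount for t in group for p in t.postings
             if p.account == cash_account),
            Decimal(0),
        )
        # Leave lone transactions, and groups that do not net out, as they are.
        if len(group) < 2 or cash_total != 0:
            merged.extend(group)
            continue
        # Drop the intermediate cash legs and keep everything else.
        postings = [p for t in group for p in t.postings
                    if p.account != cash_account]
        narration = " & ".join(t.narration for t in group)
        merged.append(Txn(timestamp, narration, postings))
    return merged
```

Called with the three events above and `cash_account="Assets:Vanguard:Brokerage:USD"`, this would produce a single transaction holding the VMFXX posting and the two dividend postings.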

Feel free to close this idea if it sounds too risky/complicated, just trying to brainstorm anything that comes to mind.

Hello again! Good question. This is definitely something I considered, and for a while I had code to handle it. However, I found that I didn't get anything out of doing this. In general, this falls under the category of drawing "higher level" inferences based on a set of rules. I found this to break rather easily because:

  • there are always cases one hasn't run into yet, which are therefore not encoded as rules in the code
  • institutions make surprising changes occasionally, which breaks code
  • rules vary across institutions, making it a pain to maintain these rules

The question I'd go back to is: is there a true benefit to drawing these inferences? The source looks a bit better, but I rarely look at my source or even my journal for investments (I do for expenses); my view of my transactions is through either BQL or Fava, and there I'm looking at aggregates and queries (for investments), which are all agnostic to how the source looks in these cases.

That said, having a post_process() API that calls a user function at the end of extract() would allow each user to write a few lines of code to maintain their own "inference rules" such as this one, if so desired, and I'd be very open to creating that (should be simple).
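
To make the idea concrete, here is a minimal sketch of what such a hook might look like. Nothing here exists in the importers today: `SketchImporter`, the `post_process` argument, and `dedupe_doubled_narration` are hypothetical names, entries are plain dicts rather than real Beancount directives, and the real extract() takes a file rather than pre-parsed records.

```python
from typing import Callable, List, Optional

Entry = dict  # stand-in for a real Beancount directive
Hook = Callable[[List[Entry]], List[Entry]]


class SketchImporter:
    """Hypothetical importer, used only to illustrate the hook."""

    def __init__(self, raw_entries: List[Entry],
                 post_process: Optional[Hook] = None):
        self.raw_entries = raw_entries    # pretend these were parsed from a file
        self.post_process = post_process  # user-supplied inference rules, if any

    def extract(self) -> List[Entry]:
        entries = list(self.raw_entries)
        # The user function runs last, on the fully extracted entries.
        if self.post_process is not None:
            entries = self.post_process(entries)
        return entries


def dedupe_doubled_narration(entries: List[Entry]) -> List[Entry]:
    """Example user rule: collapse a narration that is one string repeated twice."""
    cleaned = []
    for entry in entries:
        text = entry["narration"]
        half = len(text) // 2
        if half > 0 and len(text) % 2 == 0 and text[:half] == text[half:]:
            entry = {**entry, "narration": text[:half]}
        cleaned.append(entry)
    return cleaned
```

A user would then pass their own function, for example `SketchImporter(entries, post_process=dedupe_doubled_narration).extract()`, and the importer itself would stay free of institution-specific rules.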

PS: Same question for you for the same reasons: would you be okay if I moved your comment above as a comment thread to one of my articles in https://reds-rants.netlify.app/ ? Let me know.

Yes, thanks for answering! Definitely realizing there are places where such optimizations can just create more problems. Fine with moving this into a comment thread.