Categorizing ambiguous transactions
MichaelVessia opened this issue · 15 comments
I am now automatically generating .journal files using the provided scripts. However, some transactions cannot be categorized via rules. For example:
2019/07/15 Check Paid #1020 : Checking
    assets:bank:checking    -255.00 USD
    expenses:unknown         255.00 USD
In this transaction I was able to assign the appropriate account1, but there is no context given for what account2 is, and thus I cannot use a rule to automatically categorize it.
What is the intended workflow for a situation like this? Normally I would just manually add the account2 and a comment, but obviously I will not be doing that in the auto-generated .journal file. Do you remove the entry from the in/[filename].csv and just handle this transaction manually? I fear that would cause problems if I reimport my csv at a later date.
Thanks for the help.
After perusing the wiki some more, I am thinking I can create an intermediate account as per this section.
I would create a rule such that my main file generates the transaction like this:
2019/07/15 Check Paid #1020 : Checking
    assets:bank:checking    -255.00 USD
    expenses:checking        255.00 USD
In a separate journal file, I would maintain a list of transactions like this:
2019/07/15 Check Paid #1020 : Checking
    expenses:checking         -255.00 USD  ; cancel out the amount from the other posting
    expenses:actual category   255.00 USD  ; properly categorize the transaction
This separate journal would be included in my main journal and would not be enrolled in the export.hs pipeline.
I will try implementing this later today and if it works I will close this issue. If you have any improvements or advice on my idea feel free to comment as well.
Thanks!
Matching in CSV conversion rules is done on the whole line, so you could create a rule that matches on whatever bits of information you do have.
For example, even if you have just the date and amount (and let's assume that account1 is implied by the filename), you would still be able to create a rule that says "these 255 USD on 2019-07-15 were this transaction".
So, like this section says, you would just add a line similar to this one to your rules file, and it will be almost as easy as writing a comment directly in the file:
2019/07/15.*255.00|account2|comment
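To make that concrete, here is a small Haskell sketch of how lines in that pattern|account2|comment shape could be expanded into ordinary hledger if blocks. It is only an illustration of the idea (the account and comment values below are invented), not necessarily the exact conversion the repository performs.

-- Sketch: turn "pattern|account2|comment" lines into hledger if blocks.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

toIfBlock :: String -> String
toIfBlock line = case splitOn '|' line of
  [pat, acct, cmt] -> unlines
    [ "if " ++ pat
    , "  account2 " ++ acct
    , "  comment " ++ cmt
    ]
  _ -> ""   -- ignore lines that do not have exactly three fields

main :: IO ()
main = putStr (toIfBlock "2019/07/15.*255.00|expenses:home:repairs|check #1020, plumber")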
For those check transactions with no information, I have to manually go to my checking account and look at the check image for the details. I am going to go with the manually maintained file for those. Otherwise I can make use of the csv rules like you said, so I am going to close this issue.
Thanks again
Hello! Thank you for sharing this useful hledger set-up with the world :)
I am faced with a similar issue where I have many transactions from a financial institution which require manual categorization.
The solution presented by @adept (create hyper-specific rules inside your rules file which only ever apply to a single transaction) works well. Unfortunately, it causes the rules file for an institution to grow larger with each passing year as it fills up with hyper-specific rules which are not relevant to subsequent years.
Is it possible to add additional rules files which are applied only to a single csv (instead of all csvs inside an institution directory)? Since you can't conditionally include rules files inside other rules files, I'm not sure what would be the best way to do this. Creating separate sub-directories inside import/lloyds/ for each csv seems overly complex (though it seems like an appropriate use of the csv include rule).
The solution presented by @MichaelVessia (maintain a separate journal file with the ambiguous transactions, which is then included in the year-level journal) does not suffer from this problem, but it requires the extra burden of maintaining a separate directory of manually maintained journals. I think I will use this solution for now, but I would be grateful to hear thoughts on this problem.
Thank you again! Your tool will certainly save me hours in the coming years.
Well, you certainly could have separate rules files for separate input files.
The easiest way would be to have the basename of the rules file be the same as the name of the input file, and then in the csv2journal script for your source do something like this:
hledger print --rules-file ./rules/$(basename "$1") -f "$1"
(which assumes that your rules files are in the ./rules subdirectory, next to ./csv, ./journal, etc).
You would also need to declare the dependency on the rules file:
extraDeps file
  | "//your-source//*.journal" ?== file =
      let Just [_, basename] = filePattern "**/*.journal" file
      in ["./rules/" ++ basename ++ ".rules"]
And this, I think, should be enough.
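For a concrete illustration of what this clause computes, here is a tiny standalone sketch (not taken from export.hs; the your-source path is made up) of how the wildcard capture from filePattern selects the per-file rules file:

-- Sketch only: map a journal file to the rules file named after it.
-- Assumes the same Shake modules that export.hs already imports.
import Development.Shake            -- filePattern, (?==)
import Development.Shake.FilePath

rulesFor :: FilePath -> [FilePath]
rulesFor file = case filePattern "**/*.journal" file of
  Just [_dir, basename] -> ["./rules/" ++ basename ++ ".rules"]
  _                     -> []

main :: IO ()
main = print (rulesFor "import/your-source/journal/2019.journal")
-- prints ["./rules/2019.rules"], because the "*" in "**/*.journal"
-- captures the basename without its extension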
@adept Brilliant! This solution is much more elegant than what I'd planned on doing. Two things:
1. hledger print --rules-file ./rules/$(basename "$1") -f "$1"
   passes files like your-source/rules/*.csv to hledger, whereas
   hledger print --rules-file ./rules/$(basename "$1" .csv)".rules" -f "$1"
   passes files like your-source/rules/*.rules.
   I set csv2journal to the latter, since that matches the extraDeps function (and it makes sense).
2. Dependency checking doesn't seem to be working. I have saved the rules files which are not specific to a single csv as your-source/rules/*.rules, and when those files change, they do not cause new reports to be generated when export.hs runs.
   I tried replacing let Just [_, basename] = filePattern "**/*.journal" file with let basename = takeBaseName file, but that didn't make a difference (see the sketch below).
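As a quick illustration (made-up path, not part of export.hs) of why that swap makes no difference here: both expressions produce the same basename, so the dependency list comes out identical either way.

import Development.Shake            -- filePattern
import Development.Shake.FilePath   -- takeBaseName (re-exported from System.FilePath)

main :: IO ()
main = do
  let file = "import/your-source/journal/2019.journal"   -- hypothetical path
  print (fmap last (filePattern "**/*.journal" file))    -- Just "2019"
  print (takeBaseName file)                              -- "2019"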
Thank you again! Besides the dependency checking, it works perfectly.
Yes, I totally forgot about suffix stripping, thanks.
Re dependency checking: when you change build rules, files listed as dependencies will be inspected and their particulars (modification time, for example) saved only the next time the target is built. So when you change dependency computation code, you probably want to run ./export.sh --rebuild to force the rebuild of all targets. See if dependencies are properly tracked after that?
You're welcome!
The problem was that I added the dependencies wrong. My extraDeps function looks like this:
extraDeps file
  | "//your-source//*.journal" ?== file = ["your-source.rules", "generated.rules"]
  | "//your-source//*.journal" ?== file =
      let Just [_, basename] = filePattern "**/*.journal" file
      in ["./rules/" ++ basename ++ ".rules"]
  | otherwise = []
I kept | "//your-source//*.journal" ?== file = ["your-source.rules", "generated.rules"] because inside each your-source/*.rules there is a line which includes the shared rules in your-source.rules. Guards are tried top to bottom and stop at the first match, so of course the second clause never runs and "./rules/*.rules" never gets added as a dependency.
What is the correct way to concatenate those two lists? (I hope the intended behavior is clear from the incorrect code above)
Thank you for your help!
You could just place your code into a single clause, unless I am missing something:
extraDeps file
  | "//your-source//*.journal" ?== file =
      let Just [_, basename] = filePattern "**/*.journal" file
      in ["./rules/" ++ basename ++ ".rules", "your-source.rules", "generated.rules"]
  | otherwise = []
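If you would rather keep the shared and per-file pieces separate and literally concatenate them, a sketch along these lines (same names as above, assuming the Shake imports already in export.hs; not code from the repository) behaves the same way:

extraDeps file
  | "//your-source//*.journal" ?== file = sharedRules ++ perFileRules
  | otherwise = []
  where
    -- rules shared by every csv of this source
    sharedRules  = ["your-source.rules", "generated.rules"]
    -- the rules file named after this particular journal, if it matches
    perFileRules =
      case filePattern "**/*.journal" file of
        Just [_, basename] -> ["./rules/" ++ basename ++ ".rules"]
        _                  -> []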
For debugging, you would find import Debug.Trace and the functions trace and traceShow useful.
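For instance, a small self-contained example (invented file names) of how they behave: both print their first argument and return their second argument unchanged, so they can be dropped into pure code such as extraDeps while debugging.

import Debug.Trace (trace, traceShow)

-- trace takes a String, traceShow anything with a Show instance;
-- both emit it on stderr and return their second argument untouched.
fakeExtraDeps :: FilePath -> [FilePath]
fakeExtraDeps file = traceShow (file, deps) deps
  where deps = ["./rules/2019.rules"]   -- made-up dependency list

main :: IO ()
main = print (trace "computing deps" (length (fakeExtraDeps "journal/2019.journal")))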
That works perfectly! I wasn't sure about the syntax.
Thanks for the debugging tips! Your export.hs script is the first Haskell code I've played with. I like it very much - thanks for helping me out.
If it sounds like a good idea, I would be happy to draft up a wiki page which describes how to include rules files which only affect a single csv (and an explanation of why one might want to do that). I should be able to do that in the next month, though it might not be right away.
Happy to help! That wiki page sounds like a lovely idea, please do!
Great! Would it be appropriate to put it at the end (it would become number 12)? I think it might also flow well to insert it after "Maintaining CSV rules" (it would become number 7).
Actually, it might be better to put it at the end, so that it's not necessary to include that code in the subsequent steps.
I think putting it at the end works quite well. It is not too difficult to renumber everything, but numbering is quite arbitrary anyway, so I would not bother too much trying to put this new section close to "Maintaining CSV rules".
Sounds good! I'll get back to you soon with a PR :)