Categorizing ambiguous transactions
MichaelVessia opened this issue · 15 comments
I am now automatically generating .journal files using the provided scripts. However, some transactions cannot be categorized via rules. For example:
2019/07/15 Check Paid #1020 : Checking
    assets:bank:checking    -255.00 USD
    expenses:unknown         255.00 USD
In this transaction I was able to assign the appropriate account1, but there is no context given for what account2 is, and thus I cannot use a rule to automatically categorize it.
What is the intended workflow for a situation like this? Normally I would just manually add the account2 and a comment, but obviously I will not be doing that in the auto-generated .journal file. Do you remove the entry from the in/[filename].csv and just handle this transaction manually? I fear that would cause problems if I reimport my csv at a later date.
Thanks for the help.
After perusing the wiki some more, I am thinking I can create an intermediate account as per this section.
I would create a rule such that my main file generates the transaction like this:
2019/07/15 Check Paid #1020 : Checking
    assets:bank:checking    -255.00 USD
    expenses:checking        255.00 USD
In a separate journal file, I would maintain a list of transactions like this:
2019/07/15 Check Paid #1020 : Checking
    expenses:checking         -255.00 USD  ; cancel out the amount from the other posting
    expenses:actual category   255.00 USD  ; properly categorize the transaction
This separate journal would be included in my main journal and would not be enrolled in the export.hs pipeline.
I will try implementing this later today and if it works I will close this issue. If you have any improvements or advice on my idea feel free to comment as well.
Thanks!
Matching in CSV conversion rules is done on the whole line, so you could create a rule that matches on whatever bits of information you do have.
For example, even if you have just the date and amount (and let's assume that account1 is implied by the filename), you would still be able to create a rule that says "these 255 USD on 2019-07-15 were this transaction".
So, like this section says, you would just add a line similar to this one to your rules file, and it will be almost as easy as writing a comment directly in the file:
2019/07/15.*255.00|account2|comment
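To make that concrete, here is a small Haskell sketch of how lines in that pattern|account2|comment shape could be expanded into ordinary hledger if blocks. It is only an illustration of the idea (the account and comment values below are invented), not necessarily the exact conversion the repository performs.

-- Sketch: turn "pattern|account2|comment" lines into hledger if blocks.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

toIfBlock :: String -> String
toIfBlock line = case splitOn '|' line of
  [pat, acct, cmt] -> unlines
    [ "if " ++ pat
    , "  account2 " ++ acct
    , "  comment " ++ cmt
    ]
  _ -> ""   -- ignore lines that do not have exactly three fields

main :: IO ()
main = putStr (toIfBlock "2019/07/15.*255.00|expenses:home:repairs|check #1020, plumber")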
For those check transactions with no information, I have to manually go to my checking account and look at the check image for the details. I am going to go with the manually maintained file for those. Otherwise I can make use of the csv rules like you said, so I am going to close this issue.
Thanks again
Hello! Thank you for sharing this useful hledger set-up with the world :)
I am faced with a similar issue where I have many transactions from a financial institution which require manual categorization.
The solution presented by @adept (create hyper-specific rules inside your rules file which only ever apply to a single transaction) works well. Unfortunately, it causes the rules file for an institution to grow larger with each passing year as it fills up with hyper-specific rules which are not relevant to subsequent years.
Is it possible to add additional rules files which are applied only to a single csv (instead of all csvs inside an institution directory)? Since you can't conditionally include rules files inside other rules files, I'm not sure what would be the best way to do this. Creating separate sub-directories inside import/lloyds/ for each csv seems overly complex (though it seems like an appropriate use of the csv include rule).
The solution presented by @MichaelVessia (maintain a separate journal file with the ambiguous transactions, which is then included in the year-level journal) does not suffer from this problem, but it requires the extra burden of maintaining a separate directory of manually maintained journals. I think I will use this solution for now, but I would be grateful to hear thoughts on this problem.
Thank you again! Your tool will certainly save me hours in the coming years.
Well, you certainly could have separate rules files for separate input files.
The easiest way would be to have the basename of the rules file be the same as the name of the input file, and then in the csv2journal script for your source do something like this:
hledger print --rules-file ./rules/$(basename "$1") -f "$1"
(which assumes that your rules files are in the ./rules subdirectory, next to ./csv, ./journal, etc).
You would also need to declare the dependency on the rules file:
extraDeps file
  | "//your-source//*.journal" ?== file =
      let Just [_, basename] = filePattern "**/*.journal" file
      in ["./rules/" ++ basename ++ ".rules"]
And this, I think, should be enough.
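For a concrete illustration of what this clause computes, here is a tiny standalone sketch (not taken from export.hs; the your-source path is made up) of how the wildcard capture from filePattern selects the per-file rules file:

-- Sketch only: map a journal file to the rules file named after it.
-- Assumes the same Shake modules that export.hs already imports.
import Development.Shake            -- filePattern, (?==)
import Development.Shake.FilePath

rulesFor :: FilePath -> [FilePath]
rulesFor file = case filePattern "**/*.journal" file of
  Just [_dir, basename] -> ["./rules/" ++ basename ++ ".rules"]
  _                     -> []

main :: IO ()
main = print (rulesFor "import/your-source/journal/2019.journal")
-- prints ["./rules/2019.rules"], because the "*" in "**/*.journal"
-- captures the basename without its extension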
@adept Brilliant! This solution is much more elegant than what I'd planned on doing. Two things:
1. hledger print --rules-file ./rules/$(basename "$1") -f "$1"
   passes files like your-source/rules/*.csv to hledger, whereas
   hledger print --rules-file ./rules/$(basename "$1" .csv)".rules" -f "$1"
   passes files like your-source/rules/*.rules.
   I set csv2journal to the latter, since that matches the extraDeps function (and it makes sense).
2. Dependency checking doesn't seem to be working. I have saved the rules files which are not specific to a single csv as your-source/rules/*.rules, and when those files change, they do not cause new reports to be generated when export.hs runs.
   I tried replacing let Just [_, basename] = filePattern "**/*.journal" file with let basename = takeBaseName file, but that didn't make a difference (see the sketch below).
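As a quick illustration (made-up path, not part of export.hs) of why that swap makes no difference here: both expressions produce the same basename, so the dependency list comes out identical either way.

import Development.Shake            -- filePattern
import Development.Shake.FilePath   -- takeBaseName (re-exported from System.FilePath)

main :: IO ()
main = do
  let file = "import/your-source/journal/2019.journal"   -- hypothetical path
  print (fmap last (filePattern "**/*.journal" file))    -- Just "2019"
  print (takeBaseName file)                              -- "2019"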
Thank you again! Besides the dependency checking, it works perfectly.
Yes, I totally forgot about suffix stripping, thanks.
Re dependency checking: when you change build rules, files listed as dependencies will be inspected and their particulars (modification time, for example) saved only the next time the target is built. So when you change dependency computation code, you probably want to run ./export.sh --rebuild to force the rebuild of all targets. See if dependencies are properly tracked after that?
You're welcome!
The problem was that I added the dependencies wrong. My extraDeps function looks like this:
extraDeps file
  | "//your-source//*.journal" ?== file = ["your-source.rules", "generated.rules"]
  | "//your-source//*.journal" ?== file =
      let Just [_, basename] = filePattern "**/*.journal" file
      in ["./rules/" ++ basename ++ ".rules"]
  | otherwise = []
I kept | "//your-source//*.journal" ?== file = ["your-source.rules", "generated.rules"] because inside each your-source/*.rules there is a line which includes the shared rules in your-source.rules. Guards are tried top to bottom and stop at the first match, so of course the second clause never runs and "./rules/*.rules" never gets added as a dependency.
What is the correct way to concatenate those two lists? (I hope the intended behavior is clear from the incorrect code above)
Thank you for your help!
You could just place your code into a single clause, unless I am missing something:
extraDeps file
  | "//your-source//*.journal" ?== file =
      let Just [_, basename] = filePattern "**/*.journal" file
      in ["./rules/" ++ basename ++ ".rules", "your-source.rules", "generated.rules"]
  | otherwise = []
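If you would rather keep the shared and per-file pieces separate and literally concatenate them, a sketch along these lines (same names as above, assuming the Shake imports already in export.hs; not code from the repository) behaves the same way:

extraDeps file
  | "//your-source//*.journal" ?== file = sharedRules ++ perFileRules
  | otherwise = []
  where
    -- rules shared by every csv of this source
    sharedRules  = ["your-source.rules", "generated.rules"]
    -- the rules file named after this particular journal, if it matches
    perFileRules =
      case filePattern "**/*.journal" file of
        Just [_, basename] -> ["./rules/" ++ basename ++ ".rules"]
        _                  -> []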
For debugging, you would find import Debug.Trace and the functions trace and traceShow useful.
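For instance, a small self-contained example (invented file names) of how they behave: both print their first argument and return their second argument unchanged, so they can be dropped into pure code such as extraDeps while debugging.

import Debug.Trace (trace, traceShow)

-- trace takes a String, traceShow anything with a Show instance;
-- both emit it on stderr and return their second argument untouched.
fakeExtraDeps :: FilePath -> [FilePath]
fakeExtraDeps file = traceShow (file, deps) deps
  where deps = ["./rules/2019.rules"]   -- made-up dependency list

main :: IO ()
main = print (trace "computing deps" (length (fakeExtraDeps "journal/2019.journal")))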
That works perfectly! I wasn't sure about the syntax.
Thanks for the debugging tips! Your export.hs script is the first Haskell code I've played with. I like it very much - thanks for helping me out.
If it sounds like a good idea, I would be happy to draft up a wiki page which describes how to include rules files which only affect a single csv (and an explanation of why one might want to do that). I should be able to do that in the next month, though it might not be right away.
Happy to help! That wiki page sounds like a lovely idea, please do!
Great! Would it be appropriate to put it at the end (it would become number 12)? I think it might also flow well to insert it after "Maintaining CSV rules" (it would become number 7).
Actually, it might be better to put it at the end, so that it's not necessary to include that code in the subsequent steps.
I think putting it at the end works quite well. It is not too difficult to renumber everything, but numbering is quite arbitrary anyway, so I would not bother too much trying to put this new section close to "Maintaining CSV rules".
Sounds good! I'll get back to you soon with a PR :)