line count does not seem to be correct
jhutar opened this issue · 3 comments
Hello. I'm running this:
# grep --no-filename duration /var/lib/pgsql/data/pg_log/postgresql-*.log | logmine --pattern-placeholder REPLACED --min-members 1 | sed 's/^\(.\{200\}\).*/\1/'
2479 REPLACED REPLACED EDT LOG: duration: REPLACED ms execute <unnamed>: UPDATE REPLACED SET REPLACED = $1, "updated_at" = $2 WHERE REPLACED = $3
926 2020-05-22 02:46:52 EDT LOG: duration: 1341.912 ms statement: SELECT * FROM "dynflow_execution_plans" WHERE ("state" = 'scheduled') ORDER BY "started_at"
179 2020-05-22 02:28:00 EDT LOG: duration: 977.242 ms statement: COMMIT
13 REPLACED REPLACED EDT LOG: duration: REPLACED ms execute <unnamed>: select this_.id as id1_36_19_, this_.created as created2_36_19_, this_.updated as updated3_36_19_, this_.consumer_id as consume
10 REPLACED REPLACED EDT LOG: duration: REPLACED ms statement: INSERT INTO "dynflow_actions" ("execution_plan_uuid", "id", "data", "input", "caller_execution_plan_id", "caller_action_id", "class", "
...
So I would expect 926 lines to match a regexp like SELECT \* FROM "dynflow_execution_plans (the second line of the output), but there is only one:
# grep --no-filename duration /var/lib/pgsql/data/pg_log/postgresql-*.log | grep 'SELECT \* FROM "dynflow_execution_plans'
2020-05-22 02:46:52 EDT LOG: duration: 1341.912 ms statement: SELECT * FROM "dynflow_execution_plans" WHERE ("state" = 'scheduled') ORDER BY "started_at"
Did I misunderstand the meaning of the number in the first column, or is this a bug?
# python --version
Python 2.7.5
# pip freeze
logmine==0.1.4
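To cross-check a reported count without reading through the matches, the matching lines can be counted directly. This is a minimal sketch: count_matches is a hypothetical helper name, not part of logmine, and it treats the pattern as a literal string so that characters such as * need no escaping.

```shell
# Hypothetical helper (not part of logmine): count the lines on stdin
# that contain a literal string.
count_matches() {
    # -F: match the pattern as a fixed string, not a regexp
    # -c: print the number of matching lines instead of the lines
    # grep exits non-zero when nothing matches; "|| true" keeps the
    # helper usable in scripts that run under "set -e".
    grep -F -c -- "$1" || true
}
```

With the pipeline from this report, the usage would be: grep --no-filename duration /var/lib/pgsql/data/pg_log/postgresql-*.log | count_matches 'SELECT * FROM "dynflow_execution_plans'. The printed number is the one to compare against the first column of the logmine output.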
Hi. Yes, you understand the first column correctly: it should represent the number of occurrences. So in this case it looks like a bug to me.
You can try running logmine with the --single-core flag, which is a bit slower but should eliminate the parallel processing part, where most of the bugs usually live.
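One rough way to compare a default run against a --single-core run is to add up the first column of each output. The sketch below assumes that the count is the first whitespace-separated field, as in the output pasted above, and that with --min-members 1 every input line is counted in exactly one cluster; neither assumption is confirmed here. sum_counts is a hypothetical helper name.

```shell
# Hypothetical helper: sum the first whitespace-separated field of
# every line on stdin. Non-numeric first fields (for example a
# truncation marker) contribute 0.
sum_counts() {
    awk '{ total += $1 } END { print total + 0 }'
}
```

Piping the logmine output (without the sed truncation) into sum_counts, once with and once without --single-core, and comparing both totals with the input line count from wc -l, should show whether the counts themselves are off or only the displayed pattern is.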
In addition, I would be very thankful if you could provide the dataset, or a subset of it, with which I can reproduce the issue and fix it if possible. My email is available in my GitHub profile.
Thanks.
I think I found the issue: there is a problem with how the pattern is displayed in some edge cases. Please try again with version 0.1.5 and reopen the issue if the problem persists.