trungdq88/logmine

line count does not seem to be correct

jhutar opened this issue · 3 comments

Hello. I'm running this:

# grep --no-filename duration /var/lib/pgsql/data/pg_log/postgresql-*.log | logmine --pattern-placeholder REPLACED --min-members 1 | sed 's/^\(.\{200\}\).*/\1/'
2479 REPLACED REPLACED EDT LOG: duration: REPLACED ms execute <unnamed>: UPDATE REPLACED SET REPLACED = $1, "updated_at" = $2 WHERE REPLACED = $3
 926 2020-05-22 02:46:52 EDT LOG: duration: 1341.912 ms statement: SELECT * FROM "dynflow_execution_plans" WHERE ("state" = 'scheduled') ORDER BY "started_at"
 179 2020-05-22 02:28:00 EDT LOG: duration: 977.242 ms statement: COMMIT
  13 REPLACED REPLACED EDT LOG: duration: REPLACED ms execute <unnamed>: select this_.id as id1_36_19_, this_.created as created2_36_19_, this_.updated as updated3_36_19_, this_.consumer_id as consume
  10 REPLACED REPLACED EDT LOG: duration: REPLACED ms statement: INSERT INTO "dynflow_actions" ("execution_plan_uuid", "id", "data", "input", "caller_execution_plan_id", "caller_action_id", "class", "
...

So I would expect 926 lines to match a regexp like SELECT \* FROM "dynflow_execution_plans (second line of the output), but there is only one:

# grep --no-filename duration /var/lib/pgsql/data/pg_log/postgresql-*.log | grep 'SELECT \* FROM "dynflow_execution_plans'
2020-05-22 02:46:52 EDT LOG:  duration: 1341.912 ms  statement: SELECT * FROM "dynflow_execution_plans" WHERE ("state" = 'scheduled') ORDER BY "started_at"

Did I misunderstand the meaning of the number in the first column, or is there a bug?

# python --version
Python 2.7.5
# pip freeze
logmine==0.1.4
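
For anyone who wants to double-check a reported count independently of logmine, the grep cross-check above can be sketched in Python. This is only a rough sketch and not part of logmine: `pattern_to_regex` and `count_matches` are hypothetical helper names, and it assumes that each placeholder stands for exactly one whitespace-separated token and that runs of whitespace may differ between the raw log and the printed pattern (the raw line in the grep output has double spaces where the logmine output has one).

```python
import re


def pattern_to_regex(pattern, placeholder="REPLACED"):
    """Turn one printed pattern line (without the count) into a regex.

    Hypothetical helper, not a logmine API. Assumes a placeholder
    covers exactly one whitespace-separated token.
    """
    parts = []
    for token in pattern.split():
        if token == placeholder:
            parts.append(r"\S+")             # any single token
        else:
            parts.append(re.escape(token))   # literal text, e.g. '*' or '$1'
    # Tolerate different amounts of whitespace between tokens.
    return re.compile(r"\s+".join(parts))


def count_matches(lines, pattern, placeholder="REPLACED"):
    """Count the lines that contain a match for the given pattern."""
    regex = pattern_to_regex(pattern, placeholder)
    return sum(1 for line in lines if regex.search(line))


# Two sample lines copied from this thread (the second one is taken from
# the logmine output above, since the raw COMMIT line was not posted).
sample_lines = [
    '2020-05-22 02:46:52 EDT LOG:  duration: 1341.912 ms  statement: '
    'SELECT * FROM "dynflow_execution_plans" WHERE ("state" = \'scheduled\') '
    'ORDER BY "started_at"',
    '2020-05-22 02:28:00 EDT LOG: duration: 977.242 ms statement: COMMIT',
]

select_pattern = (
    'REPLACED REPLACED EDT LOG: duration: REPLACED ms statement: '
    'SELECT * FROM "dynflow_execution_plans"'
)
commit_pattern = (
    'REPLACED REPLACED EDT LOG: duration: REPLACED ms statement: COMMIT'
)

select_count = count_matches(sample_lines, select_pattern)
commit_count = count_matches(sample_lines, commit_pattern)
print(select_count, commit_count)
```

Because `regex.search` is used rather than a full-line match, a pattern that was cut off (for example by the `sed` truncation to 200 characters above) still works as a prefix. If the count computed this way is far from the number in the first column, that points at the reported pattern or count rather than at the data.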

Hi. Yes, you understand the first column correctly: it should represent the number of occurrences. So in this case it looks like a bug to me.

You can try running logmine with the --single-core flag, which is a bit slower but should eliminate the parallel processing part, where most of the bugs usually live.

In addition, I would be very thankful if you could provide the dataset, or a subset of it, with which I can reproduce the issue and fix it if possible. My email is available in my GitHub profile.

Thanks.

I think I found the issue: there is a problem with how the pattern is displayed in some edge cases. Please try again with version 0.1.5 and reopen the issue if the problem persists.

Thank you, you are quick! Running with --single-core did not help, but the current git version (27ae6ca) worked as expected. If you are still interested in the data set, please just let me know.