fish-face/quasselgrep

Python process gets killed when trying to export a large buffer


The VM has around 500 MB of RAM and 3 GB of free disk space.
I went into the psql prompt to check how many lines the buffer has.

psql query:
SELECT buffer.buffername, COUNT(buffer.buffername) AS counts
FROM buffer
INNER JOIN backlog ON buffer.bufferid = backlog.bufferid
GROUP BY buffer.buffername
ORDER BY counts;

Number of lines:
#ubuntu | 5505647

I execute ./quasselgrep -N 'Freenode' -b '#ubuntu' > ubuntu.txt
which gets killed; dmesg -T | grep process shows:

[Fri Jan 12 15:04:48 2018]  [<ffffffff81151eae>] oom_kill_process+0x1ce/0x330
[Fri Jan 12 15:04:48 2018] Out of memory: Kill process 32263 (python) score 536 or sacrifice child
[Fri Jan 12 15:04:48 2018] Killed process 32263 (python) total-vm:948560kB, anon-rss:380396kB, file-rss:824kB

Here is the problem:

results = self.cursor.fetchall()

For non-context queries it's probably an easy fix.
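
For reference, fetchall() materialises every matching row in one Python list at once, which is what exhausts the VM's 500 MB. A minimal sketch of the batched alternative, assuming a standard DB-API cursor (the function name and batch size are illustrative, not quasselgrep's actual code):

def iter_results(cursor, batch_size=1000):
    # Pull rows in fixed-size chunks instead of all at once,
    # so only one batch is held in memory at a time.
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield row

Note that this only bounds memory if the driver streams rows on demand; as it turns out later in the thread, psycopg2's default cursor does not.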

This should be fixed now. Could you check that your use-case works?

Still happens.

quasselgrep -N 'network' -b 'targetbuffer' > test.txt returns 700k lines.
If I add the -i switch,
quasselgrep -N 'network' -b 'targetbuffer' -i > test.txt
gets killed:

[Mon Jun 18 12:31:30 2018]  [<ffffffff81151eae>] oom_kill_process+0x1ce/0x330
[Mon Jun 18 12:31:30 2018] Out of memory: Kill process 3813 (quasselgrep) score 483 or sacrifice child
[Mon Jun 18 12:31:30 2018] Killed process 3813 (quasselgrep) total-vm:1135348kB, anon-rss:670992kB, file-rss:0kB

The query below returns more than 5 million lines, which I am guessing is the number of rows that would be returned with the -i switch.

COPY (
SELECT back.time, sender.sender, back.message
FROM backlog AS back JOIN buffer AS buff ON buff.bufferid = back.bufferid
JOIN sender ON sender.senderid = back.senderid
WHERE buffername = '#targetbuffer'
ORDER BY back.time ASC
) TO '/tmp/something.txt';

I have now finally managed to reproduce this! It somehow managed to kill my quasselcore at the same time as the Python process, so that's annoying. I also think I know the cause now - it turns out that iterating over a Postgres database cursor actually just slurps all the rows up front by default. Shouldn't be too hard to fix!
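
For anyone reading along: with psycopg2, a default (client-side) cursor transfers the entire result set to the client when execute() runs, so even fetchmany() or plain iteration doesn't bound memory. The usual remedy is a named (server-side) cursor, which streams rows in itersize-sized chunks. A minimal sketch assuming psycopg2; the connection details, cursor name, and query are illustrative, not quasselgrep's actual code:

import psycopg2

conn = psycopg2.connect(dbname='quassel')
# Passing a name creates a server-side cursor backed by a PostgreSQL
# portal: execute() no longer pulls the whole result set, and
# iteration fetches rows lazily, itersize at a time.
cur = conn.cursor(name='quasselgrep_export')
cur.itersize = 2000  # rows fetched per round-trip to the server
cur.execute("SELECT time, senderid, message FROM backlog ORDER BY time")
for row in cur:
    print(row)
cur.close()
conn.close()

Named cursors create a portal on the server, so they have to be used inside a transaction (psycopg2's default) and closed when finished.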

OK @pitastrudl it should be fixed in master, can you test hopefully for the last time?

@pitastrudl did you ever see this again?

Hi @fish-face, sadly not, but it's on my to-do list to test again. Since I posted this I've come to learn more about Postgres and Python, so maybe I'll be able to help more! Quassel also just put out a new release, so it will be more interesting to test. I'll try it out and let you know.

Happy New Year!