Memory leak
The used memory of my program keeps going up when parsing HTML. This was fixed months ago: #90
I'm not sure why, but it is happening again now, even with the same version that was working fine back then.
If you run this code, you will see that memory only goes up and is never freed.
import psutil
import requests
from selectolax.lexbor import LexborHTMLParser
response = requests.get("https://github.com")
process = psutil.Process()
start = process.memory_info().rss
for i in range(20000):
    a = LexborHTMLParser(response.text * 10).css("a")
    memory_usage = int((process.memory_info().rss - start) / 1024 ** 2)
    print(f"Memory usage: {memory_usage:,}MB")
How much memory was consumed at peak? Honestly, it does not look like a memory leak; it looks more like the way Python preallocates memory. I got 500MB of consumed memory after 20k iterations. You can remove the css() call and still see some memory spikes.
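For what it's worth, one way to check whether the growth is allocator behavior rather than a Python-level leak is the stdlib tracemalloc module. The sketch below is an assumption-laden stand-in: it replaces the parser with plain string churn (mimicking the `response.text * 10` workload), so it only demonstrates that Python-heap allocations are released each iteration, while process RSS, which is what psutil reports, can stay high because the allocator keeps freed pages around.

```python
import gc
import tracemalloc

# Stand-in workload: build and drop a large string each iteration,
# loosely mimicking `LexborHTMLParser(response.text * 10)` from the report.
html = "<a href='https://github.com'>link</a>" * 10_000

tracemalloc.start()

for _ in range(100):
    chunk = html * 10  # large temporary allocation
    del chunk          # dropped every iteration

gc.collect()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# `current` ends up far below `peak`: Python-level memory is released,
# even if RSS (what psutil measures) does not shrink back.
print(f"current={current} B, peak={peak} B")
```

If `current` stayed near `peak` across iterations, that would point to objects being retained on the Python side instead.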
@lexborisov To destroy the main parser, we only need to call lxb_html_document_destroy, right?
For CSS I do:
lxb_selectors_destroy(self.selectors, True)
lxb_css_memory_destroy(self.parser.memory, True)
lxb_css_parser_destroy(self.parser, True)
lxb_css_selectors_destroy(self.css_selectors, True)
But I'm not sure whether lxb_css_memory_destroy is really needed.
If you create the html parser separately, it should be destroyed separately.
lxb_html_parser_create();      /* allocate the parser              */
lxb_html_parser_init();        /* initialize it                    */
document = lxb_html_parse();   /* parse; returns the document      */
lxb_html_parser_unref();       /* release the parser reference     */
lxb_html_document_destroy();   /* free the parsed document         */
or
lxb_html_document_create();    /* document with its own parser     */
lxb_html_document_parse();     /* parse into the document          */
lxb_html_document_destroy();   /* one call frees everything        */