Memory fragmentation prevents memory release on Linux
Rongronggg9 opened this issue · 4 comments
Code to reproduce:

```python
import gc
import os
import colorlog
import psutil
from concurrent import futures
from feedparser import parse
# from memory_profiler import profile

colorlog.basicConfig(format='%(log_color)s%(asctime)s:%(levelname)s - %(message)s',
                     datefmt='%Y-%m-%d-%H:%M:%S',
                     level=colorlog.DEBUG)
logger = colorlog.getLogger()


def get_memory_usage():
    return f'Memory usage: {(psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024):.2f} MiB'


# @profile
def monitor(rss_content):
    rss_d = parse(rss_content, sanitize_html=False)
    if rss_d is None:
        return
    logger.debug('Parsed! ' + get_memory_usage())
    del rss_d
    gc.collect()
    logger.debug('Garbage collected! ' + get_memory_usage())
    return


# @profile  # if memory_profiler enabled, would not leak but runs slowly
def would_leak_1(feed_list):
    logger.info('would_leak_1 started! ' + get_memory_usage())
    for feed_content in feed_list:
        monitor(feed_content)
    logger.info('would_leak_1 finished! ' + get_memory_usage())
    gc.collect()
    logger.info('would_leak_1 garbage collected! ' + get_memory_usage())


# @profile
def would_leak_2(feed_list):
    logger.info('would_leak_2 started! ' + get_memory_usage())
    with futures.ThreadPoolExecutor(max_workers=1) as pool:
        for feed_content in feed_list:
            pool.submit(monitor, feed_content).result()
    logger.info('would_leak_2 finished! ' + get_memory_usage())
    gc.collect()
    logger.info('would_leak_2 garbage collected! ' + get_memory_usage())


# @profile
def main():
    logger.info('Started! ' + get_memory_usage())
    feed_list = []
    feeds = os.listdir('feeds')  # tons of feed.xml
    for feed in feeds:
        with open('feeds/' + feed, 'rb') as f:
            feed_list.append(f.read())
    logger.info('Feeds loaded into memory! ' + get_memory_usage())
    would_leak_1(feed_list)
    would_leak_2(feed_list)
    gc.collect()
    logger.info('Done! ' + get_memory_usage())
    del feed_list
    del feeds
    gc.collect()
    logger.info('Feeds in memory cleared! ' + get_memory_usage())
    return


if __name__ == '__main__':
    main()
```
My tests
feedparser 6.0.8
Debian GNU/Linux 11 (bullseye) on WSL (CPython 3.9.2) - Leaked!
neofetch (ASCII art omitted):
OS: Debian GNU/Linux 11 (bullseye) on Windows 10 x86_64
Kernel: 5.10.43.3-microsoft-standard-WSL2
CPU: Intel i7-10510U (8) @ 2.304GHz
Memory: 487MiB / 1917MiB
2021-10-04-07:37:34:INFO - Started! Memory usage: 42.16 MiB
2021-10-04-07:37:34:INFO - Feeds loaded into memory! Memory usage: 68.00 MiB
2021-10-04-07:37:34:INFO - would_leak_1 started! Memory usage: 68.00 MiB
2021-10-04-07:37:53:INFO - would_leak_1 finished! Memory usage: 105.77 MiB
2021-10-04-07:37:53:INFO - would_leak_1 garbage collected! Memory usage: 105.77 MiB
2021-10-04-07:37:53:INFO - would_leak_2 started! Memory usage: 105.77 MiB
2021-10-04-07:38:12:INFO - would_leak_2 finished! Memory usage: 165.69 MiB
2021-10-04-07:38:12:INFO - would_leak_2 garbage collected! Memory usage: 108.25 MiB
2021-10-04-07:38:12:INFO - Done! Memory usage: 108.25 MiB
2021-10-04-07:38:12:INFO - Feeds in memory cleared! Memory usage: 93.86 MiB
Debian GNU/Linux 11 (bullseye) on Azure b1s (CPython 3.9.2) - Leaked!
neofetch (ASCII art omitted):
OS: Debian GNU/Linux 11 (bullseye) x86_64
Host: Virtual Machine Hyper-V UEFI Release v4.1
Kernel: 5.10.0-8-cloud-amd64
CPU: Intel Xeon E5-2673 v4 (1) @ 2.294GHz
Memory: 563MiB / 913MiB
2021-10-03-23:35:10:INFO - Started! Memory usage: 20.17 MiB
2021-10-03-23:35:10:INFO - Feeds loaded into memory! Memory usage: 50.20 MiB
2021-10-03-23:35:10:INFO - would_leak_1 started! Memory usage: 50.46 MiB
2021-10-03-23:35:28:INFO - would_leak_1 finished! Memory usage: 94.25 MiB
2021-10-03-23:35:28:INFO - would_leak_1 garbage collected! Memory usage: 94.25 MiB
2021-10-03-23:35:28:INFO - would_leak_2 started! Memory usage: 94.25 MiB
2021-10-03-23:35:45:INFO - would_leak_2 finished! Memory usage: 152.66 MiB
2021-10-03-23:35:45:INFO - would_leak_2 garbage collected! Memory usage: 152.66 MiB
2021-10-03-23:35:45:INFO - Done! Memory usage: 152.66 MiB
2021-10-03-23:35:45:INFO - Feeds in memory cleared! Memory usage: 73.13 MiB
AOSC OS aarch64 (CPython 3.8.6) - Leaked!
neofetch (ASCII art omitted):
OS: AOSC OS aarch64
Host: Pine64 RockPro64 v2.0
Kernel: 5.12.13-aosc-rk64
CPU: (6) @ 1.416GHz
Memory: 216MiB / 3868MiB
2021-10-04-09:00:49:INFO - Started! Memory usage: 17.85 MiB
2021-10-04-09:00:49:INFO - Feeds loaded into memory! Memory usage: 43.87 MiB
2021-10-04-09:00:49:INFO - would_leak_1 started! Memory usage: 44.13 MiB
2021-10-04-09:01:39:INFO - would_leak_1 finished! Memory usage: 90.15 MiB
2021-10-04-09:01:39:INFO - would_leak_1 garbage collected! Memory usage: 90.15 MiB
2021-10-04-09:01:39:INFO - would_leak_2 started! Memory usage: 90.15 MiB
2021-10-04-09:02:30:INFO - would_leak_2 finished! Memory usage: 131.78 MiB
2021-10-04-09:02:30:INFO - would_leak_2 garbage collected! Memory usage: 131.78 MiB
2021-10-04-09:02:30:INFO - Done! Memory usage: 131.78 MiB
2021-10-04-09:02:31:INFO - Feeds in memory cleared! Memory usage: 131.78 MiB
Armbian bullseye (21.08.2) aarch64 (CPython 3.9.2) - Leaked!
neofetch (ASCII art omitted):
OS: Armbian bullseye (21.08.2) aarch64
Host: Pine H64 model B
Kernel: 5.10.60-sunxi64
CPU: sun50iw1p1 (4) @ 1.800GHz
Memory: 817MiB / 1989MiB
2021-10-08-17:22:46:INFO - Started! Memory usage: 19.61 MiB
2021-10-08-17:22:47:INFO - Feeds loaded into memory! Memory usage: 46.16 MiB
2021-10-08-17:22:47:INFO - would_leak_1 started! Memory usage: 46.16 MiB
2021-10-08-17:24:03:INFO - would_leak_1 finished! Memory usage: 87.75 MiB
2021-10-08-17:24:03:INFO - would_leak_1 garbage collected! Memory usage: 87.75 MiB
2021-10-08-17:24:03:INFO - would_leak_2 started! Memory usage: 87.75 MiB
2021-10-08-17:25:20:INFO - would_leak_2 finished! Memory usage: 125.73 MiB
2021-10-08-17:25:20:INFO - would_leak_2 garbage collected! Memory usage: 126.00 MiB
2021-10-08-17:25:20:INFO - Done! Memory usage: 126.00 MiB
2021-10-08-17:25:20:INFO - Feeds in memory cleared! Memory usage: 106.37 MiB
Windows 11 22000.194 (CPython 3.9.2) - Leaked only a little, which can be ignored.
neofetch (ASCII art omitted):
OS: Windows 11 x86_64
Host: ***
Kernel: 10.0.22000
CPU: Intel i7-10510U (8) @ 2.310GHz
Memory: 14760MiB / 24329MiB
2021-10-04-07:50:52:INFO - Started! Memory usage: 23.91 MiB
2021-10-04-07:50:52:INFO - Feeds loaded into memory! Memory usage: 49.70 MiB
2021-10-04-07:50:52:INFO - would_leak_1 started! Memory usage: 49.74 MiB
2021-10-04-07:51:08:INFO - would_leak_1 finished! Memory usage: 57.93 MiB
2021-10-04-07:51:08:INFO - would_leak_1 garbage collected! Memory usage: 57.93 MiB
2021-10-04-07:51:08:INFO - would_leak_2 started! Memory usage: 57.93 MiB
2021-10-04-07:51:26:INFO - would_leak_2 finished! Memory usage: 57.11 MiB
2021-10-04-07:51:26:INFO - would_leak_2 garbage collected! Memory usage: 57.11 MiB
2021-10-04-07:51:26:INFO - Done! Memory usage: 57.11 MiB
2021-10-04-07:51:26:INFO - Feeds in memory cleared! Memory usage: 30.46 MiB
Windows 11 22000.194 (PyPy 7.3.5, Python 3.7.10) - Leaked!
2021-10-04-07:55:34:INFO - Started! Memory usage: 45.91 MiB
2021-10-04-07:55:34:INFO - Feeds loaded into memory! Memory usage: 81.40 MiB
2021-10-04-07:55:34:INFO - would_leak_1 started! Memory usage: 81.40 MiB
2021-10-04-07:55:56:INFO - would_leak_1 finished! Memory usage: 113.78 MiB
2021-10-04-07:55:56:INFO - would_leak_1 garbage collected! Memory usage: 113.78 MiB
2021-10-04-07:55:56:INFO - would_leak_2 started! Memory usage: 113.78 MiB
2021-10-04-07:56:22:INFO - would_leak_2 finished! Memory usage: 122.85 MiB
2021-10-04-07:56:22:INFO - would_leak_2 garbage collected! Memory usage: 122.85 MiB
2021-10-04-07:56:22:INFO - Done! Memory usage: 122.86 MiB
2021-10-04-07:56:22:INFO - Feeds in memory cleared! Memory usage: 84.39 MiB
Note
If I run would_leak_1 and would_leak_2 separately, their leaking behavior seems the same. However, running them sequentially in the same process does make whichever runs second leak less under some conditions, as you can see above.
I have more data from production.
I run two instances of https://github.com/Rongronggg9/RSS-to-Telegram-Bot on the same VPS, one with ~4000 feeds and the other with ~3000 feeds. The bot checks the feeds for updates frequently. I noticed that the relation between the number of feeds and the amount of leaked memory looks logarithmic. Also, parsing the same feed multiple times (whether or not it changes in between) leaks less than parsing that many different feeds once, and after the same feed has been parsed enough times, the leakage hardly grows at all. In other words, the relation between the number of parses and the amount of leaked memory also looks logarithmic.
I guess the leaked objects can somehow be reused? If that's true, it would be a helpful clue for figuring out the cause of the leakage.
Related: #302 (comment)
Hi, coming here from your comment on #302.
I ran a few tests where I called feedparser.parse() in a loop and measured memory usage (details below). I tried two feeds, one 2M and one 50K, both loaded from disk; I did this both on macOS and on Ubuntu.
The results are as you describe, the max RSS increases in what looks like a logarithmic curve; that is, after enough iterations (10-100), the max RSS remains almost horizontal/stable.
However, I am not convinced this is a memory leak in feedparser.
Rather, I think it's a side-effect of how Python memory allocation works. Specifically, Python never releases allocated memory back to the operating system (1, 2, 3), but keeps it around and reuses it. (Because of this, running gc.collect() will never decrease RSS.)
I assume the initial sharper memory increase is due to fragmentation (even if there's enough memory available, it's not in a contiguous chunk, so the allocator has to allocate additional memory); as more and more memory is allocated and then released (in the pool), it becomes easier to find a contiguous chunk.
It makes sense for #302 to make max RSS stabilize faster, since it reduces the number of allocations – and more importantly, the number of big (whole feed) allocations (which reduces the impact of fragmentation).
It might be possible to confirm this 100% by measuring the used memory as seen by the Python allocator, instead of max RSS.
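One way to approximate that measurement (a sketch I'm adding here, not part of the original tests) is `tracemalloc`, which counts live Python-level allocations. If its counter returns to baseline after the objects are freed while RSS stays high, the objects really were released by Python, and the remaining RSS growth is held by the allocator rather than by leaked objects:

```python
# Sketch: compare Python-level memory (tracemalloc) with what the OS sees.
# tracemalloc.get_traced_memory() returns (current, peak) in bytes.
import gc
import tracemalloc

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

junk = [('x' * 1000) + str(i) for i in range(10_000)]  # roughly 10 MB of strings
current, _ = tracemalloc.get_traced_memory()
print(current - baseline > 5_000_000)   # True: the strings are traced while alive

del junk
gc.collect()
current, _ = tracemalloc.get_traced_memory()
print(current - baseline < 1_000_000)   # True: traced memory back near baseline
```

If this prints `True` twice while RSS (e.g. from `psutil`) stays elevated, that points at allocator/heap behavior, not a Python-object leak.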
Script:

```python
import sys, resource
import feedparser

print("    loop   maxrss")
for i in range(10 ** 3 + 1):
    with open(sys.argv[1], 'rb') as file:
        feedparser.parse(file)
    maxrss = (
        resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        / 2 ** (20 if sys.platform == 'darwin' else 10)
    )
    if (i <= 10) or (i <= 100 and i % 10 == 0) or (i <= 1000 and i % 100 == 0):
        print(f"{i:>8} {maxrss:>8.3f}")
```
Output:
macOS Catalina, Python 3.9.10, feedparser 6.0.8
2.2M feed
loop maxrss
0 47.895
1 50.555
2 50.582
3 50.613
4 50.613
5 50.613
6 50.613
7 50.625
8 50.648
9 50.656
10 50.656
20 50.727
30 50.727
40 50.727
50 50.742
60 50.758
70 50.820
80 50.820
90 50.820
100 50.820
52K feed
loop maxrss
0 17.297
1 17.484
2 17.566
3 17.645
4 17.777
5 17.836
6 17.891
7 17.949
8 18.008
9 18.094
10 18.152
20 18.172
30 18.188
40 18.242
50 18.277
60 18.285
70 18.324
80 18.336
90 18.344
100 18.352
200 18.359
300 18.387
400 18.410
500 18.438
600 18.461
700 18.461
800 18.461
900 18.465
macOS Catalina, Python 3.9.10, feedparser 6.0.8 + #302
2.2M feed
loop maxrss
0 24.578
1 24.578
2 24.578
3 24.578
4 24.578
5 24.578
6 24.578
7 24.578
8 24.578
9 24.578
10 24.578
20 24.578
52K feed
loop maxrss
0 17.598
1 17.723
2 17.805
3 17.918
4 18.031
5 18.117
6 18.172
7 18.230
8 18.285
9 18.340
10 18.352
20 18.383
30 18.414
40 18.426
50 18.441
60 18.453
70 18.461
80 18.492
90 18.504
100 18.508
200 18.543
300 18.543
400 18.590
500 18.590
600 18.590
700 18.598
800 18.598
900 18.598
Ubuntu 20.04, Python 3.8.10, feedparser 6.0.8
2.2M feed
loop maxrss
0 42.988
1 46.996
2 46.996
3 47.367
4 47.367
5 47.367
6 47.367
7 47.367
8 47.367
9 47.367
10 47.367
20 47.883
30 47.883
40 47.883
50 47.883
52K feed
loop maxrss
0 15.832
1 16.090
2 16.137
3 16.188
4 16.191
5 16.191
6 16.191
7 16.191
8 16.195
9 16.195
10 16.195
20 16.227
30 16.238
40 16.246
50 16.258
60 16.320
70 16.332
80 16.395
90 16.406
100 16.406
200 16.457
300 16.457
400 16.457
500 16.457
600 16.586
700 16.586
800 16.586
900 16.586
1000 16.586
Ubuntu 20.04, Python 3.8.10, feedparser 6.0.8 + #302
2.2M feed
loop maxrss
0 20.566
1 20.934
2 20.934
3 20.934
4 20.934
5 20.934
6 20.934
7 20.934
8 20.934
9 21.137
10 21.137
20 21.266
30 21.266
40 21.430
50 21.430
60 21.516
70 21.516
80 21.516
90 21.516
100 21.516
52K feed
loop maxrss
0 16.355
1 16.688
2 16.715
3 16.871
4 16.898
5 16.922
6 16.922
7 16.922
8 16.926
9 16.926
10 16.926
20 16.965
30 16.977
40 16.988
50 16.996
60 17.031
70 17.043
80 17.055
90 17.062
100 17.066
200 17.066
300 17.070
400 17.070
500 17.070
600 17.070
700 17.070
800 17.070
900 17.078
1000 17.078
Hi, @lemon24. Thanks for sharing.
I can confirm your statement "I am not convinced this is a memory leak in feedparser": `BeautifulSoup(something, 'html.parser')` (`html.parser` is written in pure Python) "leaks" in the same pattern as `feedparser.parse(something)`, while `BeautifulSoup(something, 'lxml')` (`lxml` is written in C) "leaks" nothing. (Would `feedparser` adopting `lxml` as a parser backend help reduce memory usage? Probably, lol.)
However, after confirming that statement, I did a deep dive. I believe your other statement, "Python never releases allocated memory back to the operating system, but keeps it around and reuses it", is incorrect.
Python does release unused memory, but only when it can. It is fragmentation that breaks this prerequisite, and that is a glibc `malloc` issue rather than a Python-specific one.
By default, `malloc` uses `sbrk` instead of `mmap` for allocations smaller than 128 KiB. A still-in-use fragment at a high address, originally allocated via `sbrk`, prevents the heap from being trimmed, so free memory at lower addresses cannot be returned to the OS. Memory allocated via `mmap`, by contrast, is returned to the OS independently and has no such disadvantage. What's worse, the threshold is dynamic nowadays and can grow at runtime (up to `4 * 1024 * 1024 * sizeof(long)` on 64-bit systems!). The default `malloc` policy is really a space-time tradeoff, since the `mmap` syscall is costly. That is the real reason for the "leakage", and it explains why CPython on Windows is not affected. It also explains why the feeds loaded into memory as strings can be released: most of them are larger than 128 KiB!
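As an aside (my own arithmetic, assuming `sizeof(long)` is 8 bytes on an LP64 system), that dynamic-threshold cap works out to 32 MiB:

```python
# glibc caps the dynamic mmap threshold at 4 * 1024 * 1024 * sizeof(long).
sizeof_long = 8                          # bytes, on LP64 (64-bit Linux)
max_threshold = 4 * 1024 * 1024 * sizeof_long
print(max_threshold)                     # 33554432 bytes
print(max_threshold // (1024 * 1024))    # 32 MiB
```

So a single allocation of up to 32 MiB may still come from the `sbrk` heap once the threshold has grown, which is why the effect gets worse over a process's lifetime.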
In conclusion, your PR (#302) does help reduce the "leakage", but only to a fairly limited extent. My final solution is shown below.
Prohibiting the use of `sbrk` by setting `M_MMAP_THRESHOLD` to `0` eliminates the "leakage". This is just an experiment; do not set `M_MMAP_THRESHOLD` to such a low value in production, or you will face performance issues.
As a production solution, `16384` (16 KiB) is a nice value for those concerned about the issue. Even the default initial value, `131072` (128 KiB), helps a lot, since explicitly setting `M_MMAP_THRESHOLD` disables its dynamic increase.
1. `ctypes`

```diff
+import ctypes
+libc = ctypes.cdll.LoadLibrary("libc.so.6")
+M_MMAP_THRESHOLD = -3
+libc.mallopt(M_MMAP_THRESHOLD, 0)  # effectively prohibit `sbrk`
 import gc
 import os
 ...
```
2022-05-27-01:35:17:INFO - Started! Memory usage: 54.39 MiB
2022-05-27-01:35:17:INFO - Feeds loaded into memory! Memory usage: 80.66 MiB
2022-05-27-01:35:17:INFO - would_leak_1 started! Memory usage: 80.66 MiB
2022-05-27-01:35:44:INFO - would_leak_1 finished! Memory usage: 84.94 MiB
2022-05-27-01:35:44:INFO - would_leak_1 garbage collected! Memory usage: 84.94 MiB
2022-05-27-01:35:44:INFO - would_leak_2 started! Memory usage: 84.94 MiB
2022-05-27-01:36:13:INFO - would_leak_2 finished! Memory usage: 85.52 MiB
2022-05-27-01:36:13:INFO - would_leak_2 garbage collected! Memory usage: 85.52 MiB
2022-05-27-01:36:13:INFO - Done! Memory usage: 85.52 MiB
2022-05-27-01:36:13:INFO - Feeds in memory cleared! Memory usage: 59.30 MiB
2. Environment variables

> Note: this way, even the initialization of Python is affected, so setting the value to `0` makes Python itself consume more memory to initialize. Do not set `MALLOC_MMAP_THRESHOLD_` below `8192` in production; that keeps memory consumption no larger than a vanilla execution while leaving performance mostly unaffected.

```shell
$ MALLOC_MMAP_THRESHOLD_=0 python script.py
```
2022-05-27-01:52:03:INFO - Started! Memory usage: 72.52 MiB
2022-05-27-01:52:03:INFO - Feeds loaded into memory! Memory usage: 98.79 MiB
2022-05-27-01:52:03:INFO - would_leak_1 started! Memory usage: 98.79 MiB
2022-05-27-01:52:39:INFO - would_leak_1 finished! Memory usage: 102.91 MiB
2022-05-27-01:52:39:INFO - would_leak_1 garbage collected! Memory usage: 102.91 MiB
2022-05-27-01:52:39:INFO - would_leak_2 started! Memory usage: 102.91 MiB
2022-05-27-01:53:08:INFO - would_leak_2 finished! Memory usage: 103.58 MiB
2022-05-27-01:53:08:INFO - would_leak_2 garbage collected! Memory usage: 103.56 MiB
2022-05-27-01:53:08:INFO - Done! Memory usage: 103.56 MiB
2022-05-27-01:53:08:INFO - Feeds in memory cleared! Memory usage: 77.35 MiB
Ref:
https://stackoverflow.com/questions/68225871/python3-give-unused-interpreter-memory-back-to-the-os
https://stackoverflow.com/questions/15350477/memory-leak-when-using-strings-128kb-in-python
https://stackoverflow.com/questions/35660899/reduce-memory-fragmentation-with-malloc-mmap-threshold-and-malloc-mmap-max
https://man7.org/linux/man-pages/man3/mallopt.3.html
A better workaround for multithreaded programs is to replace glibc's `ptmalloc` with `jemalloc`:
Rongronggg9/RSS-to-Telegram-Bot@ae69f73
Rongronggg9/RSS-to-Telegram-Bot@eb07fa9
`jemalloc` shows impressive performance while maintaining a high memory-recycling rate in multithreaded programs.
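For a quick trial without rebuilding anything, `jemalloc` can also be injected via `LD_PRELOAD` (a configuration sketch; the package name and library path below are assumptions that vary by distribution, architecture, and version):

```shell
# Debian/Ubuntu: install the shared library first (package name may differ):
#   apt install libjemalloc2
# Then preload it for the Python process; adjust the .so path for your system.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 python3 script.py
```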
I've changed the title of the issue and would like to keep it open as a guide for developers facing the same issue. It would be better still if this were covered in the docs.
My conclusion is that to "solve" the issue on the `feedparser` side, adopting `lxml` might be the best and easiest solution. For downstream developers, the two workarounds I've described are easy to adopt.