Occasional SRS crash caused by the mechanism that cleans up unused SrsSource objects
haofz opened this issue · 7 comments
After the mechanism for cleaning up unused SrsSource objects was added in these commits:
https://github.com/ossrs/srs/commit/c7b97aa1c3de380451e2d229fc5bef702e2d264d
https://github.com/ossrs/srs/commit/590e9517398c4d442bcf125421353b4b338217d1
I found during testing that SRS occasionally crashes. The log is as follows:
2016-12-13 14:16:42.790|trace|6104|243876|0|RTMP client ip=192.168.10.20
2016-12-13 14:16:42.792|trace|6104|243876|0|complex handshake success
2016-12-13 14:16:42.792|trace|6104|243876|0|connect app, tcUrl=rtmp://172.20.1.2:1935/live, pageUrl=, swfUrl=, schema=rtmp, vhost=defaultVhost, port=1935, app=live, param=, args=null
2016-12-13 14:16:42.792|trace|6104|243876|0|out chunk size to 60000
2016-12-13 14:16:42.872|trace|6104|243876|0|input chunk size to 60000
2016-12-13 14:16:42.873|trace|6104|243876|0|client identified, type=fmle-publish, stream_name=livestream, duration=-1.00
2016-12-13 14:16:42.873|trace|6104|243876|0|update req of soruce for auth ok
2016-12-13 14:16:42.873|trace|6104|243876|0|source url=defaultVhost/live/livestream, ip=192.168.10.20, cache=1, is_edge=0, source_id=-1[-1]
2016-12-13 14:16:42.888|trace|6104|243218|0|cleanup die source, total=43
2016-12-13 14:16:42.953|trace|6104|243876|0|>>>>>>>>>>>>> FMLE start to publish stream livestream, url=defaultVhost/live/livestream.
Analysis of the cause:
While a client was publishing, it blocked between connect and the actual start of publishing. During that window the source was cleaned up, and the program crashed when the source was used afterwards.
Process:
- The first publish succeeded and then disconnected.
- After 29 seconds, a second publish began, and the existing SrsSource was obtained through fetch(). fetch() does not reset die_at to -1 (the create path does not need to).
- The connection then blocked between the connect and publish steps, for example in expect_message(). If the 30-second timeout elapsed during that wait, the cleanup mechanism, which runs at a 1-second interval, destroyed the source.
- When the connection used the same SrsSource again, SRS crashed.
Solution:
In SrsSource::fetch(), add source->die_at = -1; I have tested this myself and it resolves the issue.
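The sequence above can be sketched with a small model. This is not the SRS code: Source, SourcePool, on_unpublish and the other names are hypothetical stand-ins, and only die_at, the 30-second timeout and the proposed reset in fetch are taken from the report. The function returns whether the source is still in the pool when the second publish finally starts.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Simplified stand-in for SrsSource: only the idle timestamp is modeled.
struct Source {
    int64_t die_at;  // -1 = in use; otherwise the time (ms) it became idle
    Source() : die_at(-1) {}
};

static const int64_t kIdleTimeoutMs = 30 * 1000;  // 30 s, as in the report

// Hypothetical pool; the real SRS bookkeeping is more involved.
class SourcePool {
public:
    explicit SourcePool(bool reset_on_fetch) : reset_on_fetch_(reset_on_fetch) {}

    Source* fetch_or_create(const std::string& url) {
        auto it = pool_.find(url);
        if (it == pool_.end()) {
            // Create path: a new source starts with die_at == -1.
            pool_[url] = std::unique_ptr<Source>(new Source());
            return pool_[url].get();
        }
        // Fetch path: the proposed fix marks the source as in use again.
        if (reset_on_fetch_) it->second->die_at = -1;
        return it->second.get();
    }

    // Assumed hook: the publisher left, so the source becomes idle.
    void on_unpublish(const std::string& url, int64_t now_ms) {
        auto it = pool_.find(url);
        if (it != pool_.end()) it->second->die_at = now_ms;
    }

    // Periodic cleanup: destroy sources that have been idle for too long.
    void cleanup(int64_t now_ms) {
        for (auto it = pool_.begin(); it != pool_.end();) {
            const Source& s = *it->second;
            if (s.die_at != -1 && now_ms > s.die_at + kIdleTimeoutMs) {
                it = pool_.erase(it);
            } else {
                ++it;
            }
        }
    }

    bool alive(const std::string& url) const { return pool_.count(url) > 0; }

private:
    bool reset_on_fetch_;
    std::map<std::string, std::unique_ptr<Source>> pool_;
};

// Replays the timeline: publish, disconnect at t = 0, reconnect at t = 29 s,
// cleanup fires just after t = 30 s while the connection is still blocked.
bool source_survives_handshake(bool reset_on_fetch) {
    SourcePool pool(reset_on_fetch);
    const std::string url = "defaultVhost/live/livestream";
    pool.fetch_or_create(url);   // first publish creates the source
    pool.on_unpublish(url, 0);   // first publisher disconnects
    pool.fetch_or_create(url);   // second publish fetches it at t = 29 s
    pool.cleanup(30001);         // timer runs before publishing starts
    return pool.alive(url);      // false means the caller holds a dangling pointer
}
```

In this model, source_survives_handshake(false) returns false (the pointer obtained from fetch would be dangling), and source_survives_handshake(true) returns true.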
The fetch mechanism of Source is not atomic; this part needs to be changed.
The problem is that a fetch is not necessarily followed by the creation of a consumer or publisher. If fetch marks the source as in use (die_at = -1) and no publisher or consumer ever attaches, the source will never be cleaned up. If fetch leaves die_at unchanged, the source may be cleaned up while it is still referenced, and the next use of it will crash. Therefore, setting die_at = -1 in fetch is also incorrect.
In other words, the cleanup of the source needs to handle the case where the source is never used (neither published nor consumed) after fetch_or_create.
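A minimal sketch of that unused-after-fetch case, under the same simplifying assumptions as the report: Entry, cleanup and sources_left_after_abandoned_fetch are hypothetical names, not SRS code. A client fetches an idle source and then disconnects without publishing or playing, and the function returns how many sources remain after a cleanup pass one hour later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a source; -1 in die_at means "in use".
struct Entry {
    int64_t die_at;
};

static const int64_t kIdleTimeoutMs = 30 * 1000;

// Remove every idle entry whose timeout has elapsed.
void cleanup(std::map<std::string, Entry>& pool, int64_t now_ms) {
    for (auto it = pool.begin(); it != pool.end();) {
        if (it->second.die_at != -1 && now_ms > it->second.die_at + kIdleTimeoutMs) {
            it = pool.erase(it);
        } else {
            ++it;
        }
    }
}

std::size_t sources_left_after_abandoned_fetch(bool reset_on_fetch) {
    std::map<std::string, Entry> pool;
    const std::string url = "defaultVhost/live/livestream";
    pool[url] = Entry{0};  // the source became idle at t = 0

    // A client connects and fetches the source...
    Entry& fetched = pool[url];
    if (reset_on_fetch) fetched.die_at = -1;

    // ...then disconnects without ever publishing or consuming, so no
    // unpublish or consumer-destroy event sets die_at again.

    cleanup(pool, 3600 * 1000);  // cleanup pass one hour later
    return pool.size();
}
```

With reset_on_fetch set, the function returns 1: the source stays marked as in use forever and is never released. Without it, the function returns 0, but that is the variant exposed to the crash described in the original report.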
A simpler approach is to remove the source cleanup feature for now.
Support for it will be added in the future.
Dup to #1509
Solution: #1579 (comment)
Recently I have also been looking into this memory release issue. Using the latest version, srs-2.0.243, I enabled the source cleanup mechanism and ran the following tests:
- I pushed 40 streams concurrently, and memory usage reached 60MB. With the source cleanup mechanism enabled, I ended the streams and found that the sources were indeed cleaned up. However, memory usage remained at around 60MB.
- I pushed 100 streams, and memory usage reached around 160MB. After ending the streams, I found that the sources were also cleaned up. However, memory usage still remained at around 60MB.
So I have a question: does SRS have a memory retention mechanism that keeps a certain amount of memory to avoid repeated allocation? I searched for a while but could not find where it is. I would greatly appreciate it if someone could explain.
Finally, thanks to Winlinvip for the open-source contribution!
Source cleanup has already been disabled in 2.0 and will be implemented in 3.0. Did the crash occur because you modified the code to enable source cleanup? This feature needs more changes before it can be supported, which is why it was disabled in 2.0.