Frog (through python-frog) accumulates a huge number of temporary files
proycon opened this issue · 11 comments
User @StergiosMorakis ran a long-running script that processes tweets via python-frog; it produced a large number of /tmp/frog* files with short input sequences. At one point a million files had accumulated. I should investigate when these files are created (my initial investigation suggested they were only used in server mode, which is not the case here, so I probably missed something). They should then be cleaned up at an earlier stage.
I assume this is indirectly caused by the TiCC::tempname() function, which creates a uniquely named file.
The calling program may use this file/filename for some time, but at some point it is responsible for removing it.
Frog doesn't do this, neither in server mode (around line 874 of FrogAPI.cxx) nor in the Frogtostring() function, which was made specially for python-frog (around line 1149; it needs some refactoring to delete the file before the return).
Addition: There are other programs that use TiCC::tempname(). They should all be investigated.
A more sustainable and C++-like solution is to build a "tempstream" class around tempname() that cleans up automatically when the object goes out of scope.
Shouldn't be that hard.
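The scope-bound cleanup described above can be illustrated in Python with a context manager. This is only a sketch of the idea, not the tmp_stream class that ended up in Frog; the name `temp_stream` and the `frog` prefix are assumptions made for illustration.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_stream(prefix="frog", mode="w+"):
    """Yield (stream, path) for a uniquely named temporary file; remove it on exit.

    Hypothetical helper: it mimics the idea of a scope-bound temporary
    stream and is not part of Frog, ticcutils or python-frog.
    """
    fd, path = tempfile.mkstemp(prefix=prefix)
    try:
        # os.fdopen takes ownership of fd and closes it when the block ends.
        with os.fdopen(fd, mode) as stream:
            yield stream, path
    finally:
        # Runs even if the caller's block raises, so the file never outlives it.
        if os.path.exists(path):
            os.remove(path)


with temp_stream() as (stream, path):
    stream.write("short input sequence\n")
    stream.flush()
    assert os.path.exists(path)
# Leaving the block closed the descriptor and deleted the file.
```

Tying removal to the end of the block means the caller cannot forget the cleanup step, which is the property the proposed C++ class would get from its destructor.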
I fixed this in frog using a new tmp_stream() class at 2 locations.
Other locations where TiCC::tempname() was used already removed the temporary file, so all is sound now.
Once the code has been tested enough, the new tmp_stream class might be incorporated in ticcutils for general use.
Happy testing!
Ah! Thanks! :) I told @StergiosMorakis I'd look at it next week but you beat me to it already, he'll be happy to hear that. I'll give it a test soon.
@StergiosMorakis Is everything running nicely still after Ko's patch? Then I'll do a release.
Released v0.21 which fixes this issue.
@proycon, we are facing issues on CentOS 7 with other dependencies, so for now I am planning to use an older version. I want to cherry-pick the changes related to this fix onto that older version.
I found these commits related to this fix. Can you please check whether there are any other related commits?
a24ac9c
7a49cdc
4dc4b87
Yes, those are indeed the right commits pertaining to this issue.
@StergiosMorakis this deletes the files in the /tmp folder, but Python still holds on to the file descriptors. Ideally this reference should be released once the file is closed.
As a result, python-frog fails to create another file in /tmp after it has processed n requests (n = the limit set by ulimit -n).
What is ulimit -n set to on your machine?
Edit:
Is this working fine for you?
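To check the `ulimit -n` value and the number of descriptors currently held from inside the Python process itself, something like the following may help. It is a minimal sketch that assumes Linux (it reads /proc/self/fd); the function name `open_fd_report` is made up for illustration.

```python
import os
import resource


def open_fd_report():
    """Return (open descriptor count, soft limit) for the current process."""
    # The soft limit is the value that `ulimit -n` reports in a shell.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Linux-only assumption: /proc/self/fd has one entry per open descriptor.
    # The listing itself briefly uses a descriptor, so the count includes it.
    open_count = len(os.listdir("/proc/self/fd"))
    return open_count, soft


open_count, soft_limit = open_fd_report()
print(f"{open_count} descriptors open, soft limit {soft_limit}")
```

Calling this before and after each request should show whether the count keeps growing towards the limit.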
@padurucr7 Are you running into this problem despite the patches in those three commits? Then we should reopen the issue, I guess.
Yeah, I'm not sure why Python keeps the file descriptors open even after they are closed in the C++ part.
I went through the code and everything looks fine. Maybe this is an issue with Python.
So I am explicitly closing the file descriptors related to frog after each query. This fixed the issue.
Note: this issue is in no way related to those 3 commits. It happens without these commits too.
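import os

# Linux-only: /proc/self/fd lists the file descriptors open in this process.
base = '/proc/self/fd'
frog_fds = []
for num in os.listdir(base):
    try:
        # Each entry is a symlink to the file the descriptor refers to.
        path = os.readlink(os.path.join(base, num))
    except OSError as e:
        # The descriptor used for the directory listing is already gone.
        print(e)
        continue
    if "frog" in path:
        frog_fds.append(int(num))

# Close every descriptor that still points at a frog file.
for fd in frog_fds:
    os.close(fd)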