GermanT5/wikipedia2corpus

MultiProcessing Issue

CIXUniSaarland opened this issue · 6 comments

stack trace.txt

Hey, when trying to run process.py on the extracted Wikipedia dump I am running into a "No such file or directory" error. I am attaching the stack trace -- can you help me understand whether my machine's capabilities prevent me from running the tasks in parallel?

Maybe provide absolute path names instead of relative?
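Something like this, just as a sketch (the helper name and the sample path are made up):

```python
import os

def absolutize(paths):
    # Resolve relative paths up front so any error message shows the full location.
    return [os.path.abspath(p) for p in paths]

print(absolutize(["enwiki/part-00.txt"]))  # hypothetical input file
```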

Not sure why this happens. :-(

Maybe you can add more info to the output. Or set pool_size to 1 to debug it?
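Something like this, as a rough sketch (process_file and the inputs are just placeholders for whatever process.py actually uses):

```python
import multiprocessing

def process_file(path):
    # stands in for the real worker in process.py
    print("processing", path)

if __name__ == "__main__":
    pool_size = 1  # a single worker: tasks run one at a time and tracebacks stay readable
    with multiprocessing.Pool(pool_size) as pool:
        pool.map(process_file, ["a.txt", "b.txt"])  # hypothetical inputs
```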

Thanks for the input! :D
I figured out that it has something to do with the pool. I decided not to run it in parallel and modified the code so that the files are processed one after the other -- it's slow and inelegant, unfortunately. I think it's a limitation of my system, though :/
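The change was roughly along these lines (a sketch; process_file and input_files stand in for whatever the script actually uses):

```python
def process_file(path):
    print("processing", path)  # stands in for the real worker

input_files = ["a.txt", "b.txt"]  # hypothetical inputs

# Instead of pool.map(process_file, input_files), call the worker
# directly in a plain loop, so everything runs strictly sequentially.
for path in input_files:
    process_file(path)
```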

Hmm. Ok...

What is so special about your system? Is there anything I could improve on my side?

Ah, well. I debugged it further. I think adding a second or two between creating the file and appending to it could help. I used time.sleep(2) and it seems to be working now.
What do you think?
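The workaround looks roughly like this (a sketch; out_path and the written text are placeholders):

```python
import time

out_path = "part-00.txt"  # hypothetical output file

# create the file first ...
open(out_path, "w", encoding="utf-8").close()

# ... then pause briefly before appending to it
time.sleep(2)

with open(out_path, "a", encoding="utf-8") as f:
    f.write("some extracted text\n")
```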

I think this is very strange, and it'd never happen on my computer.
Do you use a NFS file system or something like that?

Nope. NTFS and/or exFAT. I couldn't get the code running on Windows at all, quite apart from the known wiki-extractor issue; but I could run it on macOS with a time.sleep(2). Maybe it has something to do with the indexing process, not sure.
Thanks a lot for your prompt responses :)