Multithreading printing of duplicate results
chasingboy opened this issue ยท 9 comments
Hello, When I use multithreading, the runtime prints duplicate results.
import threading, queue
from alive_progress import alive_bar
import time
q = queue.Queue()
for url in range(1000):
q.put(url)
def requester(url):
print(url)
threads = []
with alive_bar(q.qsize()) as bar:
for _ in range(20):
while not q.empty():
bar() # Progress bar
url = q.get()
thread = threading.Thread(target=requester,args=(url,))
thread.setDaemon(True)
thread.start()
threads.append(thread)
for thread in threads:
thread.join()
Running result
kali$ python3 test.py
on 1: 0
on 1: 0
on 2: 1
on 2: 1
on 3: 2
on 3: 2
on 4: 3
on 5: 4
on 6: 5
on 6: 5
on 7: 6
on 7: 6
What should I do to print the result once? Thanks!!!
Hi,
That is caused by the multiple refreshes of alive_progress
, you can't print from different threads simultaneously, which can cause havoc on the internal structures because they are not synchronized. And I can't synchronize them because that would add a lot of overhead in a super hot code.
Anyway, I think what you want is something like this:
import random, time
from concurrent import futures
def requester(url):
# print(f'requesting: {url}')
time.sleep(random.random())
with futures.ThreadPoolExecutor(max_workers=3) as executor, alive_bar(100) as bar:
future_to_url = {executor.submit(requester, url): url for url in range(100)}
for future in futures.as_completed(future_to_url):
url = future_to_url[future]
print(f'finished: {url}')
bar()
If you uncomment that line everything breaks again.
If you really need to print while on other threads, you should send the lines through a queue, test within the main alive_bar
loop if something is to be printed, and print them from there. Note the lines would probably get all randomly scrambled on the screen.
Does that solve it?
This would work for printing:
import random, time, queue
from concurrent import futures
def requester(url):
q.put(f'requesting: {url}')
time.sleep(random.random())
q = queue.Queue()
with futures.ThreadPoolExecutor(max_workers=3) as executor, alive_bar(100) as bar:
future_to_url = {executor.submit(requester, url): url for url in range(100)}
for future in futures.as_completed(future_to_url):
while not q.empty():
print(q.get())
url = future_to_url[future]
print(f'finished: {url}')
bar()
You'll get output like this:
on 0: requesting: 0
on 0: requesting: 1
on 0: requesting: 2
on 0: requesting: 3
on 0: finished: 0
on 1: requesting: 4
on 1: finished: 3
on 2: requesting: 5
on 2: finished: 4
on 3: requesting: 6
on 3: finished: 1
on 4: requesting: 7
on 4: finished: 2
on 5: requesting: 8
on 5: finished: 6
on 6: requesting: 9
on 6: finished: 7
on 7: requesting: 10
on 7: finished: 5
on 8: requesting: 11
...
Hmm, having strange behaviour when printing from threads sounds like a recipe for disaster. Lots of code, often beyond a developer's control, might try to print in a separate thread.
Actually, you're right @TheTechRobo.
I'll look into it a bit deeper.
Found it, and fixed it ๐
Now the print hooks are correctly synchronized when printing from multiple threads.
I've also improved them a bit along the way. I've never noticed that an empty print()
would output an empty "on X:".
Also, I've discovered that Python's str.splitlines()
is kinda flawed. 'line'.splitlines()
returns 1 line, but 'line\n'.splitlines()
does not return 2, which breaks my header algorithm for newlines...
So, using these new synced print hooks, I've tested this code:
import random, time
from concurrent import futures
def requester(url):
print(f'requesting: {url}\n', end='') # this is printed from the thread pool.
time.sleep(2*random.random())
with futures.ThreadPoolExecutor(max_workers=3) as executor, alive_bar(10) as bar:
print()
future_to_url = {executor.submit(requester, url): url for url in range(10)}
for future in futures.as_completed(future_to_url):
url = future_to_url[future]
print(f'finished: {url}') # this from the main thread.
bar()
It was printing this:
...:
on 0:
on 0: requesting: 0
requesting: 1
requesting: 2
requesting: 3
finished: 0
on 1: requesting: 4
finished: 2
on 2: requesting: 5
finished: 3
on 3: requesting: 6
finished: 1
on 4: requesting: 7
finished: 4
on 5: requesting: 8
finished: 7
on 6: requesting: 9
finished: 6
on 7: finished: 5
on 8: finished: 9
on 9: finished: 8
|โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ| 10/10 [100%] in 3.8s (2.53/s)
And now it will print this:
...:
on 0: requesting: 0
requesting: 1
requesting: 2
requesting: 3
finished: 0
on 1: requesting: 4
finished: 2
on 2: requesting: 5
finished: 1
on 3: requesting: 6
finished: 4
on 4: requesting: 7
finished: 3
on 5: requesting: 8
finished: 6
on 6: requesting: 9
finished: 5
on 7: finished: 8
on 8: finished: 9
on 9: finished: 7
|โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ| 10/10 [100%] in 3.0s (3.37/s)
Important to say that, thanks to the way that Python calls write
on print
, it is simply impossible to have 100% accurate lines printed.
Occasionally, if I run the code above several times, I get this:
...:
on 0: requesting: 0
requesting: 1
requesting: 2
requesting: 3
requesting: 4
requesting: 5
finished: 4
on 1: finished: 1
on 2: finished: 0
on 3: requesting: 6
finished: 2requesting: 7
requesting: 9
requesting: 8
on 4: finished: 5
on 5: finished: 3
on 6: finished: 6
on 7: finished: 9
on 8: finished: 8
on 9: finished: 7
|โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ| 10/10 [100%] in 0.0s (8656.14/s)
Unfortunately, Python calls write
twice on a single print
call. It calls it with the text being printed, then with the end
param. The default print(text, end='\n')
then makes the calls stdout.write(text)
and stdout.write('\n')
.
And, since these two calls have no synchronization at all, my print hook might end up receiving text1
, text2
, \n
, \n
for two prints with text1
and text2
from different threads...
That's why finished: 2
and requesting: 7
ended up glued together, there's no way around it when multi-threading.
@rsalmei Thanks! Referring to your code, it is currently running normally.
You're welcome!