Multithreading printing of duplicate results

Question

Multithreading printing of duplicate results

chasingboy opened this issue 10 months ago · 9 comments

Hello, When I use multithreading, the runtime prints duplicate results.

import threading, queue
from alive_progress import alive_bar
import time

q = queue.Queue()

for url in range(1000):
	q.put(url)

def requester(url):
	print(url)

threads = []
with alive_bar(q.qsize()) as bar:
    for _ in range(20):
        while not q.empty():
            bar() # Progress bar
                
            url =  q.get()
            thread = threading.Thread(target=requester,args=(url,))
            thread.setDaemon(True)
            thread.start()
            threads.append(thread)

    for thread in threads:
        thread.join()

Running result

kali$ python3 test.py
on 1: 0
on 1: 0
on 2: 1
on 2: 1
on 3: 2
on 3: 2
on 4: 3
on 5: 4
on 6: 5
on 6: 5
on 7: 6
on 7: 6

What should I do to print the result once? Thanks!!!

Answer 1 · 2024-01-27T01:45:43.000Z

Hi,

That is caused by the multiple refreshes of alive_progress, you can't print from different threads simultaneously, which can cause havoc on the internal structures because they are not synchronized. And I can't synchronize them because that would add a lot of overhead in a super hot code.

Anyway, I think what you want is something like this:

import random, time
from concurrent import futures

def requester(url):
    # print(f'requesting: {url}')
    time.sleep(random.random())

with futures.ThreadPoolExecutor(max_workers=3) as executor, alive_bar(100) as bar:
    future_to_url = {executor.submit(requester, url): url for url in range(100)}
    for future in futures.as_completed(future_to_url):
        url = future_to_url[future]
        print(f'finished: {url}')
        bar()

If you uncomment that line everything breaks again.
If you really need to print while on other threads, you should send the lines through a queue, test within the main alive_bar loop if something is to be printed, and print them from there. Note the lines would probably get all randomly scrambled on the screen.

Does that solve it?

Answer 2 · 2024-01-27T01:50:18.000Z

This would work for printing:

import random, time, queue
from concurrent import futures

def requester(url):
    q.put(f'requesting: {url}')
    time.sleep(random.random())

q = queue.Queue()
with futures.ThreadPoolExecutor(max_workers=3) as executor, alive_bar(100) as bar:
    future_to_url = {executor.submit(requester, url): url for url in range(100)}
    for future in futures.as_completed(future_to_url):
        while not q.empty():
            print(q.get())
        url = future_to_url[future]
        print(f'finished: {url}')
        bar()

You'll get output like this:

on 0: requesting: 0
on 0: requesting: 1
on 0: requesting: 2
on 0: requesting: 3
on 0: finished: 0
on 1: requesting: 4
on 1: finished: 3
on 2: requesting: 5
on 2: finished: 4
on 3: requesting: 6
on 3: finished: 1
on 4: requesting: 7
on 4: finished: 2
on 5: requesting: 8
on 5: finished: 6
on 6: requesting: 9
on 6: finished: 7
on 7: requesting: 10
on 7: finished: 5
on 8: requesting: 11
...

Answer 3 · 2024-01-27T03:23:13.000Z

Hmm, having strange behaviour when printing from threads sounds like a recipe for disaster. Lots of code, often beyond a developer's control, might try to print in a separate thread.

Answer 4 · 2024-01-28T04:56:17.000Z

Actually, you're right @TheTechRobo.
I'll look into it a bit deeper.

Answer 5 · 2024-01-28T06:50:37.000Z

Found it, and fixed it 👍
Now the print hooks are correctly synchronized when printing from multiple threads.

I've also improved them a bit along the way. I've never noticed that an empty print() would output an empty "on X:".
Also, I've discovered that Python's str.splitlines() is kinda flawed. 'line'.splitlines() returns 1 line, but 'line\n'.splitlines() does not return 2, which breaks my header algorithm for newlines...

So, using these new synced print hooks, I've tested this code:

import random, time
from concurrent import futures

def requester(url):
    print(f'requesting: {url}\n', end='')  # this is printed from the thread pool.
    time.sleep(2*random.random())

with futures.ThreadPoolExecutor(max_workers=3) as executor, alive_bar(10) as bar:
    print()
    future_to_url = {executor.submit(requester, url): url for url in range(10)}
    for future in futures.as_completed(future_to_url):
        url = future_to_url[future]
        print(f'finished: {url}')  # this from the main thread.
        bar()

It was printing this:

   ...:
on 0:
on 0: requesting: 0
requesting: 1
requesting: 2
requesting: 3
finished: 0
on 1: requesting: 4
finished: 2
on 2: requesting: 5
finished: 3
on 3: requesting: 6
finished: 1
on 4: requesting: 7
finished: 4
on 5: requesting: 8
finished: 7
on 6: requesting: 9
finished: 6
on 7: finished: 5
on 8: finished: 9
on 9: finished: 8
|████████████████████████████████████████| 10/10 [100%] in 3.8s (2.53/s)

And now it will print this:

   ...:

on 0: requesting: 0
      requesting: 1
      requesting: 2
      requesting: 3
      finished: 0
on 1: requesting: 4
      finished: 2
on 2: requesting: 5
      finished: 1
on 3: requesting: 6
      finished: 4
on 4: requesting: 7
      finished: 3
on 5: requesting: 8
      finished: 6
on 6: requesting: 9
      finished: 5
on 7: finished: 8
on 8: finished: 9
on 9: finished: 7
|████████████████████████████████████████| 10/10 [100%] in 3.0s (3.37/s)

Answer 6 · 2024-01-28T07:03:27.000Z

Important to say that, thanks to the way that Python calls write on print, it is simply impossible to have 100% accurate lines printed.
Occasionally, if I run the code above several times, I get this:

    ...:

on 0: requesting: 0
      requesting: 1
      requesting: 2
      requesting: 3
      requesting: 4
      requesting: 5
      finished: 4
on 1: finished: 1
on 2: finished: 0
on 3: requesting: 6
      finished: 2requesting: 7
      requesting: 9
      requesting: 8
on 4: finished: 5
on 5: finished: 3
on 6: finished: 6
on 7: finished: 9
on 8: finished: 8
on 9: finished: 7
|████████████████████████████████████████| 10/10 [100%] in 0.0s (8656.14/s)

Unfortunately, Python calls write twice on a single print call. It calls it with the text being printed, then with the end param. The default print(text, end='\n') then makes the calls stdout.write(text) and stdout.write('\n').
And, since these two calls have no synchronization at all, my print hook might end up receiving text1, text2, \n, \n for two prints with text1 and text2 from different threads...
That's why finished: 2 and requesting: 7 ended up glued together, there's no way around it when multi-threading.

Answer 7 · 2024-01-29T04:29:41.000Z

@rsalmei Thanks! Referring to your code, it is currently running normally.

Answer 8 · 2024-02-10T04:09:16.000Z

You're welcome!