Question about the program in Chapter 4, Section 2 of the book
Microndgt opened this issue · 1 comments
Microndgt commented
In the "How to do it" part of Section 2, the book gives this program:
import concurrent.futures
import time

number_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def evaluate_item(x):
    # Compute the sum; this is only here to burn time
    result_item = count(x)
    # Print the input and the result
    print("item " + str(x) + " result " + str(result_item))

def count(number):
    for i in range(0, 10000000):
        i = i + 1
    return i * number

if __name__ == "__main__":
    # Sequential execution
    start_time = time.clock()
    for item in number_list:
        evaluate_item(item)
    print("Sequential execution in " + str(time.clock() - start_time), "seconds")
    # Thread pool execution
    start_time_1 = time.clock()
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        for item in number_list:
            executor.submit(evaluate_item, item)
    print("Thread pool execution in " + str(time.clock() - start_time_1), "seconds")
    # Process pool execution
    start_time_2 = time.clock()
    with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
        for item in number_list:
            executor.submit(evaluate_item, item)
    print("Process pool execution in " + str(time.clock() - start_time_2), "seconds")
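First, on UNIX systems time.clock() returns "process time", expressed as a floating-point number of seconds. That should be accurate for the first (sequential) run and the second (multithreaded) run, because both execute inside the current process and the measured time is the time that process spent executing. In the third (multiprocess) run, however, the current process only does the scheduling and the execution time is spread across other processes, so I think the measured time is wrong. Common sense also says sequential execution cannot take 6 seconds while the multiprocess version takes 0.03 seconds; that would be a speedup of nearly 200x.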
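The gap between process time and wall-clock time can be seen in isolation with a small sketch. This is not the book's program: `CHILD_CODE` and `measure_child_work` are hypothetical names, a plain child process started with `subprocess` stands in for the pool workers, and `time.process_time()` stands in for the process-time reading because `time.clock()` was removed in Python 3.8. The parent only waits while another process does the work, so its own process time stays small while the wall-clock time covers the whole run.

```python
import subprocess
import sys
import time

# Hypothetical child workload: a busy loop run in a separate Python process.
CHILD_CODE = "for i in range(3_000_000): pass"


def measure_child_work():
    """Return (wall_seconds, parent_cpu_seconds) for work done in a child process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of the current process only
    # The parent blocks here while the child burns CPU.
    subprocess.run([sys.executable, "-c", CHILD_CODE], check=True)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure_child_work()
    print("wall-clock seconds:", wall)
    print("parent process seconds:", cpu)
```

Timing a process pool with a process-time clock has the same shape: the number reported is what the scheduling process used, not how long the work took.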
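Second, executor.submit only schedules a task and returns a Future; the task is not necessarily run right away (that depends on whether the pool has a free thread or process). So I think the test program in the book is flawed. Following the example in the module documentation, it should look like this: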
import concurrent.futures
import time

number_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def evaluate_item(x):
    # Compute the sum; this is only here to burn time
    result_item = count(x)
    # Return the result so the caller can print it
    return result_item

def count(number):
    for i in range(0, 10000000):
        i = i + 1
    return i * number

if __name__ == "__main__":
    # Sequential execution
    start_time = time.time()
    for item in number_list:
        print(evaluate_item(item))
    print("Sequential execution in " + str(time.time() - start_time), "seconds")
    # Thread pool execution
    start_time_1 = time.time()
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(evaluate_item, item) for item in number_list]
        for future in concurrent.futures.as_completed(futures):
            print(future.result())
    print("Thread pool execution in " + str(time.time() - start_time_1), "seconds")
    # Process pool execution
    start_time_2 = time.time()
    with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(evaluate_item, item) for item in number_list]
        for future in concurrent.futures.as_completed(futures):
            print(future.result())
    print("Process pool execution in " + str(time.time() - start_time_2), "seconds")
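The point that submit only schedules a task can be checked directly on the returned Future. The following is a minimal sketch, not part of the book or the module documentation: `submit_and_inspect`, `wait_for_gate` and the `gate` event are hypothetical names, and a thread pool is used to keep the example small. The task is held back on an event so that the Future is reliably unfinished right after submit returns.

```python
import concurrent.futures
import threading


def submit_and_inspect():
    """Submit one task and report the Future's state before and after it runs."""
    gate = threading.Event()  # hypothetical gate that keeps the task from finishing

    def wait_for_gate():
        gate.wait(timeout=5)  # block until the main thread opens the gate
        return "finished"

    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(wait_for_gate)  # returns at once with a Future
        done_after_submit = future.done()        # task is still blocked on the gate
        gate.set()                               # let the task finish
        result = future.result()                 # blocks until the task has returned
        done_after_result = future.done()
    return done_after_submit, result, done_after_result


if __name__ == "__main__":
    print(submit_and_inspect())
```

Only after `future.result()` has returned is it certain that the task has run to completion.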
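Using as_completed ensures that we wait until all Future objects have finished, so the time measured at that point should be accurate. On my machine (CPU: Intel Core i5-5257U), sequential and multithreaded execution both take around 6.3 seconds, and multiprocess execution takes around 3.7 seconds.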
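One side effect of as_completed is that results arrive in completion order rather than submission order, so printing `future.result()` alone does not show which input produced which value. A minimal sketch of one way to keep that link, following the dictionary pattern used in the concurrent.futures documentation example; `slow_square` and `squares_by_input` are hypothetical names, and a thread pool with `time.sleep` stands in for the real workload:

```python
import concurrent.futures
import time


def slow_square(x, delay):
    """Hypothetical task: sleep for `delay` seconds, then return x squared."""
    time.sleep(delay)
    return x * x


def squares_by_input(items, delay=0.05):
    """Run slow_square for every item and return {input: result}."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Remember which input each Future belongs to.
        future_to_item = {
            executor.submit(slow_square, item, delay): item for item in items
        }
        # as_completed yields each Future once it has finished, in completion order.
        for future in concurrent.futures.as_completed(future_to_item):
            results[future_to_item[future]] = future.result()
    return results


if __name__ == "__main__":
    print(squares_by_input([1, 2, 3, 4, 5]))
```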
P.S.
laixintao commented
@Microndgt You're right. I checked, and even a single call on its own takes more than 0.2s:
In [1]: import time
   ...: def count(number):
   ...:     time1 = time.time()
   ...:     for i in range(0, 10000000):
   ...:         i = i + 1
   ...:     time2 = time.time()
   ...:     print time2 - time1
   ...:     return i * number
   ...:
In [3]: count(1)
0.74836397171
Out[3]: 10000000
PS: some of the results shown in the translation are ones I pasted in from my own runs.
I'll change the text to use your code.