<p>You are not making use of the full power of a modern computer, which has multiple central processing units! This is by far the best optimization available here, because the task is <strong>CPU-bound</strong>. Note: for I/O-bound operations, <a href="https://docs.python.org/3/library/threading.html" rel="nofollow">multithreading</a> (using the threading module) is the right tool.</p>
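<p>To illustrate the I/O-bound case, here is a minimal sketch with the threading module; the <code>time.sleep</code> call is a hypothetical stand-in for a blocking I/O operation such as a network or file read:</p>

```python
import threading
import time

results = []
lock = threading.Lock()

def fetch(i):
    # stand-in for a blocking I/O call; while one thread waits here,
    # the other threads keep making progress
    time.sleep(0.01)
    with lock:
        results.append(i)

threads = [threading.Thread(target=fetch, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # all 10 threads have completed
```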
<p>So let's see how Python makes this easy with the <a href="https://docs.python.org/2/library/multiprocessing.html" rel="nofollow">multiprocessing module</a> (read the comments):</p>
<pre><code>import hashlib
# you're sampling a string so you need 'sample', not 'choice'
from random import sample
import multiprocessing
# use a thread to synchronize writing to the file
import threading

# open up to 4 processes per cpu
processes_per_cpu = 4
processes = processes_per_cpu * multiprocessing.cpu_count()
print("will use %d processes" % processes)

longitud = 8
valores = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
# check on smaller ranges to compare before trying your range... :-)
RANGE = 200000

def enc(string):
    m = hashlib.md5()
    m.update(string.encode('utf-8'))
    return m.hexdigest()

# we synchronize the results to be written using a queue shared by the processes
q = multiprocessing.Manager().Queue()

# this is the single point where results are written to the file;
# the file is opened ONCE (you open it on every iteration, that's bad)
def write_results():
    with open('datos.txt', 'w') as f:
        while True:
            msg = q.get()
            if msg == 'close':
                break
            f.write(msg)

# this is the function each worker process uses to calculate a single result
def calc_one(i):
    s = ''.join(sample(valores, longitud))
    md = enc(s)
    q.put("%s %s\n" % (s, md))

# we start a pool of worker processes to spread the work and not rely on
# a single cpu
pool = multiprocessing.Pool(processes=processes)
# this is the thread that will write the results coming from the
# other processes through the queue, so its execution target is write_results
t = threading.Thread(target=write_results)
t.start()
# we use 'map_async' to not block ourselves; this is redundant here,
# but it's best practice to use it when you don't HAVE to block ('pool.map')
pool.map_async(calc_one, range(RANGE))
# wait for completion
pool.close()
pool.join()
# tell the result-writing thread to stop
q.put('close')
t.join()
</code></pre>
<p>There are more optimizations that could be made in this code, but for any CPU-bound task like yours, the major one is to use multiprocessing.</p>
<p><strong>Note:</strong> a simple optimization of the file writes would be to aggregate several results from the queue and write them together (in case the many CPUs outpace the single writer thread).</p>
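<p>A sketch of that aggregation, assuming the same <code>'close'</code> sentinel protocol as the writer thread above; <code>write_results_batched</code> is a hypothetical name, and the demo uses <code>queue.Queue</code> and <code>io.StringIO</code> only to keep it self-contained:</p>

```python
import io
import queue

def write_results_batched(q, f, batch_size=100):
    # drain the queue into a buffer and flush it in chunks, so the single
    # writer thread performs fewer (but larger) writes
    buf = []
    while True:
        msg = q.get()
        if msg == 'close':
            break
        buf.append(msg)
        if len(buf) >= batch_size:
            f.writelines(buf)
            buf = []
    # flush whatever is left after the 'close' sentinel
    f.writelines(buf)

# demo: 250 messages end up in the file as three writes (100 + 100 + 50)
q = queue.Queue()
for i in range(250):
    q.put("line %d\n" % i)
q.put('close')
f = io.StringIO()
write_results_batched(q, f)
print(f.getvalue().count("\n"))  # 250
```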
<p><strong>Note 2:</strong> since the OP wants to go over combinations/permutations of content, it should be mentioned that there is a module that does exactly that, called <a href="https://docs.python.org/2/library/itertools.html" rel="nofollow">itertools</a>.</p>
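<p>For exhaustive enumeration (instead of random sampling), <code>itertools.product</code> generates every fixed-length string over an alphabet; a small sketch with a toy alphabet and length 2 for brevity:</p>

```python
import itertools

valores = "abc"
longitud = 2
# every string of length 'longitud' over the alphabet 'valores',
# in lexicographic order
combos = [''.join(p) for p in itertools.product(valores, repeat=longitud)]
print(len(combos))  # 3**2 = 9 strings: 'aa', 'ab', ..., 'cc'
```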