Loop efficiency in Python

import hashlib
from random import choice

longitud = 8
valores = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def enc(string):
    m = hashlib.md5()
    m.update(string.encode('utf-8'))
    return m.hexdigest()

def code():
    p = ""
    p = p.join([choice(valores) for i in xrange(longitud)])
    text = p
    return text

i = 1
for i in xrange(2000000000000000000):
    cod = code()
    md = enc(cod)
    print cod
    print md
    i += 1
    print i
    f=open('datos.txt','a')
    f.write("%s " % cod)
    f.write("%s" % md)
    f.write('\n')
    f.close()

3 answers

User

#1 · edited 2024-09-27 21:32:23

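Note that you should use

for cod in itertools.product(valores, repeat=longitud):

rather than picking strings at random (random.choice or random.sample), because it visits each possible string exactly once. Each cod is then a tuple of characters, so join it with ''.join(cod) before hashing.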

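Note also that for the given values this loop runs 218340105584896 iterations (62 to the power 8). At 42 bytes per line, the output file would take 9170284434565632 bytes, or about 8 PiB.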
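The figures above can be checked in a few lines. This is a minimal sketch in Python 3 syntax (the question's code is Python 2). The helper name first_codes is made up for illustration, and the 42-byte line layout (8-character code, a space, the 32-character MD5 hex digest, a newline) is an assumption based on what the question's loop writes.

```python
import itertools

longitud = 8
valores = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Number of distinct codes: one of len(valores) characters in each position.
total = len(valores) ** longitud

# Assumed line layout: code + space + 32 hex digits of MD5 + newline.
bytes_per_line = longitud + 1 + 32 + 1
total_bytes = total * bytes_per_line


def first_codes(n):
    """Return the first n codes in the order itertools.product yields them."""
    combos = itertools.product(valores, repeat=longitud)
    return ["".join(c) for c in itertools.islice(combos, n)]


if __name__ == "__main__":
    print(total)           # 218340105584896
    print(total_bytes)     # 9170284434565632
    print(first_codes(3))  # ['00000000', '00000001', '00000002']
```

Dividing total_bytes by 2 ** 50 gives roughly 8.14, which is where the "8 PB" figure comes from when counted in binary units.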

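User

#2 · edited 2024-09-27 21:32:23

You are not using the full power of a modern computer, which has multiple central processing units! This is by far the best optimization you can make here, because the task is CPU-bound. Note: for I/O-bound operations, multithreading (using the threading module) is appropriate.

So let's see how easily Python does this with the multiprocessing module (read the comments):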

import hashlib
# you're sampling a string so you need sample, not 'choice'
from random import sample
import multiprocessing
# use a thread to synchronize writing to file
import threading

# open up to 4 processes per cpu
processes_per_cpu = 4
processes = processes_per_cpu * multiprocessing.cpu_count()
print "will use %d processes" % processes
longitud = 8
valores = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
# check on smaller ranges to compare before trying your range... :-)
RANGE = 200000
def enc(string):
    m = hashlib.md5()
    m.update(string.encode('utf-8'))
    return m.hexdigest()

# we synchronize the results to be written using a queue shared by processes
q = multiprocessing.Manager().Queue()

# this is the single point where results are written to the file
# the file is opened ONCE (you open it on every iteration, that's bad)
def write_results():
    with open('datos.txt', 'w') as f:
        while True:
            msg = q.get()
            if msg == 'close':
                break
            else:
                f.write(msg)

# this is the function each process uses to calculate a single result
def calc_one(i):
    s = ''.join(sample(valores, longitud))
    md = enc(s)
    q.put("%s %s\n" % (s, md))

# we start a process pool of workers to spread work and not rely on
# a single cpu
pool = multiprocessing.Pool(processes=processes)

# this is the thread that will write the results coming from
# other processes using the queue, so it's execution target is write_results
t = threading.Thread(target=write_results)
t.start()
# we use 'map_async' to not block ourselves, this is redundant here,
# but it's best practice to use this when you don't HAVE to block ('pool.map')
pool.map_async(calc_one, xrange(RANGE))
# wait for completion
pool.close()
pool.join()
# tell result-writing thread to stop
q.put('close')
t.join()

There are probably more optimizations possible in this code, but for any CPU-bound task like yours, the main optimization is to use multiprocessing.

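Note: a simple file-writing optimization is to aggregate several results from the queue and write them together (useful if many CPUs outpace the single writer thread).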
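The batching idea in this note could look roughly like the following. It is only a sketch in Python 3 syntax, not the answer's code: write_batched and its batch_size default are hypothetical, a plain queue.Queue stands in for the multiprocessing.Manager().Queue() used above, and the caller is assumed to pass in an already-open file.

```python
import io
import queue
import threading

STOP = "close"  # same sentinel string the answer's code uses


def write_batched(q, f, batch_size=1000):
    """Drain q into the open file f, issuing one write per batch_size lines.

    batch_size=1000 is an arbitrary illustrative default, not a tuned value.
    """
    batch = []
    while True:
        msg = q.get()
        if msg == STOP:
            break
        batch.append(msg)
        if len(batch) >= batch_size:
            f.write("".join(batch))
            batch = []
    if batch:  # flush the partial batch left when the sentinel arrives
        f.write("".join(batch))


if __name__ == "__main__":
    q = queue.Queue()
    out = io.StringIO()  # stands in for open('datos.txt', 'w')
    t = threading.Thread(target=write_batched, args=(q, out, 1000))
    t.start()
    for i in range(2500):
        q.put("line %d\n" % i)
    q.put(STOP)
    t.join()
    print(len(out.getvalue().splitlines()))  # 2500
```

Whether batching helps in practice depends on how fast the workers fill the queue, so it is worth measuring before and after.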

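Note 2: since the OP wants to go through combinations/permutations of the characters, it is worth knowing that there is a module that does exactly this, called itertools.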

User

#3 · edited 2024-09-27 21:32:23

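First profile your program (with the cProfile module: https://docs.python.org/2/library/profile.html and http://ymichael.com/2014/03/08/profiling-python-with-cprofile.html), but I would bet your program is I/O-bound (if your CPU usage never reaches 100% of one core, your hard drive is too slow to keep up with the rest of the program).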
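As a rough illustration of the profiling step, here is a minimal sketch in Python 3 syntax. The functions work and profile_work are made-up names, and the sketch only profiles the generate-and-hash part of the question's loop, not the file writes.

```python
import cProfile
import hashlib
import io
import pstats
from random import choice

valores = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
longitud = 8


def work(n):
    """Generate n random codes and hash each one, as the question's loop does."""
    digests = []
    for _ in range(n):
        cod = "".join(choice(valores) for _ in range(longitud))
        digests.append(hashlib.md5(cod.encode("utf-8")).hexdigest())
    return digests


def profile_work(n):
    """Run work(n) under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    work(n)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)  # ten most expensive entries
    return out.getvalue()


if __name__ == "__main__":
    print(profile_work(20000))
```

To judge whether the whole program is I/O-bound, the complete script can be profiled instead, for example with python -m cProfile -s cumulative script.py, where script.py is a placeholder for your own file.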

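With that in mind, start by changing your program so that:

It opens and closes the file outside the loop (opening and closing a file is very slow).
It makes only one write call per iteration (each of these calls translates into a syscall, which is expensive), like this: f.write("%s %s\n" % (cod, md))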
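Put together, the two changes above might look like this. It is a sketch in Python 3 syntax, not a drop-in replacement: generate is a hypothetical helper, the iteration count is kept small, and a temporary file stands in for datos.txt so the example leaves nothing behind.

```python
import hashlib
import tempfile
from random import choice

longitud = 8
valores = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def enc(string):
    return hashlib.md5(string.encode("utf-8")).hexdigest()


def code():
    return "".join(choice(valores) for _ in range(longitud))


def generate(f, n):
    """Write n 'code digest' lines to the already-open file f."""
    for _ in range(n):
        cod = code()
        f.write("%s %s\n" % (cod, enc(cod)))  # one write call per iteration


if __name__ == "__main__":
    # Opened once, outside the loop; a temporary file stands in for datos.txt.
    with tempfile.TemporaryFile("w+") as f:
        generate(f, 1000)
        f.seek(0)
        print(sum(1 for _ in f))  # 1000
```

With the real output file, the with line would become with open('datos.txt', 'a') as f: and the rest stays the same.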
