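<p>I am just reporting my findings. The performance difference does not seem to come from the <code>str.count()</code> function. I changed your code and refactored the <code>str.count()</code> call into its own function. I also moved your global code into a main function. Here is my version of your code:</p>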
<pre><code>import os
import time
import random as rd
import string
import timeit

# Function to create random data in a specific pattern with separator ";":
def createRandomString(num,io,fullLength):
    lineFull = ''
    nl = True
    randstr = ''.join(rd.choice(string.ascii_letters) for _ in range(7))
    #randstr = ''.join(rd.choice(string.printable) for _ in range(7))
    for i in range(num):
        if i == 0:
            line = 'Start;'
        else:
            line = ''
        bb = rd.choice([True,True,False])
        if bb:
            line = line+'\"\";'
        else:
            if rd.random() < 0.999:
                line = line+randstr
            else:
                line = line+rd.randint(10,100)*randstr
        if nl and i != num-1:
            line = line+';\n'
            nl = False
        elif rd.random() < 0.04 and i != num-1:
            line = line+';\n'
            if rd.random() < 0.01:
                add = rd.randint(1,10)*'\n'
                line = line+add
        else:
            line = line+';'
        lineFull = lineFull+line
    return lineFull+'\n'

def counting_func(lines_iter):
    try:
        return lines_iter.next().count(';')
    except StopIteration:
        return -1

def wrapper(func, *args, **kwargs):
    def wrapped():
        return func(*args, **kwargs)
    return wrapped

# Create file with random data:
def main():
    fullLength = 100000
    outputFolder = ""
    numberOfCols = 38
    testLines = [createRandomString(numberOfCols,i,fullLength) for i in range(fullLength)]
    with open(outputFolder+"TestFile.txt",'w') as tf:
        tf.writelines(testLines)
    # Read in file:
    with open(outputFolder+"TestFile.txt",'r') as ff:
        lines = []
        for line in ff.readlines():
            lines.append(unicode(line.rstrip('\n')))
    # Restore columns by counting the separator:
    lines_iter = iter(lines)
    print timeit.timeit(wrapper(counting_func, lines_iter), number=fullLength)

if __name__ == '__main__': main()
</code></pre>
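<p>This runs the count 100000 times, once for each generated line. With <code>string.ascii_letters</code>, timeit gives me an average of <code>0.0454177856445</code> seconds per run of 100000 calls. With <code>string.printable</code>, I get an average of <code>0.0426299571991</code>. The latter is in fact slightly faster than the former, although the difference is small.</p>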
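<p>For anyone who wants to repeat this kind of measurement without the file round trip, here is a minimal sketch that times only the <code>count(';')</code> call. The helper names <code>make_line</code> and <code>time_count</code> are my own and not part of the code above, and I assume that <code>';'</code> is removed from the alphabet so that both lines contain the same number of separators. It is written to run on Python 3 as well, so it does not reproduce the <code>unicode()</code> conversion.</p>

```python
import random
import string
import timeit


def make_line(alphabet, fields=38, width=7, seed=0):
    """Build one ';'-separated line whose fields are drawn from `alphabet`."""
    rng = random.Random(seed)
    # Assumption: drop ';' from the alphabet so the separator count is fixed.
    chars = [c for c in alphabet if c != ';']
    return ';'.join(
        ''.join(rng.choice(chars) for _ in range(width))
        for _ in range(fields)
    )


def time_count(line, number=100000):
    """Return the total seconds needed for `number` calls of line.count(';')."""
    return timeit.timeit(lambda: line.count(';'), number=number)


letters_line = make_line(string.ascii_letters)
printable_line = make_line(string.printable)

# Both lines have 38 fields, so each contains 37 separators.
print(letters_line.count(';'), printable_line.count(';'))
print(time_count(letters_line), time_count(printable_line))
```

<p>The absolute numbers depend on the machine and the interpreter, so only the comparison between the two values is meaningful.</p>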
<p>I suspect the performance difference comes from the work you do in the following loop besides the counting:</p>
<pre><code>for i in range(len(lines)):
    linesT = linesT + lines[i]
    count = linesT.count(';')
    if count == numberOfCols:
        lines2.append(linesT)
        linesT = ''
    if i%1000 == 0:
        print time.time()-time0
        time0 = time.time()
</code></pre>
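<p>One way to check this suspicion is to wrap the loop in a function and time it as a whole, so that the concatenation and the recount of the growing buffer are measured together. The following is only a sketch under my own assumptions: the function name <code>restore_rows</code>, the <code>sep</code> parameter and the tiny sample data are made up for illustration, and the progress printing from the loop above is left out.</p>

```python
import timeit


def restore_rows(lines, number_of_cols, sep=';'):
    """Join physical lines until the buffer holds `number_of_cols` separators."""
    rows = []
    buf = ''
    for line in lines:
        buf = buf + line                      # concatenation, as in the loop above
        if buf.count(sep) == number_of_cols:  # recounts the whole buffer each time
            rows.append(buf)
            buf = ''
    return rows


# Hypothetical sample: the first row is split over two physical lines.
sample = ['a;b', ';c;', 'd;e;f;'] * 1000

print(restore_rows(sample[:3], 3))  # ['a;b;c;', 'd;e;f;']
print(timeit.timeit(lambda: restore_rows(sample, 3), number=100))
```

<p>Comparing this timing for <code>str</code> and <code>unicode</code> input against the timing of the bare <code>count(';')</code> call should show whether the extra work in the loop is responsible.</p>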
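<p>Another possibility is that accessing global variables, without a main function, is slower. But that should happen in both cases, so it is probably not the cause.</p>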