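<p>Here's a way to do it without having to read the entire file into memory at once. It does require reading through the whole file first, but it only stores where each line starts. Once those positions are known, the <code>seek()</code> method can be used to randomly access each line in whatever order is needed.</p>
<p>Here's an example using the input file:</p>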
<pre><code># Preprocess - read whole file and note where lines start.
# (Needs to be done in binary mode.)
with open('text_file.txt', 'rb') as file:
    offsets = [0]  # First line is always at offset 0.
    for line in file:
        offsets.append(file.tell())  # Append where *next* line would start.

# Now reread lines in file in reverse order.
with open('text_file.txt', 'rb') as file:
    for index in reversed(range(len(offsets)-1)):
        file.seek(offsets[index])
        size = offsets[index+1] - offsets[index]  # Difference with next.
        # Read bytes, convert them to a string, and remove whitespace at end.
        line = file.read(size).decode().rstrip()
        print(line)
</code></pre>
<p>Output:</p>
<pre class="lang-none prettyprint-override"><code>2018/03/26-15:08:51.066968 1 7FE9BDC91700 std:ZMD:
2018/03/26-10:08:51.066967 0 7FE9BDC91700 Exit Status = 0x0
2018/03/26-00:08:50.981908 1389 7FE9BDC2B707 user 7fb31ecfa700
2018/03/25-24:08:50.980519 16K 7FE9BD1AF707 user: number is 93823004
2018/03/25-20:08:50.486601 1.5M 7FE9D3D41706 qojfcmqcacaeia
2018/03/25-10:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
2018/03/25-00:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
</code></pre>
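<p>Because the offsets list records where every line starts, the same index can serve lines in any order, not just in reverse. The following is a minimal sketch of that idea. The helper names <code>build_line_offsets</code> and <code>read_line</code> are hypothetical and not part of the code above, and the offsets are accumulated from line lengths rather than taken from <code>tell()</code>, which should give the same values:</p>

```python
def build_line_offsets(path):
    """Return the byte offset where each line starts, plus the end-of-file offset."""
    offsets = [0]  # First line is always at offset 0.
    with open(path, 'rb') as file:  # Binary mode so offsets are byte positions.
        for line in file:
            # The next line starts where this one ends.
            offsets.append(offsets[-1] + len(line))
    return offsets


def read_line(path, offsets, index):
    """Return line number `index` (0-based) without reading the lines before it."""
    with open(path, 'rb') as file:
        file.seek(offsets[index])
        size = offsets[index + 1] - offsets[index]  # Difference with next.
        # Read bytes, convert them to a string, and remove whitespace at end.
        return file.read(size).decode().rstrip()
```

<p>Assuming the same <code>text_file.txt</code>, calling <code>offsets = build_line_offsets('text_file.txt')</code> once and then <code>read_line('text_file.txt', offsets, 3)</code> would fetch only the fourth line. Reopening the file on every call keeps the sketch simple; a real program would probably keep one file object open.</p>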
<p><strong>Update</strong></p>
<p>Here's a version that does the same thing but uses Python's <a href="https://docs.python.org/3/library/mmap.html#module-mmap" rel="nofollow noreferrer"><code>mmap</code></a> module to <a href="https://en.wikipedia.org/wiki/Memory-mapped_file" rel="nofollow noreferrer">memory-map</a> the file, which should provide better performance by taking advantage of the OS/hardware's virtual-memory capabilities.</p>
<p>That's because, as <a href="https://pymotw.com/3/mmap/index.html" rel="nofollow noreferrer">PyMOTW-3</a> puts it:</p>
<blockquote>
<p>Memory-mapping typically improves I/O performance because it does not involve a separate system call for each access and it does not require copying data between buffers – the memory is accessed directly by both the kernel and the user application.</p>
</blockquote>
<p>Code:</p>
<pre><code>import mmap

with open('text_file.txt', 'rb') as file:
    with mmap.mmap(file.fileno(), length=0, access=mmap.ACCESS_READ) as mm_file:
        # First preprocess the file and note where lines start.
        # (Needs to be done in binary mode.)
        offsets = [0]  # First line is always at offset 0.
        for line in iter(mm_file.readline, b""):
            offsets.append(mm_file.tell())  # Append where *next* line would start.

        # Now process the lines in file in reverse order.
        for index in reversed(range(len(offsets)-1)):
            mm_file.seek(offsets[index])
            size = offsets[index+1] - offsets[index]  # Difference with next.
            # Read bytes, convert them to a string, and remove whitespace at end.
            line = mm_file.read(size).decode().rstrip()
            print(line)
</code></pre>
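<p>Since a memory-mapped file can be searched like a bytes object, a variation on the above is to skip the offsets list and walk backward from the end with <code>rfind()</code>. This is a different technique from the one shown above, offered only as a sketch. The generator name <code>iter_lines_reversed</code> is hypothetical, and the empty-file check is there because <code>mmap</code> refuses to map a zero-length file:</p>

```python
import mmap
import os


def iter_lines_reversed(path):
    """Yield the lines of the file from last to first, with trailing whitespace removed."""
    if os.path.getsize(path) == 0:
        return  # An empty file cannot be memory-mapped and has no lines.
    with open(path, 'rb') as file:
        with mmap.mmap(file.fileno(), length=0, access=mmap.ACCESS_READ) as mm_file:
            end = len(mm_file)
            # Skip one trailing newline so it does not produce an extra empty line.
            if mm_file[end - 1:end] == b"\n":
                end -= 1
            while end >= 0:
                # Find the newline before this line; rfind returns -1 at the first line.
                start = mm_file.rfind(b"\n", 0, end) + 1
                yield mm_file[start:end].decode().rstrip()
                end = start - 1  # Step back over the newline just found.
```

<p>With the same input file, <code>for line in iter_lines_reversed('text_file.txt'): print(line)</code> should print the lines in reverse order. The trade-off is that no index is kept, so this only helps when reverse order is all that is needed.</p>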