<p>If you want to process the first N lines of the file (10 here) without reading the entire response into memory, here's how to do it:</p>
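<pre><code>import nltk
import requests

# stream=True defers the download; .raw exposes the body as a file-like object
myfile = requests.get('http://www.sls.hawaii.edu/bley-vroman/brown.txt', stream=True).raw
for i in range(0, 10):
    line = myfile.readline()  # read a single line (bytes), not the whole file
    # decode bytes to text; '\\n' (backslash + n) never matches, so replace() changes nothing
    data = line.decode().replace('\\n', 'r')
    print(data, end="")
</code></pre>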
<p>Result:</p>
<blockquote>
<p>The Fulton County Grand Jury said Friday an investigation of
Atlanta's recent primary election produced "no evidence" that any
irregularities took place. The jury further said in term-end
presentments that the City Executive Committee, which had over-all
charge of the election, "deserves the praise and thanks of the City
of Atlanta" for the manner in which the election was conducted.</p>
<p>The September-October term jury had been charged by Fulton Superior
Court Judge Durwood Pye to investigate reports of possible
"irregularities" in the hard-fought primary which was won by</p>
</blockquote>
<p>The three problems I fixed were:</p>
<ol>
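<li><code>requests.get()</code> doesn't return a file-like object. Add <code>.raw</code> to get one, and pass <code>stream=True</code> to the request so that it works correctly.</li>
<li>You were calling <code>read()</code>, which works once you've addressed #1, but it reads the entire file. That's not what you want. I assume you want to read line by line by calling <code>readline()</code>.</li>
<li>The incoming bytes have to be decoded into text before you can manipulate them with string methods. That's what <code>decode()</code> does.</li>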
</ol>
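<p>The three fixes can be tried without any network access. The sketch below uses <code>io.BytesIO</code> as a stand-in for the <code>.raw</code> stream, since both return bytes from <code>readline()</code>. The sample bytes and the helper name <code>read_first_lines</code> are made up for illustration:</p>

```python
import io


def read_first_lines(stream, n):
    """Read at most n lines from a binary file-like object, decoded to str."""
    lines = []
    for _ in range(n):
        raw_line = stream.readline()     # reads one line only, not the whole stream
        if not raw_line:                 # b"" signals the end of the stream
            break
        lines.append(raw_line.decode())  # bytes to str (UTF-8 by default)
    return lines


# Hypothetical sample data standing in for the HTTP response body.
fake_raw = io.BytesIO(b"line one\nline two\nline three\n")
first_two = read_first_lines(fake_raw, 2)
print(first_two)
```

<p>Because only two lines are consumed, the rest of the stream is still unread afterwards, which is the point of using <code>readline()</code> instead of <code>read()</code>.</p>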
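<p>Of course, to process 10 lines rather than 1, you need a loop and a way to stop after 10 lines. I added that too. I also added a <code>print()</code> call so we can all see the result.</p>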
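<p>The loop-and-limit step could also be written with <code>itertools.islice</code>, which stops after a fixed number of items and also stops early if the input is shorter. This is an alternative sketch, not what the code in this answer does. It assumes the stream can be iterated line by line the way ordinary binary file objects can, and <code>io.BytesIO</code> with made-up contents stands in for the streamed response:</p>

```python
import io
from itertools import islice

# Hypothetical stand-in for the streamed response body.
stream = io.BytesIO(b"alpha\nbeta\ngamma\ndelta\n")

# Iterating a binary file-like object yields one bytes line at a time;
# islice takes at most the first 3 of them.
shown = []
for raw_line in islice(stream, 3):
    text = raw_line.decode()  # bytes to str before any string handling
    shown.append(text)
    print(text, end="")
```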
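<p>I assume the <code>replace()</code> in your code isn't really what you want. I'm guessing you meant <code>replace('\\n', '\\r')</code>, but since I'm not sure (and I don't know what that would buy you), I left that for you to sort out. I did fix it so that it doesn't wipe out the line entirely, by adding a second backslash to the search term (not sure why it was doing that).</p>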
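<p>On why the second backslash matters: in a Python string literal, a single backslash followed by <code>n</code> denotes one newline character, while a doubled backslash followed by <code>n</code> is two characters, a literal backslash and the letter <code>n</code>. A small check, using a made-up sample line:</p>

```python
line = "some text\n"  # made-up sample; ends with a real newline

# '\\n' is backslash + 'n'. That pair does not occur in the line,
# so replace() finds nothing and returns the text unchanged.
untouched = line.replace('\\n', 'r')

# '\n' is the newline character, so this swaps it for a carriage return.
swapped = line.replace('\n', '\r')

print(repr(untouched))
print(repr(swapped))
```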