Answers to the question: How to read a txt file from a website with Python

How to read a txt file from a website with Python

Category: Python Q&A


1 answer

Anonymous, 1 day ago


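If you want to process only the first N lines of the file (10 here) without reading the whole response into memory, here is how to do it:
<pre><code>import nltk  # not used in this snippet
import requests

# stream=True defers downloading the body; .raw exposes it as a file-like object
myfile = requests.get('http://www.sls.hawaii.edu/bley-vroman/brown.txt', stream=True).raw
for i in range(10):
    line = myfile.readline()  # reads one line, returned as bytes
    data = line.decode().replace('\\n', 'r')  # decode bytes to str before using string methods
    print(data, end="")
</code></pre>
Result:
<blockquote> The Fulton County Grand Jury said Friday an investigation of Atlanta's recent primary election produced "no evidence" that any irregularities took place. The jury further said in term-end presentments that the City Executive Committee, which had over-all charge of the election, "deserves the praise and thanks of the City of Atlanta" for the manner in which the election was conducted. The September-October term jury had been charged by Fulton Superior Court Judge Durwood Pye to investigate reports of possible "irregularities" in the hard-fought primary which was won by </blockquote>
The three problems I fixed:
<ol>
<li><code>requests.get()</code> does not return a file-like object. Add <code>.raw</code> to get one, and pass <code>stream=True</code> to the request so that it works correctly.</li>
<li>You were calling <code>read()</code>, which would work once you addressed #1 but would read the entire file. That is not what you want. I assume you want to read line by line by calling <code>readline()</code>.</li>
<li>The incoming bytes must be decoded to text before you can operate on them with string methods. That is what <code>decode()</code> does.</li>
</ol>
Of course, to process 10 lines rather than 1, you need a loop and a way to stop after 10 lines, so I added that too. I also added a <code>print()</code> call so we can all see the result.

I assume the <code>replace()</code> in your code is not what you really want. I am guessing you meant <code>replace('\\n', '\\r')</code>, but since I am not sure (I do not know what that would gain you), I left that for you to deal with. I did fix it so that it no longer wipes out the line entirely, by adding a second backslash to the search term (I am not sure why it does that).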
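As a possible alternative to reading from <code>.raw</code>, here is a minimal sketch built on <code>Response.iter_lines()</code> and <code>itertools.islice</code>. It is not part of the answer above. The helper names <code>first_lines</code> and <code>first_lines_from_url</code> are hypothetical, and the UTF-8 encoding and the 10-second timeout are assumptions chosen for illustration. Unlike <code>.raw</code>, <code>iter_lines()</code> goes through requests' normal content decoding, which may matter if the server compresses the response.

```python
from itertools import islice


def first_lines(byte_lines, n=10, encoding="utf-8"):
    """Decode and return at most n lines from an iterable of bytes objects."""
    # islice stops pulling from the iterable after n items, so the rest
    # of the stream is never consumed.
    return [raw.decode(encoding, errors="replace") for raw in islice(byte_lines, n)]


def first_lines_from_url(url, n=10, encoding="utf-8"):
    """Stream url and return its first n lines as text (hypothetical helper)."""
    import requests  # third-party; imported here so first_lines() works without it

    # stream=True defers the body download; the with-block closes the
    # connection once the first n lines have been read.
    # timeout=10 is an assumed value, not something the original answer specifies.
    with requests.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()
        # iter_lines() yields each line as bytes, without the line terminator.
        return first_lines(response.iter_lines(), n, encoding)
```

Calling <code>first_lines_from_url('http://www.sls.hawaii.edu/bley-vroman/brown.txt')</code> should return the first ten lines as a list of strings. Note that <code>iter_lines()</code> fetches the body in chunks, so slightly more than ten lines' worth of data may be transferred before the connection is closed.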
