How to read a txt file from a website with Python

import nltk

with open('http://www.sls.hawaii.edu/bley-vroman/brown.txt', 'r') as myfile:
    data = myfile.read().replace('\n', 'r')
data2 = data.replace("/", "")
for i, line in enumerate(data2.split('\n')):
    if i > 10:
        break
    print(str(i) + ':\t' + line)

Traceback (most recent call last):
  File "tut1.py", line 3, in <module>
    with open('http://www.sls.hawaii.edu/bley-vroman/brown.txt', 'r')as myfile:
FileNotFoundError: [Errno 2] No such file or directory: 'http://www.sls.hawaii.edu/bley-vroman/brown.txt'

3 answers

User

Answer 1 · edited 2024-09-28 03:18:00

With stream=True, requests lets you consume the response line by line through iter_lines():

import requests

resp = requests.get('http://www.sls.hawaii.edu/bley-vroman/brown.txt', stream=True)
for i, l in enumerate(resp.iter_lines()):
    if i < 10:
        print(l)  # l is bytes; use l.decode() to get a str
    else:
        break
resp.close()  # release the connection instead of leaving it open

Or, more simply:

for _, l in zip(range(10), resp.iter_lines()):
    print(l)  # use l.decode() to get string

Or, best of all:

from itertools import islice

print(*islice(resp.iter_lines(), 10), sep="\n")
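The islice approach can be wrapped in a small helper that also decodes each line. This is only a sketch: the name first_lines and the UTF-8 default are my assumptions, not part of the answer above, and a plain list of bytes stands in for resp.iter_lines() so that it runs without a network connection.

```python
from itertools import islice
from typing import Iterable, Iterator


def first_lines(lines: Iterable[bytes], n: int, encoding: str = "utf-8") -> Iterator[str]:
    """Yield at most n decoded lines from an iterable of byte strings.

    Hypothetical helper; the UTF-8 default is an assumption about the file.
    """
    for raw in islice(lines, n):
        # errors="replace" substitutes a marker for bytes invalid in `encoding`
        yield raw.decode(encoding, errors="replace")


# Stand-in for resp.iter_lines(): any iterable of bytes behaves the same way.
sample = [b"line one", b"line two", b"line three"]
preview = list(first_lines(sample, 2))
print(preview)
```

With a real streamed response you would pass resp.iter_lines() in place of sample. Because islice stops pulling from the iterator after n items, the rest of the body is never requested from it.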

User

Answer 2 · edited 2024-09-28 03:18:00

You can get the contents of the .txt file without that error like this:

import requests

myfile = requests.get('http://www.sls.hawaii.edu/bley-vroman/brown.txt')

data = myfile.text
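Once the text is in data, the rest of the question's loop works on an ordinary string. Here is a rough sketch of that step; number_lines is a hypothetical name of mine, and the sample string is made up so the snippet runs offline.

```python
from typing import List


def number_lines(text: str, limit: int = 10) -> List[str]:
    """Remove '/' characters and return the first `limit` lines, numbered from 0.

    Hypothetical helper mirroring the loop in the question.
    """
    cleaned = text.replace("/", "")
    numbered = []
    for i, line in enumerate(cleaned.splitlines()):
        if i >= limit:
            break
        numbered.append(str(i) + ":\t" + line)
    return numbered


# Made-up sample standing in for myfile.text
sample_text = "a/b c\nd/e\nf"
result = number_lines(sample_text, limit=2)
print("\n".join(result))
```

Note that the question's code breaks on i > 10, which prints 11 lines (0 through 10); the sketch stops after exactly limit lines. Calling myfile.raise_for_status() before reading myfile.text is also worth considering, so that an HTTP error page is not processed as if it were the file.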

User

Answer 3 · edited 2024-09-28 03:18:00

If you want to process the first N lines of the file (10 here) without reading the whole response into memory, this is how to do it:

import nltk
import requests

myfile = requests.get('http://www.sls.hawaii.edu/bley-vroman/brown.txt', stream=True).raw

for i in range(0, 10):
    line = myfile.readline()
    data = line.decode().replace('\\n', 'r')
    print(data, end="")

Result:

The Fulton County Grand Jury said Friday an investigation of Atlanta's recent primary election produced "no evidence" that any irregularities took place. The jury further said in term-end presentments that the City Executive Committee, which had over-all charge of the election, "deserves the praise and thanks of the City of Atlanta" for the manner in which the election was conducted.
The September-October term jury had been charged by Fulton Superior Court Judge Durwood Pye to investigate reports of possible "irregularities" in the hard-fought primary which was won by

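The three problems I fixed are:

1. requests.get() does not return a file-like object. Add .raw to get one, and pass stream=True to the request so that it works properly.
2. You were calling read(), which works once #1 is addressed but reads the entire file. That is not what you want. I assume you want to read line by line by calling readline().
3. The incoming bytes have to be decoded to text before you can use string methods on them. That is what decode() does.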

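Of course, to process 10 lines instead of 1 you need a loop and a way to stop after 10 lines, so I added that too. I also added a print() call so that we can all see the result.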
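The same readline-and-decode pattern can be tried without a network connection against any binary file-like object. In this sketch io.BytesIO stands in for the .raw stream; read_first_lines is a hypothetical name, and the UTF-8 default is an assumption.

```python
import io
from typing import BinaryIO, List


def read_first_lines(stream: BinaryIO, n: int, encoding: str = "utf-8") -> List[str]:
    """Read at most n lines from a binary file-like object, decoded to str.

    Hypothetical helper; line endings are kept, as readline() returns them.
    """
    lines = []
    for _ in range(n):
        raw = stream.readline()
        if not raw:  # readline() returns b"" once the stream is exhausted
            break
        lines.append(raw.decode(encoding))
    return lines


# io.BytesIO stands in for the .raw attribute of a streamed response.
fake_raw = io.BytesIO(b"first\nsecond\nthird\n")
head = read_first_lines(fake_raw, 2)
print(head)
```

Unlike the loop above, the sketch also stops early when the stream has fewer than n lines, which avoids printing empty strings for a short file.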

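I assume the replace() in your code isn't what you really want. My guess is that you meant replace('\\n', '\\r'), but since I'm not sure (and I don't know what that would gain you), I've left that for you to sort out. I did change one thing: I added a second backslash to the search term so that the call no longer wipes out the line endings completely (I don't know why the original did that). With '\\n' the search term is a literal backslash followed by n, which does not occur in this text, so the call leaves the line unchanged.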
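The difference between the two search terms is easy to see on a short string. This is just an illustration with made-up text, not the corpus file.

```python
text = "first line\nsecond line\n"

# '\n' is a single character (a newline), so every line break becomes 'r'.
single = text.replace('\n', 'r')

# '\\n' is two characters (a backslash followed by 'n'). The text contains
# no such pair, so nothing is replaced and the line breaks survive.
double = text.replace('\\n', 'r')

print(repr(single))
print(repr(double))
```

So the question's replace('\n', 'r') joins everything into one long line with an r at each former line break, while the version with the doubled backslash is effectively a no-op on this kind of input.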
