Invalid start byte when reading multiple files in Python

Posted 2024-07-03 03:31:36


My function reads multiple .sgm files. I get an error while reading the contents of a file, specifically at the line contents = f.read()

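import os

from bs4 import BeautifulSoup


def block_reader(path):
    # Collect the paths of all .sgm files in the directory.
    filePaths = []
    for filename in os.listdir(path):
        if filename.endswith(".sgm"):
            filePaths.append(os.path.join(path, filename))

    for file in filePaths:
        # Text mode with no explicit encoding; per the traceback, UTF-8 is used here.
        with open(file, 'r') as f:
            print(f)
            contents = f.read()  # this is the line that raises UnicodeDecodeError
            soup = BeautifulSoup(contents, "lxml")

    return ["test content"]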

Error message

Traceback (most recent call last):
  File "./block-1-reader.py", line 32, in <module>
    for reuters_file_content in solutions.block_reader(path):
  File "/home/ragith/Documents/A-School/Fall-2020/COMP_479/Assignment_1/solutions.py", line 29, in block_reader
    contents = f.read()
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 1519554: invalid start byte
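
The last line of the traceback can be reproduced in isolation. The snippet below is only a sketch: the sample bytes are hypothetical, and Latin-1 is an assumption used for illustration, since the question does not state which encoding the .sgm files actually use. It shows that the byte 0xfc can never begin a UTF-8 sequence, while a single-byte encoding such as Latin-1 decodes it without complaint.

```python
# Hypothetical sample data: 0xfc is the letter "ü" in Latin-1,
# but it is not a valid first byte of any UTF-8 sequence.
raw = b"M\xfcller"

try:
    raw.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    # Same kind of error as in the traceback, at a different position.
    message = str(exc)
    print(message)

# Decoding with an assumed single-byte encoding succeeds.
text = raw.decode("latin-1")
print(text)
```

This suggests that at least one of the files is not UTF-8 encoded, which is why reading it in text mode with the default codec fails.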


1 answer

User

#1 · Posted 2024-07-03 03:31:36

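Try this: with open(file, 'rb') as f: The b in the mode specifier of open() declares that the file should be treated as binary, so f.read() returns the contents as bytes and no decoding is attempted. BeautifulSoup accepts bytes as input and tries to work out the encoding itself, so the rest of the function can stay as it is. More details are available at: this link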
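
A minimal sketch of the binary-mode approach, using only the standard library. The helper names read_sgm_bytes and decode_text are hypothetical, and the latin-1 default is an assumption rather than something the question confirms about these files.

```python
import os


def read_sgm_bytes(path):
    """Yield (file path, raw bytes) for every .sgm file in the directory."""
    for filename in sorted(os.listdir(path)):
        if filename.endswith(".sgm"):
            file_path = os.path.join(path, filename)
            # 'rb' returns bytes, so no UTF-8 decoding is attempted on read.
            with open(file_path, "rb") as f:
                yield file_path, f.read()


def decode_text(raw, encoding="latin-1"):
    """Decode raw bytes with an explicitly chosen encoding.

    latin-1 is an assumption: it maps every byte to a character, so it
    never raises, but the result is only correct if the files use it.
    """
    return raw.decode(encoding)
```

The bytes yielded by read_sgm_bytes could be passed to BeautifulSoup(contents, "lxml") unchanged. If the encoding of the files is known, another option is to stay in text mode and pass it explicitly, for example open(file, 'r', encoding='latin-1'), or to pass errors='replace' to substitute undecodable bytes instead of raising.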
