Python: searching for a Unicode string in a binary file (.PLM)

    # Finds the filename in the .PLM-file
    def FindFileName(File):
        # Opens the file and points to byte 56, where the file name starts
        f = open(File,'rb')
        f.seek(56)
        Name = ""
        byte = f.read(1)  # Reads the first byte after byte 56
        while byte != "\x00":  # Runs the loop, until a NUL-character is found (00 is NUL in hex)
            Name += str(byte)  # Appends the current byte to the string Name
            byte = f.read(1)  # reads the next byte
        f.close()
        return Name

    def FindFileName(File):
        # Opens the file and points to byte 56, where the file name starts
        f = open(File,'rb')
        f.seek(56)
        Name = ""
        byte = f.read(1)  # Reads the first byte after byte 56
        while byte and (byte != "\x00"):  # Runs the loop, until a NUL-character is found (00 is NUL in hex)
            # Since there are problems with "?" in directory names, we change those to spaces
            if byte == "?":
                Name += " "
            elif byte == "\xc5":
                Name += "å"
            elif byte == "\xd8":
                Name += "ø"
            else:
                Name += byte
            byte = f.read(1)  # reads the next byte
        f.close()
        return Name.decode('mbcs')

1 answer

User
#1 · posted 2024-05-20 01:52:02

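In Python 2, reading from a binary file returns a string, so there is no need to call str on it. Also, if the file is malformed for some reason and contains no zero byte, read will return an empty string once the end of the file is reached, and the original loop would then never terminate. A small change to the test handles both cases:

    while byte and (byte != "\x00"):  # Runs the loop, until a NUL-character is found (00 is NUL in hex)
        Name += byte  # Appends the current byte to the string Name
        byte = f.read(1)  # reads the next byte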
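To see both guards at work, here is a minimal self-contained sketch. It is not the asker's code: it uses an in-memory `io.BytesIO` stream in place of a real .PLM file, and the helper name `read_name_bytes` and the sample bytes are made up for illustration. The `b"..."` literals are used so that it behaves the same under Python 2 and Python 3.

```python
import io

def read_name_bytes(stream, offset=56):
    """Collect bytes from offset up to the first NUL byte or end of file.

    Hypothetical helper for illustration; offset 56 comes from the question.
    """
    stream.seek(offset)
    name = b""
    byte = stream.read(1)
    # "byte" is empty at end of file, so the loop stops even without a NUL
    while byte and byte != b"\x00":
        name += byte
        byte = stream.read(1)
    return name

header = b"\x01" * 56  # made-up filler for the first 56 bytes

# Well-formed data: the name is followed by a NUL byte
well_formed = io.BytesIO(header + b"song.mp3\x00rest")
# Malformed data: the file ends before any NUL byte appears
truncated = io.BytesIO(header + b"song")

print(read_name_bytes(well_formed))
print(read_name_bytes(truncated))
```

In the second case the loop ends because `read(1)` returns an empty value at end of file, which is exactly what the extra `byte and` check tests for.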
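Once you have the complete byte sequence, you have to convert it to a Unicode string. To do that, you need to decode it:

    Name = Name.decode("utf-8")

As mentioned in the comments, it looks like your string is not actually UTF-8 but one of Microsoft's code pages. You can decode using the code page Windows is currently using:

    Name = Name.decode("mbcs")

You can also specify the code page to use explicitly; see the documentation.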
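As a rough illustration of naming the code page explicitly, the sketch below decodes some made-up sample bytes with `cp1252`. That choice is an assumption: the thread does not say which code page the .PLM files actually use, so substitute the one that matches your data. Unlike `mbcs`, which is only available on Windows, a named codec such as `cp1252` works on any platform.

```python
# Made-up sample bytes; 0xE5 and 0xE6 are single-byte characters in cp1252
raw = b"Bl\xe5b\xe6r.mp3"

# Decode with an explicitly named code page (cp1252 is an assumption here)
name = raw.decode("cp1252")

# The same bytes are not valid UTF-8, so decoding them as UTF-8 fails
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(name))
print(utf8_ok)
```

The failed UTF-8 attempt shows why the choice of codec matters: bytes above 0x7F from a single-byte code page usually do not form valid UTF-8 sequences.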
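You may run into problems when trying to print the string to the console, because the Windows console does not use the same code page as the rest of the system; it may not have the characters you need to print.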
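One possible workaround, sketched below, is to replace whatever the console cannot represent before printing. The helper name `console_safe` is hypothetical, and falling back to ASCII when the console encoding is unknown is an assumption made for this example.

```python
import sys

def console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'.

    Hypothetical helper; falls back to ASCII if no encoding can be determined.
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Round-trip through the console encoding, substituting unencodable characters
    return text.encode(encoding, "replace").decode(encoding)

print(console_safe(u"Bl\xe5b\xe6r.mp3"))
```

This only affects what is displayed; the decoded Unicode string itself stays intact and can still be used for file operations.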
