How do I read str characters from a file containing hexadecimal byte escapes and decode them?

Posted 2024-10-02 00:30:06


I have a file example.log that contains:

<POOR_IN200901UV xmlns="urn:hl7-org:v3" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ITSVersion="XML_1.0"
xsi:schemaLocation="urn:hl7-org:v3
../../Schemas/POOR_IN200901UV20.xsd">\n\t<!-- \xe6\xb6\x88\xe6\x81\xafID -
->\n\t<id extension="BS002"/>

I want to read the file, convert the str into properly decoded UTF-8 text, and write the result to a new file. My current code is:

import re

with open("example_decoded.log", 'w') as f:
    for line in open("example.log", 'r', encoding='utf-8'):
        m = re.search("<POOR_IN200901UV", line)
        if m:
            line = line[m.start():-2]
            line_bytes = bytes(line, encoding='raw_unicode_escape')
            line_decoded = line_bytes.decode('utf-8')
            print(line_decoded)
            f.write(line_decoded)
        else:
            pass

But example_decoded.log contains:

<POOR_IN200901UV xmlns="urn:hl7-org:v3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ITSVersion="XML_1.0" 
xsi:schemaLocation="urn:hl7-org:v3 
../../Schemas/POOR_IN200901UV20.xsd">\n\t<!-- \xe6\xb6\x88\xe6\x81\xafID -
->\n\t<id extension="BS002"

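The \xe6\xb6\x88\xe6\x81\xaf part was not decoded, so I would like to know how to decode a str that mixes plain text with escaped bytes like this.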


3 answers

See: Read hex characters and convert them to utf-8 using python 3

The solution is:

import re

with open("example_decoded.log", 'w', encoding='utf-8') as f:
    for line in open("example.log", 'r', encoding='utf-8'):
        m = re.search("<POOR_IN200901UV", line)
        if m:
            line = line[m.start():-2]
            # Interpret the literal \xNN escapes, then reread the resulting bytes as UTF-8.
            line_decoded = bytes(line, 'utf-8').decode('unicode_escape').encode('latin-1').decode('utf8')
            print(line_decoded)
            f.write(line_decoded)

That said, I don't understand why the encode('latin-1') step has to come first.
Can anyone explain?
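
One possible explanation, as a minimal sketch: decoding with unicode_escape turns each \xNN escape into the code point U+00NN, so the result is still a str (of Latin-1-range characters), not bytes. Encoding that str as latin-1 maps every code point below 256 to the byte with the same value, which recovers the original UTF-8 byte sequence for the final decode. The sample string and variable names below are illustrative and not from the original post.

```python
# Literal backslash escapes, as they appear in the log file (raw string).
raw = r'<!-- \xe6\xb6\x88\xe6\x81\xafID -->'

# Step 1: str -> bytes. The text is pure ASCII here.
as_bytes = raw.encode('utf-8')

# Step 2: interpret the escapes. Each \xNN becomes the code point U+00NN.
escaped = as_bytes.decode('unicode_escape')

# Step 3: latin-1 maps code points 0-255 to bytes with the same values,
# so this yields the real UTF-8 byte sequence.
utf8_bytes = escaped.encode('latin-1')

# Step 4: decode the real UTF-8 bytes into text.
text = utf8_bytes.decode('utf-8')
print(text)
```

Note that step 2 also turns the literal \n and \t sequences in the log into real newline and tab characters.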

import struct
decodedVal = struct.unpack(">f", bytes.fromhex(encoded_val))[0]  # encoded_val: a hex string

See the link below to substitute your own byte order and type for ">f"

https://docs.python.org/3/library/struct.html
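
A small sketch of how the format string changes the result, assuming the input is a hex string holding a 4-byte value. This applies to hex-encoded binary numbers rather than to the \x escapes in the question, and the sample value 41200000 is made up for illustration.

```python
import struct

# Hypothetical input: 4 bytes, written as hex digits.
encoded_val = "41200000"
raw = bytes.fromhex(encoded_val)

# ">" selects big-endian, "<" little-endian; "f" is a 32-bit float,
# "I" is a 32-bit unsigned integer.
as_float = struct.unpack(">f", raw)[0]
as_uint = struct.unpack(">I", raw)[0]
as_float_le = struct.unpack("<f", raw)[0]

print(as_float)  # 10.0
print(as_uint)   # 1092616192
```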

import codecs

decode_hex = codecs.getdecoder("hex_codec")

string = decode_hex(string)[0]

https://docs.python.org/3/library/codecs.html
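
A brief usage sketch for the hex codec, assuming the input consists of plain hex digits. The codec does not accept \x-prefixed escapes, so the log text from the question would need the \x prefixes stripped first. The sample input below is the escaped byte sequence from the question written as bare hex digits; the variable names are illustrative.

```python
import codecs

decode_hex = codecs.getdecoder("hex_codec")

# Hypothetical input: the bytes e6 b6 88 e6 81 af as plain hex digits.
hex_digits = b"e6b688e681af"

# The decoder returns a (decoded_bytes, length_consumed) tuple.
raw_bytes, consumed = decode_hex(hex_digits)

# The decoded bytes are UTF-8, so they still need a normal decode to become text.
text = raw_bytes.decode("utf-8")
print(text)
```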
