Finding non-ASCII bytestrings in Python source code - Q&A - Python中文网


Posted 2024-10-04 07:25:55


Male | A programmer who likes writing Python code.

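All my Python source files are encoded in UTF-8, and the encoding is declared at the top of each file.

But sometimes the u prefix in front of a unicode string literal is missing.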

Example:

Umlauts = "üöä"

This is a bytestring containing non-ASCII characters, and it causes trouble (UnicodeDecodeError).

I tried pylint and python -3, but got no warnings.

I am looking for a way to automatically find non-ASCII characters in bytestrings.

My source code needs to support Python 2.6 and Python 2.7.

I get the well-known error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)
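Why the error appears: in Python 2, combining a bytestring with a unicode string (for example u"x" + Umlauts) makes Python implicitly decode the bytestring with the default ASCII codec. The sketch below reproduces that implicit step explicitly in Python 3. It assumes a UTF-8 source file, so that ü is stored as the bytes 0xC3 0xBC; the variable names are illustrative only. The reported position is the offset of the first non-ASCII byte, so the string in the question presumably starts with seven ASCII bytes.

```python
# What Python 2 stores for  Umlauts = "üöä"  in a UTF-8 source file:
# the raw UTF-8 bytes b'\xc3\xbc\xc3\xb6\xc3\xa4'
umlauts_bytes = "üöä".encode("utf-8")

try:
    # Python 2 performs this ASCII decode implicitly when mixing str and unicode
    umlauts_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```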

By the way: this question is only about string literals in Python source code, not about strings read from files or sockets.

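Solution

For projects that need to support Python 2.6+, I will use __future__.unicode_literals (from __future__ import unicode_literals).
For projects that need to support 2.5, I will use thg435's solution (the ast module).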


1 answer

User

Reply #1 · Posted 2024-10-04 07:25:55

Of course, you'll want to do this with Python!

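import ast, re

# Parse the file into an abstract syntax tree (Python 2: literals without
# a u prefix are parsed as str, i.e. bytestrings)
with open("your_script.py") as fp:
    tree = ast.parse(fp.read())

# Visit every node and report bytestring literals (str, not unicode)
# that contain bytes in the range 0x80-0xFF, i.e. non-ASCII bytes
for node in ast.walk(tree):
    if (isinstance(node, ast.Str)
            and isinstance(node.s, str)
            and re.search(r'[\x80-\xFF]', node.s)):
        print 'bad string %r line %d col %d' % (node.s, node.lineno, node.col_offset)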

Note that this does not distinguish between bare and escaped non-ASCII characters (fuß vs. fu\xdf).
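The ast check cannot tell fuß from fu\xdf because both produce the same bytes in the parse tree. One alternative is to work on tokens instead. This is a swapped-in technique, not part of the answer above. The tokenize module keeps each string literal exactly as written, so an escape like \xdf stays ASCII text while a bare ß does not. The sketch assumes it runs under Python 3 (tokenize.open needs 3.2+) against UTF-8 Python 2 source. It deliberately reports only bare characters; escaped bytes are ignored. The function names find_bare_non_ascii_bytestrings and scan_file are hypothetical. Python 2-only syntax that the Python 3 tokenizer rejects may make it raise an error.

```python
import io
import tokenize

# Letters that may appear before the opening quote of a string literal
STRING_PREFIX_CHARS = "bBrRuU"


def find_bare_non_ascii_bytestrings(source):
    """Return (line, col, literal) for every string literal in `source` that
    has no u prefix and contains an unescaped non-ASCII character."""
    hits = []
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            text = tok.string
            # The prefix is the run of letters before the opening quote
            prefix = text[: len(text) - len(text.lstrip(STRING_PREFIX_CHARS))].lower()
            # Python 3's tokenizer does not know the Python 2 'ur' prefix; it
            # emits it as a separate NAME token directly before the string
            if (prev is not None and prev.type == tokenize.NAME
                    and prev.string.lower() == "ur" and prev.end == tok.start):
                prefix = "ur"
            # tok.string is raw source text, so escapes such as \xdf stay ASCII
            if "u" not in prefix and any(ord(ch) > 127 for ch in text):
                hits.append((tok.start[0], tok.start[1], text))
        prev = tok
    return hits


def scan_file(path):
    # tokenize.open() decodes the file using its PEP 263 coding declaration
    with tokenize.open(path) as fp:
        return find_bare_non_ascii_bytestrings(fp.read())
```

Calling scan_file("your_script.py") returns a list of (line, column, literal) tuples; an empty list means no bare non-ASCII characters were found in bytestring literals.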
