Python 3: a Pythonic way to get the string representation of bytes?

3 Answers

User

#1 · Edited 2024-06-26 18:04:48

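Since you were kind enough to mention your actual problem in the comments, I am updating my answer again to address it. The original answer is below.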

It's the string I post to Github Markdown API. This is the only way that unicode character can be accepted. I got the rendered html with the original character dada大大

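The GitHub Markdown API requires you to send the data as JSON. JSON itself borrows its string escaping from JavaScript, so this character would be written as \u5927. When you use the json module, however, you don't need to worry about this at all: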

from urllib.request import urlopen
import json

text = 'dada大大'
# json.dumps handles the escaping; encode() produces the bytes to POST
data = json.dumps({'mode': 'markdown', 'text': text}).encode()
r = urlopen('https://api.github.com/markdown', data)

print(r.read().decode()) # <p>dada大大</p>

As you can see, the API accepts the encoded text without problems and produces the correct output, with no need to worry about the encoding.
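To make the escaping concrete, here is a minimal sketch that uses only the standard json module and sends nothing over the network. It shows that json.dumps writes 大 as \u5927 by default and that json.loads recovers the original text. The variable names are illustrative and not part of the original answer.

```python
import json

text = 'dada大大'

# By default (ensure_ascii=True) json.dumps escapes every non-ASCII character.
payload = json.dumps({'text': text})
print(payload)  # {"text": "dada\u5927\u5927"}

# Decoding the JSON gives back the original characters unchanged.
restored = json.loads(payload)['text']
print(restored == text)  # True
```

So the escaped form only exists inside the JSON document; the receiver sees the real characters after parsing.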

Or use the raw API together with the requests library (the code sample for this variant is missing from this copy).

Original answer

>>> a = 'dada大大'.encode('utf-8')
>>> a
b'dada\xe5\xa4\xa7\xe5\xa4\xa7'
>>> str(a)
"b'dada\\xe5\\xa4\\xa7\\xe5\\xa4\\xa7'"
>>> str(a)[2:-1]
'dada\\xe5\\xa4\\xa7\\xe5\\xa4\\xa7'
>>> print(_)
dada\xe5\xa4\xa7\xe5\xa4\xa7

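When you just call str(a), you get the string representation of the bytes string. Of course, when you use it like this in the interpreter, the interpreter actually calls repr on the result to display it, and a string that contains backslashes has them escaped as \\. That is where they come from.

Finally, you have to strip the leading b' and the trailing ' to get just the content of the string representation of the bytes string.

Side note: str() and repr() produce the same result when used on a bytes object.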
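The side note can be checked with a short sketch like the following. It assumes the same sample value as above; the name content is only illustrative.

```python
a = 'dada大大'.encode('utf-8')

# For bytes objects, str() without an encoding gives the same b'...' form as repr().
print(str(a) == repr(a))  # True

# Slicing off the leading b' and the trailing ' leaves only the escaped content.
content = repr(a)[2:-1]
print(content)  # dada\xe5\xa4\xa7\xe5\xa4\xa7
```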

According to Poke's answer, what I need is preventing autoescaping of repr.

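No, you don't. There are no double backslashes in the final string. They only show up because, when you enter something in the REPL, it prints the return value to the console after calling repr on it. That does not mean the actual string has suddenly changed: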

>>> s = str(a)[2:-1]
>>> len(s)
28
>>> list(s)
['d', 'a', 'd', 'a', '\\', 'x', 'e', '5', '\\', 'x', 'a', '4', '\\', 'x', 'a', '7', '\\', 'x', 'e', '5', '\\', 'x', 'a', '4', '\\', 'x', 'a', '7']

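As you can see, there are no double backslashes in the string. Yes, you can see them again here, but only because the REPL is printing the return value of list(s). Each item in the list is a single character, including the backslashes. They are just escaped again because '\' is not a valid string literal.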

>>> '\'
SyntaxError: EOL while scanning string literal
>>> '\\'
'\\'
>>> len('\\')
1
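Outside the REPL, the same point can be checked by counting backslashes. This is a small sketch, assuming the same sample value as above; the names single and doubled are illustrative.

```python
a = 'dada大大'.encode('utf-8')
s = str(a)[2:-1]

# The string itself holds one backslash per escaped byte (大大 is 6 bytes in UTF-8).
single = s.count('\\')
print(single)  # 6

# repr() doubles each backslash for display; this is the form the REPL shows.
doubled = repr(s).count('\\')
print(doubled)  # 12

# print() writes the string as it is, without the extra escaping.
print(s)  # dada\xe5\xa4\xa7\xe5\xa4\xa7
```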

User

#2 · Edited 2024-06-26 18:04:48

OK, I finally found a solution. It comes from the question "Python Replace \\ with \":

a = 'dada大大'.encode('utf-8')
b = str(a)[2:-1].encode('utf-8').decode('unicode_escape')

Maybe I should have explained more clearly what I wanted.
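For readers wondering what b actually contains, here is a short sketch that inspects it. The comparison with Latin-1 decoding is my own observation for this sample input, not something the answer states.

```python
a = 'dada大大'.encode('utf-8')

# The round trip from the answer: escaped text -> bytes -> unicode_escape decoding.
b = str(a)[2:-1].encode('utf-8').decode('unicode_escape')

# Each original byte has become one character with the same code point.
print(len(b))  # 10
print([hex(ord(c)) for c in b[4:]])  # ['0xe5', '0xa4', '0xa7', '0xe5', '0xa4', '0xa7']

# For this input, decoding the bytes as Latin-1 gives the same string directly.
print(b == a.decode('latin-1'))  # True
print(b.encode('latin-1') == a)  # True
```

In other words, b is a 10-character string whose code points equal the original byte values, not the text dada大大.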

Edit: my test results (the output is missing from this copy).

User

#3 · Edited 2024-06-26 18:04:48

A bytes object is actually a sequence of integers:

>>> a = 'dada大大'.encode() # 'utf-8' by default
>>> list(a)
[100, 97, 100, 97, 229, 164, 167, 229, 164, 167]
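A quick sketch of what "sequence of integers" means in practice, assuming the same sample value; the names first and rebuilt are illustrative.

```python
a = 'dada大大'.encode()

# Indexing a bytes object yields an int, not a one-character string.
first = a[0]
print(first)  # 100

# The bytes() constructor accepts an iterable of ints in range(256).
rebuilt = bytes([100, 97, 100, 97, 229, 164, 167, 229, 164, 167])
print(rebuilt == a)  # True
print(rebuilt.decode())  # dada大大
```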

You can convert each integer with chr() or hex(), so:

>>> list(chr(x) if x < 128 else hex(x) for x in a)
['d', 'a', 'd', 'a', '0xe5', '0xa4', '0xa7', '0xe5', '0xa4', '0xa7']

>>> print("".join(chr(x) if x < 128 else "\\x{:02x}".format(x) for x in a))
dada\xe5\xa4\xa7\xe5\xa4\xa7
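The same idea can be wrapped in a small function. This is only a sketch: escape_non_ascii is a hypothetical name, and it formats each byte with a format spec instead of string replacement, so bytes whose hex form contains a zero, such as 0xa0, also come out correctly.

```python
def escape_non_ascii(data: bytes) -> str:
    """Return ASCII bytes as characters and all other bytes as \\xNN escapes."""
    # Hypothetical helper: each byte >= 128 becomes a backslash, 'x', and two hex digits.
    return "".join(chr(x) if x < 128 else f"\\x{x:02x}" for x in data)

print(escape_non_ascii('dada大大'.encode()))  # dada\xe5\xa4\xa7\xe5\xa4\xa7
print(escape_non_ascii(b'\xa0\x80'))  # \xa0\x80
```

Unlike repr(), this leaves ASCII control characters and literal backslashes untouched, so it is not a drop-in replacement for str(a)[2:-1] on arbitrary input.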
