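<p>I'm having some trouble understanding the Python 2 <code>foo.decode("hex")</code> call. While working through <a href="https://cryptopals.com/sets/1/challenges/4" rel="nofollow noreferrer">this problem</a>, I got the following solution working in Python 2.7.12 (where <code>words_alpha.txt</code> is a 4 MB dictionary file).</p>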
<pre><code>words = open("words_alpha.txt").read().split('\n')

def xor(x, y):
    if len(x) == len(y):
        return "".join([chr(ord(x[i]) ^ ord(y[i])) for i in range(len(x))])

def single_char_xors(msg):
    for i in range(128):
        yield [chr(i), xor(msg, chr(i)*len(msg))]

def real_word_count(S): # Assumes there is at least one three-letter word in the string S.
    count = 0
    for word in filter(lambda s: s.isalpha() and len(s) >= 3, S.split(' ')):
        if word.lower() in words:
            count += 1
    return count

hexes = open("4.txt").read().split('\n')
hexes = [x.decode("hex") for x in hexes]
answer = []
maxwc = 0
for x in hexes:
    for y in single_char_xors(x):
        if real_word_count(y[1]) > maxwc:
            answer = [x] + y
            maxwc = real_word_count(y[1])
print answer[0] + " xor " + answer[1] + " is " + answer[2]
</code></pre>
<p>In Python 3, <code>foo.decode("hex")</code> is no longer supported. But replacing <code>hexes = [x.decode("hex") for x in hexes]</code> with <code>hexes = [binascii.unhexlify(x).decode() for x in hexes]</code> gives</p>
<blockquote>
<p>UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 3: invalid continuation byte</p>
</blockquote>
<p>whereas <code>hexes = [binascii.unhexlify(x).decode("utf-8", "ignore") for x in hexes]</code> (or <code>"replace"</code>, <code>"backslashreplace"</code>, etc.) works fine. So what is <code>foo.decode("hex")</code> doing that <code>binascii.unhexlify(foo).decode()</code> doesn't do by default?</p>
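<p>For comparison, here is a minimal Python 3 sketch of the two behaviours. In Python 2, <code>decode("hex")</code> returns a byte string and never performs a text decoding step, so the closest Python 3 counterpart is probably <code>binascii.unhexlify(x)</code> with no <code>.decode()</code> at all. The sample hex string and the helper <code>xor_single_byte</code> are made up for illustration and do not come from <code>4.txt</code>.</p>

```python
import binascii

# Hypothetical sample: "Hello" followed by the byte 0xe8 (not from 4.txt).
hex_line = "48656c6c6fe8"

# unhexlify returns raw bytes, much like Python 2's str from decode("hex").
raw = binascii.unhexlify(hex_line)
assert raw == bytes.fromhex(hex_line)  # equivalent built-in spelling


def xor_single_byte(data, key):
    """XOR every byte of `data` with the integer `key` (0-255)."""
    return bytes(b ^ key for b in data)


# XOR works directly on bytes, so no text decoding is needed for this step.
masked = xor_single_byte(raw, 0x20)
restored = xor_single_byte(masked, 0x20)

# Arbitrary bytes are not necessarily valid UTF-8, so the default decode can fail.
try:
    raw.decode()  # defaults to UTF-8
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# latin-1 maps each byte value 0-255 to exactly one code point, so it never fails.
text = raw.decode("latin-1")
```

<p>If a <code>str</code> is needed later, for example to split into words, decoding with <code>"latin-1"</code> keeps every byte, whereas <code>"ignore"</code> silently drops the bytes that are not valid UTF-8.</p>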