<p>Here is a solution that works but is not very elegant:</p>
<pre><code># Read in file as a raw byte-string
fn = 'bad_chars.txt'
with open(fn, 'rb') as f:
    text = f.read()
print(text)
# Detect out of range
has_bad = False
for c in text:
    if c >= 128:
        has_bad = True
print('Had bad:', has_bad)
# Fix offending characters
text = text.replace(b'\xc2\x92', b"\x27")
text = text.replace(b'\xc2\x85', b"...")
text = text.decode('utf-8')
print(text)
</code></pre>
<p>This produces the following output:</p>
<pre><code>b'# ::snt That\xc2\x92s what we\xc2\x92re with\xc2\x85You\xc2\x92re not sittin\xc2\x92 there in a back alley and sayin\xc2\x92 hey what do you say, five bucks?\n'
Had bad: True
# ::snt That's what we're with...You're not sittin' there in a back alley and sayin' hey what do you say, five bucks?
</code></pre>
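<p>The drawback is that I have to find the offending characters myself and write a <code>replace</code> call for each one to make this work. A table of possible replacement codes is given in a similar question: <a href="https://stackoverflow.com/questions/6609895/efficiently-replace-bad-characters">efficiently replace bad characters</a>.</p>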
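<p>One possible way to avoid hand-writing each <code>replace</code> is sketched below. It rests on an assumption that the data above does not confirm: that the stray characters (U+0080 to U+009F after decoding) are Windows-1252 bytes that were misread as Latin-1 somewhere upstream. Under that assumption, <code>\x92</code> becomes a right single quote and <code>\x85</code> becomes an ellipsis. The names <code>fix_c1_chars</code> and <code>ASCII_FALLBACK</code> are hypothetical helpers for illustration, not part of any library.</p>

```python
import re

# Matches decoded characters in the C1 control range U+0080..U+009F.
_C1_PATTERN = re.compile('[\x80-\x9f]')

# Hypothetical table: fold common typographic characters back to ASCII.
ASCII_FALLBACK = {
    0x2018: "'", 0x2019: "'",   # curly single quotes
    0x201C: '"', 0x201D: '"',   # curly double quotes
    0x2026: '...',              # horizontal ellipsis
}


def _c1_to_cp1252(match):
    """Reinterpret one C1 character as the Windows-1252 byte it came from."""
    ch = match.group(0)
    try:
        return ch.encode('latin-1').decode('cp1252')
    except UnicodeDecodeError:
        # A few bytes (e.g. 0x81) are undefined in cp1252; leave them alone.
        return ch


def fix_c1_chars(raw, ascii_only=False):
    """Decode UTF-8 bytes and repair C1 characters (assumed cp1252 origin)."""
    text = _C1_PATTERN.sub(_c1_to_cp1252, raw.decode('utf-8'))
    if ascii_only:
        text = text.translate(ASCII_FALLBACK)
    return text


sample = b'That\xc2\x92s what we\xc2\x92re with\xc2\x85You'
print(fix_c1_chars(sample, ascii_only=True))
```

<p>With <code>ascii_only=True</code> this should give the same result as the two manual <code>replace</code> calls for the characters in this file; without it, the typographic quote and ellipsis are kept as Unicode characters.</p>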