擅长:python、mysql、java
<p>给定一个文件对象和若干字符,可以使用:</p>
<pre><code># build a table mapping lead byte to expected follow-byte count
# bytes 00-BF have 0 follow bytes, F5-FF is not legal UTF8
# C0-DF: 1, E0-EF: 2 and F0-F4: 3 follow bytes.
# leave F5-FF set to 0 to minimize reading broken data.
_lead_byte_to_count = []
for i in range(256):
_lead_byte_to_count.append(
1 + (i >= 0xe0) + (i >= 0xf0) if 0xbf < i < 0xf5 else 0)
def readUTF8(f, count):
"""Read `count` UTF-8 bytes from file `f`, return as unicode"""
# Assumes UTF-8 data is valid; leaves it up to the `.decode()` call to validate
res = []
while count:
count -= 1
lead = f.read(1)
res.append(lead)
readcount = _lead_byte_to_count[ord(lead)]
if readcount:
res.append(f.read(readcount))
return (''.join(res)).decode('utf8')
</code></pre>
<p>测试结果:</p>
^{pr2}$