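<p>Each item looks like the string representation of a <code>bytes</code> literal, so you can turn it back into a <code>bytes</code> object and decode that as UTF-8. Using <code>literal_eval</code> from <code>ast</code> makes the evaluation safe, because it only accepts Python literals and never runs arbitrary code.</p>
<p>This gets you most of the way there. OCR anomalies such as <code>i)</code> still have to be fixed manually with replacements.</p>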
<pre><code>import ast
extracted = [
"b'i)\\nSYRUP\\na\\n\\x0c'",
"b'mi.\\n\\x0c'",
"b'100\\n\\x0c'",
"b'Te eT ran\\nSYRUP\\n\\x0c'",
"b'tamol, Ambroxol k\\n\\x0c'",
"b'Guaiphenesin\\n\\x0c'",
"b'Syrup\\n\\x0c'",
"b'ol HCl &\\n\\x0c'",
"b'quantity.\\n\\x0c'"]
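def fix_string(s):
    # Turn the "b'...'" text back into a real bytes object
    eval_str = ast.literal_eval(s)
    dec_str = eval_str.decode('utf-8')
    # strip() drops the trailing newline and form feed (\x0c)
    fix_str = dec_str.strip().replace('\n', ' ')
    return fix_str

for e in extracted:
    print(fix_string(e))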
</code></pre>
<p>Output:</p>
<pre><code>i) SYRUP a
mi.
100
Te eT ran SYRUP
tamol, Ambroxol k
Guaiphenesin
Syrup
ol HCl &
quantity.
</code></pre>
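<p>For the manual fixes mentioned above, one option is a small replacement table applied after decoding. This is only a sketch: the name <code>OCR_FIXES</code> and its single entry are assumptions made for illustration, not something the OCR output defines, so fill the table with the artifacts you actually see. It also collapses whitespace with <code>split()</code> and <code>join()</code> instead of <code>strip()</code> plus <code>replace()</code>, which is a swapped-in variation.</p>

```python
import ast

# Hypothetical correction table: keys are OCR artifacts seen in the sample,
# values are the assumed fixes. Adjust both to match your own data.
OCR_FIXES = {
    "i) ": "",
}

def fix_string(s, fixes=OCR_FIXES):
    # Safely turn the "b'...'" text into bytes, then decode it
    text = ast.literal_eval(s).decode("utf-8")
    # Collapse newlines, form feeds and repeated spaces into single spaces
    text = " ".join(text.split())
    # Apply each manual correction in insertion order
    for bad, good in fixes.items():
        text = text.replace(bad, good)
    return text

print(fix_string("b'i)\\nSYRUP\\na\\n\\x0c'"))  # prints: SYRUP a
```

<p>Plain substring replacement is blunt, so a short key can also match inside longer words. Keep the keys as specific as you can.</p>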