How to parse a PDF with Python?

2 answers

User

Answer 1 · Edited 2024-09-27 22:23:15

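Probably not, or at least not perfectly. You can easily take an input string such as _H_o_st_in_g_S_e_rv_ic_es_ln, remove all the underscores, and insert a space before each capital letter. However, the text you are getting does not appear to be exactly the original text, and that will affect your output. Something like this: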

def add_space(st):
    out = []
    for ch in st:
        # Put a space before each capital letter, except at the very start
        if ch.isupper() and out:
            out.append(' ')
        out.append(ch)
    return ''.join(out)

print(add_space('_H_o_st_in_g_S_e_rv_ic_es_ln'.replace('_', '')))

Output:

Hosting Servicesln

The last word comes out as ln rather than Inc because your character-recognition (OCR) software read Inc as ln.
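For comparison, the same two steps (drop the underscores, then add a space before every capital letter except the first) can be written as one regular-expression substitution. This is a minimal sketch rather than the answerer's code, and the function name split_camel is hypothetical. Like the loop above, it cannot undo the OCR error, so ln stays ln.

```python
import re


def split_camel(text):
    """Remove underscores, then insert a space before each uppercase letter
    that is not at the very start of the string."""
    compact = text.replace('_', '')
    # (?<!^) skips position 0; (?=[A-Z]) matches the empty position
    # just before an uppercase ASCII letter.
    return re.sub(r'(?<!^)(?=[A-Z])', ' ', compact)


print(split_camel('_H_o_st_in_g_S_e_rv_ic_es_ln'))   # Hosting Servicesln
print(split_camel('_H_o_st_in_g_S_e_rv_ic_es_In_c'))  # Hosting Services Inc
```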

User

Answer 2 · Edited 2024-09-27 22:23:15

You can get what you want with the following code:

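import re

s = '_H_o_st_in_g_S_e_rv_ic_es_In_c'
# Drop leading/trailing underscores, then split into the OCR'd chunks
parts = s.strip('_').split('_')
res = parts[0]

for c in parts[1:]:
    if c:
        # A chunk that starts with a capital letter begins a new word
        if re.match(r'[A-Z]', c):
            res = res + ' ' + c
        else:
            res = res + c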

Output:

>>> res
'Hosting Services Inc'
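The same chunk-based idea can be packaged as a reusable function that collects whole words in a list and joins them at the end. This is a hedged sketch of the answer's logic, not the answerer's code; the name join_chunks is hypothetical. It relies on the same assumption as the answer: a chunk beginning with an uppercase letter starts a new word, and every other chunk continues the previous word.

```python
def join_chunks(text):
    """Rebuild words from underscore-separated OCR chunks."""
    words = []
    for chunk in text.split('_'):
        if not chunk:
            continue  # skip empty pieces from leading or doubled underscores
        if chunk[0].isupper() or not words:
            words.append(chunk)  # start a new word
        else:
            words[-1] += chunk  # glue onto the current word
    return ' '.join(words)


print(join_chunks('_H_o_st_in_g_S_e_rv_ic_es_In_c'))  # Hosting Services Inc
```

Building a list and calling ' '.join once avoids repeated string concatenation and works no matter how long the first chunk is.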
