Avoid splitting on author names in Python

Posted 2024-10-03 19:31:17


I am reading a PDF file and splitting the whole text on a delimiter ('.'), but the PDF also contains author names like this:

Similar to the work of Valenzuela et al. [1] and Zhu et al. [2], we use features like citations from citing to cited paper, citations per section, and author overlap.

My code splits this line into 3 lines:

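Similar to the work of Valenzuela et al
[1] and Zhu et al
[2], we use features like citations from citing to cited paper, citations per section, and author overlap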

Here is my code that reads the PDF text and splits it:

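from tika import parser
import re

# Pattern for numeric citation markers such as [1]; defined but not used below
regex = re.compile(r'\[\d\]')

# Extract the plain text of the PDF with Apache Tika
objFile = parser.from_file('read.pdf')
text = objFile['content']
lstString = text.strip()
# Splitting on every period also breaks abbreviations such as "et al."
lstString = lstString.split(".")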

Can someone help me avoid splitting on the author names?
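
One possible approach, offered as a minimal sketch rather than a definitive fix: hide the periods that belong to known abbreviations before splitting, then restore them afterwards. The function name `split_sentences`, the `ABBREVIATIONS` list, and the `PLACEHOLDER` character are all hypothetical choices made for this illustration; the sketch also assumes that a sentence-ending period is followed by whitespace or the end of the text.

```python
import re

# Abbreviations whose periods should not end a sentence.
# This list is an illustrative assumption; extend it for your own documents.
ABBREVIATIONS = ("et al.", "e.g.", "i.e.", "Fig.", "Eq.")

# Placeholder character, assumed never to occur in the extracted PDF text.
PLACEHOLDER = "\u0000"


def split_sentences(text):
    """Split text on sentence-ending periods, keeping listed abbreviations intact."""
    # Temporarily hide the periods inside known abbreviations.
    for abbr in ABBREVIATIONS:
        text = text.replace(abbr, abbr.replace(".", PLACEHOLDER))
    # Split on a period followed by whitespace or the end of the text.
    parts = re.split(r"\.(?:\s+|$)", text)
    # Restore the hidden periods and drop empty pieces.
    return [p.replace(PLACEHOLDER, ".").strip() for p in parts if p.strip()]


sample = (
    "Similar to the work of Valenzuela et al. [1] and Zhu et al. [2], "
    "we use features like citations from citing to cited paper, "
    "citations per section, and author overlap. This is a second sentence."
)

for sentence in split_sentences(sample):
    print(sentence)
```

As with `split(".")`, the sentence-ending period itself is dropped from each piece. A known limitation of this sketch is that a sentence which actually ends with "et al." is merged with the sentence that follows it.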

