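I am using pdfminer to convert a PDF to text. The problem is that pdfminer adds \n at the end of every line in the PDF, even though the sentence does not end there. As you can see in the text below, each line is treated as a sentence, which is not correct. I have also included a second version of the text that shows where the newline characters are. For example,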
quan-
tum population.
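should be a single sentence. So I replaced \n with '' and that fixed this case. But the other \n characters get replaced as well, which I do not want.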
Balanced Quantum Classical Evolutionary Algorithm(BQCEA)
Muhammad Shahid, Hasan Mujtaba, Muhammad Asim, Omer Beg
Abstract
With advancement in Quantum computing, classical algorithms are adapted and integrated
with Quantum properties such as qubit representation and entanglement. Although these
properties perform better however pre-mature convergence is the main issue in Quantum
Evolutionary Algorithms(QEA) because QEA uses only the best individual to update quan-
tum population. In this paper, we introduced a new way to update the quantum population
of QEA to avoid premature convergence
'Balanced Quantum Classical Evolutionary Algorithm(BQCEA)\n\nMuhammad Shahid, Hasan Mujtaba,
Muhammad Asim, Omer Beg\n\nAbstract\nWith advancement in Quantum computing, classical
algorithms are adapted and integrated\nwith Quantum properties such as qubit representation
and entanglement', ' Although these\nproperties perform better however pre-mature
convergence is the main issue in Quantum\nEvolutionary Algorithms(QEA) because QEA uses only
the best individual to update quan-\ntum population', ' In this paper, we introduced a new
way to update the quantum population\nof QEA to avoid premature convergence',
I tried this code:
from nltk import tokenize

lines = tokenize.sent_tokenize(txt_str)
for l in lines:
    s = l.replace('\n', '')
    print(s)
This produces the following:
Balanced Quantum Classical Evolutionary Algorithm(BQCEA)Muhammad Shahid, Hasan Mujtaba, Muhammad Asim, Omer BegAbstractWith advancement in Quantum computing, classical algorithms are adapted and integratedwith Quantum properties such as qubit representation and entanglement.
Although theseproperties perform better however pre-mature convergence is the main issue in QuantumEvolutionary Algorithms(QEA) because QEA uses only the best individual to update quan-tum population.
In this paper, we introduced a new way to update the quantum populationof QEA to avoid premature convergence.
But this is not the text I want. I want this version of the text:
Balanced Quantum Classical Evolutionary Algorithm(BQCEA)
Muhammad Shahid, Hasan Mujtaba, Muhammad Asim, Omer Beg
Abstract
With advancement in Quantum computing, classical algorithms are adapted and integrated with Quantum properties such as qubit representation and entanglement. Although these properties perform better however pre-mature convergence is the main issue in Quantum Evolutionary Algorithms(QEA) because QEA uses only the best individual to update quan-tum population. In this paper, we introduced a new way to update the quantum population of QEA to avoid premature convergence
I do not want the blank lines to disappear. I hope that makes sense.
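To answer this question, each line has to be considered together with the line that follows it. The rule here is to remove the newline character if all of the following conditions hold:
This problem is conveniently solved with a generator function that yields one pair of consecutive lines at a time. That cleanly separates the logic of iterating over the file from the logic of deciding when to remove a newline.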
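The exact conditions and the original code are not reproduced here, so the following is only a minimal sketch of the pair-of-lines idea. The names `iter_lines`, `fix_newlines` and `num_words_line` come from the answer text, but their bodies are a reconstruction. The join rule (the current line has at least `MIN_WORDS` words and the next line is not blank) and the threshold value are assumptions chosen so that a one-word heading such as "Abstract" keeps its own line.

```python
MIN_WORDS = 4  # assumed threshold: shorter lines are treated as headings


def num_words_line(line):
    """Return the number of whitespace-separated words in a line."""
    return len(line.split())


def iter_lines(infile):
    """Yield (line, next_line) pairs; next_line is '' after the last line."""
    previous = None
    for line in infile:
        if previous is not None:
            yield previous, line
        previous = line
    if previous is not None:
        yield previous, ''


def fix_newlines(infile, outfile):
    """Copy infile to outfile, joining lines that look like wrapped prose."""
    for line, next_line in iter_lines(infile):
        # Assumed rule: a long enough line followed by a non-blank line.
        join = (num_words_line(line) >= MIN_WORDS
                and num_words_line(next_line) > 0)
        if not join:
            outfile.write(line)
        elif line.rstrip().endswith('-'):
            # Hyphenated break: drop the newline but keep the hyphen,
            # matching the "quan-tum" form in the desired output.
            outfile.write(line.rstrip())
        else:
            outfile.write(line.rstrip() + ' ')
```

Both arguments are ordinary text file objects, so with hypothetical file names this could be called as `fix_newlines(open('in.txt'), open('out.txt', 'w'))`. Blank lines are never joined, so paragraph breaks survive.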
This gives:
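Note that the number of words in each line is counted twice. For better efficiency, this could be addressed by changing iter_lines to call num_words_line as each line is read in, and to yield the length of each line in the pair along with the line itself, at the cost of slightly more code. However, the separation of logic between iter_lines and fix_newlines would then be less clean.

A slightly different version can be used to operate on a string in memory instead of reading and writing files.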
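As a rough sketch of such an in-memory variant (not the answerer's code): `fix_newlines_str` is a hypothetical name, and the join rule and `MIN_WORDS` threshold are assumptions. A line is joined to the next one only when it has at least `MIN_WORDS` words and the next line is not blank.

```python
MIN_WORDS = 4  # assumed threshold: shorter lines are treated as headings


def fix_newlines_str(text):
    """Return text with newlines removed inside wrapped paragraphs."""
    lines = text.split('\n')
    pieces = []
    # Look at every line except the last together with the line after it.
    for i, line in enumerate(lines[:-1]):
        following = lines[i + 1]
        if len(line.split()) >= MIN_WORDS and following.strip():
            stripped = line.rstrip()
            # Keep a trailing hyphen as-is; otherwise join with one space.
            pieces.append(stripped if stripped.endswith('-') else stripped + ' ')
        else:
            pieces.append(line + '\n')
    pieces.append(lines[-1])  # nothing follows the last line
    return ''.join(pieces)
```

Applied to the string that pdfminer returns, `fix_newlines_str(txt_str)` would leave blank lines and short heading lines alone and only unwrap the paragraph text.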
lines = tokenize.sent_tokenize(txt_str)
for l in lines:
    s = l.replace('\n', '')
    print(s)
You can try this. See the demo:
https://regex101.com/r/crj3aD/1
Python script:
If there are more conditions, try this one for your other case:
https://regex101.com/r/crj3aD/2
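The patterns behind the demo links are not shown here, so the following is only a sketch of what a regex-based approach could look like, not necessarily the linked one. It assumes that a newline should be removed only when it follows a line of at least `MIN_CHARS` characters and is not next to a blank line. `unwrap` and `MIN_CHARS` are hypothetical names, and the cutoff of 40 is an arbitrary assumption.

```python
import re

MIN_CHARS = 40  # assumed cutoff: shorter lines are treated as headings


def unwrap(text, min_chars=MIN_CHARS):
    """Remove newlines that follow a long line and precede more text."""
    # Fixed-width lookbehind: the newline must follow min_chars characters
    # that are themselves not newlines, i.e. it ends a long line.
    long_line = r'(?<=[^\n]{%d})' % min_chars
    # Hyphen at the end of the line: drop the newline, keep the hyphen.
    text = re.sub(long_line + r'(?<=-)\n(?=[^\n])', '', text)
    # Any other wrapped line: replace the newline with a single space.
    return re.sub(long_line + r'\n(?=[^\n])', ' ', text)
```

A length cutoff is a blunt instrument: a long title followed directly by body text would also be joined, so the value would need tuning for the documents at hand.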