How to replace only those \n that are followed by some characters

Posted 2024-09-28 01:28:45

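I am using pdfminer to convert a PDF to txt. The problem is that pdfminer adds \n at the end of every line of the PDF, even though the sentence does not end there. As you can see in the text below, each line is treated as a sentence, which is incorrect. I have also included another version of the text that shows where the newline characters are. For example,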

quan-
tum population.

should be a single sentence. So I replaced \n with '' and that solved this problem. But the other \n are replaced as well, which I do not want.

Balanced Quantum Classical Evolutionary Algorithm(BQCEA)

Muhammad Shahid, Hasan Mujtaba, Muhammad Asim, Omer Beg

Abstract
With advancement in Quantum computing, classical algorithms are adapted and integrated
with Quantum properties such as qubit representation and entanglement. Although these
properties perform better however pre-mature convergence is the main issue in Quantum
Evolutionary Algorithms(QEA) because QEA uses only the best individual to update quan-
tum population. In this paper, we introduced a new way to update the quantum population
of QEA to avoid premature convergence

'Balanced Quantum Classical Evolutionary Algorithm(BQCEA)\n\nMuhammad Shahid, Hasan Mujtaba, 
Muhammad Asim, Omer Beg\n\nAbstract\nWith advancement in Quantum computing, classical 
algorithms are adapted and integrated\nwith Quantum properties such as qubit representation 
and entanglement', ' Although these\nproperties perform better however pre-mature 
convergence is the main issue in Quantum\nEvolutionary Algorithms(QEA) because QEA uses only 
the best individual to update quan-\ntum population', ' In this paper, we introduced a new 
way to update the quantum population\nof QEA to avoid premature convergence',

I tried this code:

lines =tokenize.sent_tokenize(txt_str)
for l in lines:
    s = l.replace('\n', '')
    print(s)

which produces this:

Balanced Quantum Classical Evolutionary Algorithm(BQCEA)Muhammad Shahid, Hasan Mujtaba, Muhammad Asim, Omer BegAbstractWith advancement in Quantum computing, classical algorithms are adapted and integratedwith Quantum properties such as qubit representation and entanglement.
Although theseproperties perform better however pre-mature convergence is the main issue in QuantumEvolutionary Algorithms(QEA) because QEA uses only the best individual to update quan-tum population.
In this paper, we introduced a new way to update the quantum populationof QEA to avoid premature convergence.

But this is not the text I want. I want this version of the text:

Balanced Quantum Classical Evolutionary Algorithm(BQCEA)

Muhammad Shahid, Hasan Mujtaba, Muhammad Asim, Omer Beg

Abstract
With advancement in Quantum computing, classical algorithms are adapted and integrated with Quantum properties such as qubit representation and entanglement. Although these properties perform better however pre-mature convergence is the main issue in Quantum Evolutionary Algorithms(QEA) because QEA uses only the best individual to update quan-tum population. In this paper, we introduced a new way to update the quantum population of QEA to avoid premature convergence

I do not want the blank lines to disappear. I hope that makes sense.


3 Answers

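To answer this, each line has to be considered together with the line that follows it. The rule here is to remove the newline if all of the following conditions hold:

  • it is not the last line of the file
  • the line contains at least 2 words
  • the next line contains at least one word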

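This can be solved conveniently with a generator function that yields one pair of consecutive lines at a time. That cleanly separates the logic of iterating over the file from the logic that decides when to remove a newline.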

#!/usr/bin/env python

def num_words_line(line):
    return len(line.split())

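def iter_lines(input_file):
    """
    Yields pairs of adjacent lines; the last line is paired with None.
    """
    with open(input_file) as f:
        previous = next(f, None)
        if previous is None:
            return  # empty file: nothing to yield
        for line in f:
            yield (previous, line)
            previous = line
        # Use `previous`, not `line`: `line` is unbound for a one-line file.
        yield (previous, None)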


def fix_newlines(input_file, output_file):
    with open(output_file, "w") as fout:
        for line, next_line in iter_lines(input_file):
            # Join with the next line only if this line has 2+ words
            # and the next line is not blank.
            if (next_line is not None and
                num_words_line(line) > 1 and
                num_words_line(next_line) > 0):
                line = line.replace("\n", " ")
            fout.write(line)


if __name__ == '__main__':
    fix_newlines("input.txt", "output.txt")

This gives:

Balanced Quantum Classical Evolutionary Algorithm(BQCEA)

Muhammad Shahid, Hasan Mujtaba, Muhammad Asim, Omer Beg

Abstract
With advancement in Quantum computing, classical algorithms are adapted and integrated with Quantum properties such as qubit representation and entanglement. Although these properties perform better however pre-mature convergence is the main issue in Quantum Evolutionary Algorithms(QEA) because QEA uses only the best individual to update quan- tum population. In this paper, we introduced a new way to update the quantum population of QEA to avoid premature convergence

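Note that the words on each line are counted twice. For better efficiency, iter_lines could call num_words_line as each line is read and yield the word count of each line in the pair along with the line itself. The cost is slightly more code, and the separation of logic between iter_lines and fix_newlines becomes less clean.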
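The single-count variant described above could look roughly like this. It is only a sketch: the names `iter_counted_lines` and `fix_newlines_counted`, and the choice to take a list of lines rather than a file name, are illustrative assumptions and not part of the original answer.

```python
def iter_counted_lines(lines):
    """Yield ((line, n_words), (next_line, next_n_words)) pairs.

    Each line is split exactly once. The final line is paired with
    (None, 0) so the caller needs no special case for it.
    """
    previous = None
    for line in lines:
        current = (line, len(line.split()))
        if previous is not None:
            yield previous, current
        previous = current
    if previous is not None:
        yield previous, (None, 0)


def fix_newlines_counted(lines):
    """Apply the same three conditions, counting words once per line."""
    out = []
    for (line, n), (next_line, next_n) in iter_counted_lines(lines):
        if next_line is not None and n > 1 and next_n > 0:
            line = line.replace("\n", " ")
        out.append(line)
    return "".join(out)


# Made-up sample lines, each still carrying its trailing newline.
sample = ["Abstract\n", "With advancement in\n", "Quantum computing\n",
          "\n", "Next part\n"]
print(fix_newlines_counted(sample))
```

Because any iterable of lines works, an open file object can be passed in place of the list.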

A slightly different version can operate on a string in memory instead of reading and writing files:

#!/usr/bin/env python

def num_words_line(line):
    return len(line.split())


def iter_lines(input_string):
    """
    Yields pairs of adjacent lines; the last line is paired with None.
    """
    iterator = iter(input_string.strip().split("\n"))
    previous = next(iterator)
    for line in iterator:
        yield (previous, line)
        previous = line
    # Use `previous`, not `line`: `line` is unbound for single-line input.
    yield (previous, None)


def fix_newlines(input_string):
    output = ''
    for line, next_line in iter_lines(input_string):
        # Keep the newline unless this line has 2+ words and the next is not blank.
        newline = not (next_line is not None and
                       num_words_line(line) > 1 and
                       num_words_line(next_line) > 0)
        output += line
        if newline:
            output += "\n"
        else:
            output += " "
    return output

if __name__ == '__main__':

    input_text = ['Balanced Quantum Classical Evolutionary Algorithm(BQCEA)\n\nMuhammad Shahid, Hasan Mujtaba, Muhammad Asim, Omer Beg\n\nAbstract\nWith advancement in Quantum computing, classical algorithms are adapted and integrated\nwith Quantum properties such as qubit representation and entanglement', ' Although these\nproperties perform better however pre-mature convergence is the main issue in Quantum\nEvolutionary Algorithms(QEA) because QEA uses only the best individual to update quan-\ntum population', ' In this paper, we introduced a new way to update the quantum population\nof QEA to avoid premature convergence',]
    # Avoid naming the variable `str`, which would shadow the built-in type.
    text = ' '.join(input_text)

    print(fix_newlines(text))
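The expected output in the question keeps `quan-tum` without a space, whereas joining with a space yields `quan- tum`. One possible adjustment, sketched here with a hypothetical helper named `join_wrapped_lines`, is to skip the space when a line ends with a hyphen. It applies the same three conditions as the answer and is an assumption about what the asker wants, not part of the original code.

```python
def join_wrapped_lines(text):
    """Join wrapped lines; omit the space after a line ending in '-'."""
    lines = text.strip().split("\n")
    parts = []
    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else None
        join = (next_line is not None
                and len(line.split()) > 1
                and len(next_line.split()) > 0)
        parts.append(line)
        if not join:
            parts.append("\n")
        elif not line.endswith("-"):
            parts.append(" ")
        # Otherwise the trailing hyphen is kept and the next line follows directly.
    return "".join(parts)


# Made-up sample based on the text in the question.
sample = ("Abstract\nuses only the best individual to update quan-\n"
          "tum population.\n\nNext section")
print(join_wrapped_lines(sample))
```

The hyphen itself is left in place, as in the question's desired output. Dropping it would need a way to tell a word split across lines from a genuinely hyphenated word.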

lines = tokenize.sent_tokenize(txt_str)

s = l.replace('\n', '')

print(s)

(?<=\S)(?<!\bAbstract)\n(?=\S)

You can try this. See the demo:

https://regex101.com/r/crj3aD/1

Python script:

import re

inp = "Balanced Quantum Classical Evolutionary Algorithm(BQCEA)\n\nMuhammad Shahid, Hasan Mujtaba, Muhammad Asim, Omer Beg\n\nAbstract\nWith advancement in Quantum computing, classical algorithms are adapted and integrated\nwith Quantum properties such as qubit representation and entanglement', ' Although these\nproperties perform better however pre-mature convergence is the main issue in Quantum\nEvolutionary Algorithms(QEA) because QEA uses only the best individual to update quan-\ntum population', ' In this paper, we introduced a new way to update the quantum population\nof QEA to avoid premature convergence"

output = re.sub(r'(?<=\S)(?<!\bAbstract)\n(?=\S)', ' ', inp)
print(output)
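To see what each part of the pattern does, here is a small annotated sketch on a shortened, made-up sample. `re.VERBOSE` is used only so the pattern can carry comments; it is otherwise the same pattern as above. The name `JOIN_NEWLINE` is illustrative.

```python
import re

JOIN_NEWLINE = re.compile(r"""
    (?<=\S)            # the newline must directly follow a non-space character
    (?<!\bAbstract)    # but not the heading word Abstract
    \n                 # the newline itself
    (?=\S)             # the next line must start with a non-space character
""", re.VERBOSE)

sample = "Title (T)\n\nAbstract\nWith advancement\nin Quantum computing"
print(JOIN_NEWLINE.sub(" ", sample))
```

Blank lines survive because neither newline of a `\n\n` pair has a non-space character on both sides. The `Abstract` exclusion is specific to this document; other single-word headings would need their own lookbehind or a different rule.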

With one more condition:

(?<=\S)(?<!\bAbstract)(?:\n|\\n)(?=\S)

Try this for your other case:

https://regex101.com/r/crj3aD/2
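The second pattern also accepts a literal backslash followed by `n` (two characters), which is what you get when the text was copied from a `repr` instead of containing real newlines. A brief sketch on made-up strings follows. Note the caveat at the end, which follows from a backslash itself being a non-space character.

```python
import re

pattern = re.compile(r'(?<=\S)(?<!\bAbstract)(?:\n|\\n)(?=\S)')

real = "integrated\nwith Quantum"       # contains an actual newline
literal = r"integrated\nwith Quantum"   # contains a backslash and the letter n

print(pattern.sub(' ', real))
print(pattern.sub(' ', literal))

# Caveat: with literal sequences, a doubled \n\n is not preserved,
# because the second backslash satisfies the (?=\S) lookahead.
doubled = r"(BQCEA)\n\nMuhammad"
print(pattern.sub(' ', doubled))
```

If blank lines matter and the input holds literal sequences, it may be simpler to convert them to real newlines first and then apply the first pattern.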
