How do I remove sentences that start with a lowercase letter?

Posted 2024-10-16 22:24:26


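In the example below, the regex (“.*?”) was first used to strip out all dialogue. The next step is to remove all remaining sentences that start with a lowercase letter. Only sentences that start with an uppercase letter should be kept.

Example: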

exclaimed Wade. Indeed, below them were villages, of crude huts made of timber and stone and mud. Rubble work walls, for they needed little shelter here, and the people were but savages.

asked Arcot, his voice a bit unsteady with suppressed excitement.

replied Morey without turning from his station at the window. Below them now, less than half a mile down on the patchwork of the Nile valley, men were standing, staring up, collecting in little groups, gesticulating toward the strange thing that had materialized in the air above them.

In the example above, only the following should be removed:

exclaimed Wade.
asked Arcot, his voice a bit unsteady with suppressed excitement.
replied Morey without turning from his station at the window.

A regex, or some simple Perl or Python code, would be welcome. I am using version 7 of TextPipe.

Thanks.


3 answers

This should work for the example you posted:

import re

text = re.sub(r'(^|(?<=[.!?])\s+)[a-z].*?[.!?](?=\s|$)', r'\1', text)
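As a usage sketch, the same substitution can be wrapped in a function and run on a shortened version of the sample text. The names `LOWERCASE_SENTENCE`, `drop_lowercase_sentences`, and `sample` are illustrative and not part of the original answer.

```python
import re

# Same pattern as in the answer, compiled once (the constant name is illustrative).
# Group 1 keeps the whitespace before the removed sentence so that paragraph
# breaks survive; the lookbehind lets back-to-back sentences match in one pass.
LOWERCASE_SENTENCE = re.compile(r'(^|(?<=[.!?])\s+)[a-z].*?[.!?](?=\s|$)')

def drop_lowercase_sentences(text):
    """Remove every sentence that starts with a lowercase ASCII letter."""
    return LOWERCASE_SENTENCE.sub(r'\1', text)

# Shortened version of the sample from the question.
sample = (
    "exclaimed Wade. Indeed, below them were villages.\n\n"
    "asked Arcot, his voice a bit unsteady.\n\n"
    "replied Morey without turning. Below them now, men were standing."
)

print(drop_lowercase_sentences(sample))
```

The pattern treats every `.`, `!`, or `?` followed by whitespace as the end of a sentence, so an abbreviation such as "Mr." inside a lowercase sentence would cut the match short. The removed sentences also leave their surrounding whitespace behind, which may need a separate cleanup step.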

Why not use a module such as Lingua::EN::Sentence? It makes it easy to get reasonably good sentences out of arbitrary English text.

#!perl

use strict;
use warnings;

use Lingua::EN::Sentence qw( get_sentences );

my $text = <<END;

exclaimed Wade. Indeed, below them were villages, of crude huts made of timber and stone and mud. Rubble work walls, for they needed little shelter here, and the people were but savages.

asked Arcot, his voice a bit unsteady with suppressed excitement.

replied Morey without turning from his station at the window. Below them now, less than half a mile down on the patchwork of the Nile valley, men were standing, staring up, collecting in little groups, gesticulating toward the strange thing that had materialized in the air above them.
END


my $sentences = matching_sentences( qr/^[^a-z]/, $text );

print map "$_\n", @$sentences;

sub matching_sentences {
    my $re   = shift;
    my $text = shift;

    my $s = get_sentences( $text );

    @$s = grep /$re/, @$s;

    return $s;
}

This prints only the sentences that do not start with a lowercase letter.

For your example, this Perl works:

$s = "exclaimed Wade. Indeed, ...";

do {
  $prev = $s;
  $s =~ s/(^\s*|[.!?]\s+)[a-z][^.!?]*[.!?]\s*/$1/gs;
} until ($s eq $prev);

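Without the do loop, several consecutive lowercase sentences cannot all be removed: each match consumes the closing punctuation and the whitespace after it, which the next sentence would need as its left-hand context.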
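For readers who prefer Python, the same repeat-until-stable idea might look like the sketch below. It is a port written for illustration, not code from the original answer; `PATTERN`, `strip_lowercase_sentences`, and `sample` are hypothetical names.

```python
import re

# Port of the Perl substitution above. re.sub replaces every non-overlapping
# match, like Perl's /g. The /s flag is not needed because the pattern
# contains no bare "." outside a character class.
PATTERN = re.compile(r'(^\s*|[.!?]\s+)[a-z][^.!?]*[.!?]\s*')

def strip_lowercase_sentences(text):
    """Apply the substitution repeatedly until the text stops changing."""
    while True:
        new_text = PATTERN.sub(r'\1', text)
        if new_text == text:
            return text
        text = new_text

sample = (
    "exclaimed Wade. Indeed, villages.\n\n"
    "asked Arcot.\n\n"
    "replied Morey. Below them."
)

# A single pass leaves "replied Morey." behind, because the previous match
# already consumed the ".\n\n" that would have anchored it.
print(repr(PATTERN.sub(r'\1', sample)))
print(repr(strip_lowercase_sentences(sample)))
```

The loop ends because every successful substitution makes the text shorter, so a pass that changes nothing is eventually reached.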

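Note that doing this perfectly is nearly AI-complete. For sentences you will never get right, see this question: "LaTeX sometimes puts too much or too little space after periods".

Of course, you can use LaTeX's heuristics to decide which periods end a sentence, and you will be right most of the time.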
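One way to approximate that heuristic in Python is sketched below. It assumes the best-known LaTeX rule only: a period, question mark, or exclamation mark ends a sentence unless it directly follows an uppercase letter. The names `BOUNDARY`, `split_sentences`, and `keep_capitalized` are hypothetical, and the rule still gets cases such as "Mr." or a sentence that ends in an acronym wrong.

```python
import re

# Sentence boundary: whitespace that follows . ! or ?, except when the
# punctuation directly follows an uppercase letter (as in "J." or "U.S.").
BOUNDARY = re.compile(r'(?<=[.!?])(?<![A-Z][.!?])\s+')

def split_sentences(text):
    """Split text into sentences using the LaTeX-style rule above."""
    return [s for s in BOUNDARY.split(text.strip()) if s]

def keep_capitalized(text):
    """Return only the sentences that do not start with a lowercase letter."""
    return [s for s in split_sentences(text) if not s[0].islower()]

text = (
    "asked J. R. Smith quietly. Indeed, the U.S. Army left. "
    "replied Morey. Below them, men stood."
)

for sentence in keep_capitalized(text):
    print(sentence)
```

Splitting first and filtering afterwards keeps the two concerns separate, so the boundary rule can be refined, or swapped for a sentence-splitting library, without touching the filter.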
