Joining sentences from a list in Python 3

Posted 2024-10-02 22:37:29


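I am trying to join a series of appended sentences into one large string so that it can be used as input to Gensim's summarization module. However, when I try this, it says the input has fewer than 2 sentences. When I split the text I can see multiple sentences, but it only counts each sentence once on its own instead of counting the total number of sentences. The variable r is an object of type string. I want to join these sentences into one large string so that the Gensim summarization module can read it.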

Sample code:

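import re
from gensim.summarization import summarize  # module exists only in gensim < 4.0

# tokenized holds the full document text as a single str
ruling_corpora = re.findall("\.?([^\.].\*?I find[^\.]*\. |[^\.]*$In sum[^\.]*\. |[^\.]*$agree[^\.]*\.)", tokenized, re.I | re.DOTALL | re.M)[1:-1]

for r in ruling_corpora:
    print(type(r))        # each r is a str
    rc = ''.join(r)       # joins the characters of one match, so rc equals r
    print(summarize(rc))  # summarize receives one match at a time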

Sample output:

[output not preserved in the source]

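Here is an example of my input, which I want to summarize with the Gensim summarizer. The number below each block of text is the number of sentences ending in a period: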

####Beginning of File### LUMB65.BL23607963.xml
Background Content: ANDERSON INITIAL DECISIONOn January 13, 2015, the appellant filed this appeal arguing that the agency's decision not to renew his term limited appointment which expired on January 28, 2015, is in error.  

 For the reasons discussed below, this appeal is DISMISSED for lack of jurisdiction without a hearing.
1
There is nothing in the agreement that curtails the agency's ability not to extend the term appointment. 
 IdIn reviewing the appellant's arguments, the appellant fails to establish that the Board has jurisdiction to review the agency's decision not to renew his time-limited appointment at issue in this appeal.
 Following a review of the record evidence, I find that the appellant has failed to non-frivolously allege Board jurisdiction over this appeal on any basis.
 Accordingly, this appeal must be dismissed for lack of jurisdiction.
1
####End of File### LUMB65.BL23607963.xml

1 answer

Answer 1 · posted 2024-10-02 22:37:29

According to the documentation (emphasis mine; the key part is the final sentence about newlines):

The input should be a string, and must be longer than INPUT_MIN_LENGTH sentences for the summary to make sense. The text will be split into sentences using the split_sentences method in the gensim.summarization.textcleaner module. Note that newlines divide sentences.

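Try joining with newlines instead. Because r is a single string, join has to be applied to the whole list outside the loop, for example rc = '\n'.join(ruling_corpora); calling '\n'.join(r) on one string would put a newline between every character. You can also debug by calling gensim.summarization.textcleaner.split_sentences on the joined text and inspecting the result.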
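As a rough illustration of the newline point, here is a minimal sketch. The two-item list is a stand-in for what re.findall returns, and the commented-out call assumes a gensim release older than 4.0, where gensim.summarization is still available.

```python
# Stand-in for the list of matched sentences (taken from the sample input).
ruling_corpora = [
    "There is nothing in the agreement that curtails the agency's ability "
    "not to extend the term appointment.",
    "Following a review of the record evidence, I find that the appellant "
    "has failed to non-frivolously allege Board jurisdiction over this "
    "appeal on any basis.",
]

# str.join works on the items of its argument. Given a single str, those
# items are characters, so ''.join(r) just returns the same sentence.
first = ruling_corpora[0]
same_as_first = "".join(first)

# Joining the whole list with newlines yields one string, one sentence per line.
text = "\n".join(sentence.strip() for sentence in ruling_corpora)
line_count = len(text.splitlines())
print(line_count)

# Assumption: gensim < 4.0 is installed. The joined text could then be passed on:
# from gensim.summarization import summarize
# print(summarize(text))
```

The join happens once, outside any loop, so the summarizer sees all matched sentences together rather than one at a time.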

Also, the regular expression does not match the given input, and even if it did, the [1:-1] slice would discard the only two results. Try this:

>>> [m[0] for m in re.findall(r'([^.]*?(I find|In sum|agree)[^.]*\.)', tokenized, re.I | re.DOTALL | re.M)]
["\n1\nThere is nothing in the agreement that curtails the agency's ability not to extend the term appointment.", '\n Following a review of the record evidence, I find that the appellant has failed to non-frivolously allege Board jurisdiction over this appeal on any basis.']

You may want to deal with the standalone numbers first, since they show up in the matches.
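One possible way to do that, sketched under assumptions: drop every line that contains only digits before running the pattern. The helper name drop_standalone_numbers is hypothetical, and the sample text is an abbreviated version of the input shown in the question.

```python
import re

def drop_standalone_numbers(text):
    """Remove lines that consist only of digits (hypothetical helper)."""
    kept = [line for line in text.splitlines() if not line.strip().isdigit()]
    return "\n".join(kept)

# Abbreviated sample in the same layout as the question's input.
sample = (
    "For the reasons discussed below, this appeal is DISMISSED.\n"
    "1\n"
    "There is nothing in the agreement that curtails the agency's ability.\n"
    "Following a review of the record evidence, I find that the appellant has failed.\n"
    "Accordingly, this appeal must be dismissed.\n"
    "1\n"
)

cleaned = drop_standalone_numbers(sample)

# Same idea as the pattern above, with a non-capturing group so that
# each match is the whole sentence.
pattern = re.compile(r"[^.]*?(?:I find|In sum|agree)[^.]*\.", re.I)
sentences = [m.group(0).strip() for m in pattern.finditer(cleaned)]

for sentence in sentences:
    print(sentence)
```

Filtering by line keeps the cleanup separate from the regular expression, so the pattern itself does not need to know about the counters.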
