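I am using Python's regex module to find all key phrases in a legal document. One of them occurs in "5 U.S.C. § 8452(a)", but the pattern only finds and prints the sentence up to the first period. So instead of my output being:
The Board has jurisdiction over this appeal under 5 U.S.C. § 8452(a)
it is:
The Board has jurisdiction over this appeal under 5 U.
Here is my code: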
ruling_corpora = map(lambda x: x[0], re.findall('([^.]*?(I find|In sum|agree|affirm|disagree|I conclude|In light of| under| this appeal| The ALJ| I determine| we| based on| for the reasons| pursuant to| the decision is| jurisidiction|section|§+\d |conclude)[^.]*\.)', tokenized, re.I | re.DOTALL | re.M))
reduce = 0
for r in ruling_corpora:  #*
    reduce -= 5
    big_list = []
    big_list.extend(ruling_corpora)
    rc_list = []
    rc_list.append(set(r))
    big_string = "".join(str(x) for x in big_list)
    if len(big_string.split('.')) <= 3:
        while len(big_string.split()) <= 200:
            print("Ruling Content: {} \n".format(big_string))
            break
        break
    else:
        summary = summarize(big_string, word_count=250 + reduce)
        print("Summarized Ruling: {}\n".format(summary))
        break
    break
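The regular expression stops at the first literal dot.
The final [^.]*\. part of the pattern captures all text that is not a dot, then a literal dot, and then the match is complete.
That is: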
The Board has jurisdiction over this appeal under 5 U.
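To make the stopping behaviour easy to reproduce, here is a minimal sketch. It assumes a shortened stand-in for the question's pattern (only the " under" keyword is kept from the alternation) and a made-up two-sentence sample text; neither is the asker's real input.

```python
import re

# Hypothetical sample text; the real document is not shown in the question.
text = ("The Board has jurisdiction over this appeal under "
        "5 U.S.C. § 8452(a). The appeal was timely.")

# Simplified stand-in for the question's pattern: one keyword only.
# [^.]* cannot cross a dot, so \. matches the first dot after the keyword.
pattern = re.compile(r"([^.]*?( under)[^.]*\.)", re.I | re.DOTALL)

first_match = pattern.search(text).group(1)
print(first_match)
# The Board has jurisdiction over this appeal under 5 U.
```

The dot inside "U.S.C." is the first dot after the keyword, so the match ends there rather than at the end of the sentence.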
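Without seeing the real text, you could change the pattern for this special case so that it captures anything that is not a ")" and then captures the ")".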
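A minimal sketch of that change, under the same assumptions as before: the pattern is shortened to the single " under" keyword and the sample text is invented for illustration. Only the tail of the pattern differs, with [^)]*\) in place of [^.]*\.

```python
import re

# Hypothetical sample text; the real document is not shown in the question.
text = ("The Board has jurisdiction over this appeal under "
        "5 U.S.C. § 8452(a). The appeal was timely.")

# After the keyword, consume everything that is not ")" and then the ")".
# Dots inside "U.S.C." no longer end the match.
pattern = re.compile(r"([^.]*?( under)[^)]*\))", re.I | re.DOTALL)

citation_match = pattern.search(text).group(1)
print(citation_match)
# The Board has jurisdiction over this appeal under 5 U.S.C. § 8452(a)
```

This only suits sentences that end in a parenthesised citation. Where no ")" follows the keyword nearby, the match would run on to the next ")" in the text or fail, so it is a special-case fix rather than a general sentence splitter.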