Using the regex findall method in Python

Posted 2024-10-02 22:27:08


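I am using the Python regex module (re) to find all key phrases in a legal document. One of them sits in a sentence containing 5 U.S.C. § 8452(a), but the match stops at the first period in that sentence. So instead of my output being:

The Board has jurisdiction over this appeal under 5 U.S.C. § 8452(a)

it reads:

The Board has jurisdiction over this appeal under 5 U.

Here is my code: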

    import re
    # Assumption: `summarize` is gensim.summarization.summarize (gensim < 4.0),
    # and `tokenized` holds the document text as a single string.
    from gensim.summarization import summarize

    pattern = r'([^.]*?(I find|In sum|agree|affirm|disagree|I conclude|In light of| under| this appeal| The ALJ| I determine| we| based on| for the reasons| pursuant to| the decision is| jurisidiction|section|§+\d |conclude)[^.]*\.)'

    # The pattern has two groups, so findall returns tuples; element 0 is the full match.
    ruling_corpora = [m[0] for m in re.findall(pattern, tokenized, re.I | re.DOTALL | re.M)]
    big_string = "".join(ruling_corpora)

    if len(big_string.split('.')) <= 3 and len(big_string.split()) <= 200:
        # Short enough to print verbatim.
        print("Ruling Content: {} \n".format(big_string))
    else:
        # Otherwise summarize; 245 is the original 250-word budget minus 5.
        summary = summarize(big_string, word_count=245)
        print("Summarized Ruling: {}\n".format(summary))

1 Answer
User
#1 · Posted 2024-10-02 22:27:08

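Your regex stops at the first literal dot.

([^.]*?( _snipped lots of text_ )[^.]*\.)
                                 ^^^^^^^

The marked part (^^^^) captures all text that is not a dot, followed by one literal dot, and then the match is complete.

That gives: The Board has jurisdiction over this appeal under 5 U.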
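
To make the behaviour above concrete, here is a minimal sketch that reproduces it. The sample sentence is taken from the question; the pattern is shortened to a single keyword (" under") in place of the full alternation, which is an assumption made only to keep the example readable.

```python
import re

# Sample sentence from the question, with a closing period added.
text = "The Board has jurisdiction over this appeal under 5 U.S.C. § 8452(a)."

# Simplified pattern: one keyword stands in for the full keyword list.
pattern = r'([^.]*?( under)[^.]*\.)'

# Two groups in the pattern, so findall yields tuples; element 0 is the full match.
matches = [m[0] for m in re.findall(pattern, text, re.I | re.DOTALL)]

# [^.]* cannot cross a dot, so the match ends at the dot in "U."
print(matches[0])
```

Because `[^.]*` excludes every dot, the abbreviation "U.S.C." is indistinguishable from a sentence end as far as this pattern is concerned.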

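Without seeing the real text, you could change this special case so that, after the keyword, it captures anything that is not a ) and then captures the closing ):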

    ruling_corpora = map(lambda x: x[0], re.findall(r'([^.]*?(I find|In sum|agree|affirm|disagree|I conclude|In light of| under| this appeal| The ALJ| I determine| we| based on| for the reasons| pursuant to| the decision is| jurisidiction|section|§+\d |conclude)[^)]*\))', tokenized, re.I | re.DOTALL | re.M))
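
As a rough check of the suggestion above, the sketch below runs the `[^)]*\)` variant on a two-sentence sample. The second sentence and the shortened keyword lists are made up for illustration. It also shows one possible alternative that is not part of the answer: treating a period as a sentence end only when it is followed by whitespace and an uppercase letter, or by the end of the text. That heuristic is an assumption and will still split on cases such as "U.S.C. Section".

```python
import re

text = ("The Board has jurisdiction over this appeal under 5 U.S.C. § 8452(a). "
        "I find that the appellant met the filing deadline.")

# Variant from the answer: after the keyword, run up to and including the next ")".
paren_pattern = r'([^.]*?( under)[^)]*\))'
paren_matches = [m[0] for m in re.findall(paren_pattern, text, re.I | re.DOTALL)]

# Hypothetical alternative: a dot may appear inside a sentence unless it is
# followed by whitespace + uppercase letter, or by the end of the text.
# re.I is deliberately not used here, because it would make [A-Z] match lowercase.
body = r'(?:[^.]|\.(?!\s+[A-Z]|\s*$))'
sentence_pattern = rf'({body}*?( under| I find){body}*\.)'
sentence_matches = [m[0].strip() for m in re.findall(sentence_pattern, text, re.DOTALL)]

print(paren_matches)
print(sentence_matches)
```

The parenthesis variant only helps when a `)` follows the keyword; in a sentence without one, `[^)]*` keeps consuming text until the next `)` anywhere later in the document. The lookahead variant avoids that, at the cost of relying on capitalization to detect sentence starts.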
