Using regex to try to extract paragraphs from an email

<0.30.1.92.13.39.38.marian+@MARIAN.ADM.CS.CMU.EDU (Marian D'Amico).0>
Type: cmu.cs.scs
Topic: LOGIC COLLOQUIUM
Dates: 6-Feb-92
Time: 3:30
Host: Stephen D. Brookes
PostedBy: marian+ on 30-Jan-92 at 13:39 from MARIAN.ADM.CS.CMU.EDU (Marian D'Amico)
Abstract:
***********************************************************************
Logic Colloquium
Thursday February 6
3:30
Wean 5409
**********************************************************************

On The Mathematics of Non-monotonic Reasoning

Menachem Magidor
Hebrew University of Jerusalem
(Joint work with Daniel Lehman)

Non-monotonic reasoning is an attempt to develop reasoning systems where an inference means that the conclusion holds in the "normal case", in "most cases", but it does not necessarily hold in all cases. It seems that this type of reasoning is needed if one wants to model everyday common-sense reasoning. There have been many models suggested for non-monotonic reasoning (like circumscription, default logic, autoepistemic logic, etc). We study all these approaches in a more abstract fashion by considering the inference relation of the reasoning system, and clarify the role of different inference rules and the impact they have on the model theory of the logic. We are especially interested in a particular rule called "Rational Monotony" and the connection between it and probabilistic models.

NOTE: Prof. Magidor will also give a Math Department Colloquium on Friday February 7.
-------------------------
Host: Stephen D. Brookes
Appointments can be made through Marian D'Amico, marian@cs, x7665.

1 answer

Anonymous user

#1 · Posted 2024-09-30 06:30:01

I would try a different approach. First, split the text on newlines:

texts = text.split('\n')
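One caveat, assuming your emails may come with Windows/SMTP-style `\r\n` line endings: `split('\n')` then leaves a trailing `\r` on every line, so a "blank" line is `'\r'` rather than `''` and any comparison against `''` silently fails. `str.splitlines()` handles either line ending. A quick check on a made-up string:

```python
# Hypothetical snippet with CRLF line endings, as is common in raw email
raw = "Line one\r\n\r\nLine two"

print(raw.split('\n'))   # ['Line one\r', '\r', 'Line two']
print(raw.splitlines())  # ['Line one', '', 'Line two']
```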

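From there, write a test that decides whether each line belongs to the email body or to something else. One option is to look for blocks of text that have a blank line before and after them. Something like this might work: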

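paragraphs = []

for i, text in enumerate(texts):
    # Keep a non-empty line only if the lines directly before and after it are blank;
    # skip the first and last lines, which are missing one of those neighbours
    if 0 < i < len(texts) - 1:
        if text != '' and texts[i-1] == '' and texts[i+1] == '':
            paragraphs.append(text)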
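The check above only catches paragraphs that fit on a single line, because it keeps one line flanked by blank lines. Wrapped email text usually spans several lines, so a variant worth trying (a sketch under that assumption, not tested on real data) groups consecutive non-blank lines into one paragraph. The `sample` string is a hypothetical, hand-wrapped excerpt of the email in the question:

```python
def split_paragraphs(text: str) -> list[str]:
    """Group consecutive non-blank lines into paragraphs.

    A blank (or whitespace-only) line ends the current paragraph; the
    lines inside one paragraph are joined with single spaces.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            paragraphs.append(' '.join(current))
            current = []
    if current:  # flush the last paragraph if the text doesn't end with a blank line
        paragraphs.append(' '.join(current))
    return paragraphs


sample = """Type: cmu.cs.scs
Topic: LOGIC COLLOQUIUM

Non-monotonic reasoning is an attempt to develop
reasoning systems.

NOTE: Prof. Magidor will also give a Math
Department Colloquium."""

for p in split_paragraphs(sample):
    print(p)
# Type: cmu.cs.scs Topic: LOGIC COLLOQUIUM
# Non-monotonic reasoning is an attempt to develop reasoning systems.
# NOTE: Prof. Magidor will also give a Math Department Colloquium.
```

Note that the header lines come back as a "paragraph" too; deciding body versus everything else still needs its own test.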

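By the way, a regex will only get you so far. The format of most text data sources varies a lot, and your regular expression will never catch every edge case. I had to do this once, and building a classification model to identify paragraphs turned out to be more robust (and easier).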
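For comparison, the regex version of the blank-line rule fits in one line, and it also shows the limitation: the pattern only knows about blank lines, so header and footer lines separated by blank lines come back looking exactly like body paragraphs. A minimal sketch (the `message` string is a shortened, hypothetical excerpt of the email):

```python
import re


def regex_paragraphs(text: str) -> list[str]:
    # Split wherever a newline is followed by optional whitespace and another
    # newline, i.e. on one or more blank (or whitespace-only) lines.
    chunks = re.split(r'\n\s*\n', text)
    # Drop empty chunks (e.g. from leading blank lines) and trim each one.
    return [c.strip() for c in chunks if c.strip()]


message = "Topic: LOGIC COLLOQUIUM\n\nNon-monotonic reasoning is...\n\n\nHost: Stephen D. Brookes\n"
print(regex_paragraphs(message))
# ['Topic: LOGIC COLLOQUIUM', 'Non-monotonic reasoning is...', 'Host: Stephen D. Brookes']
```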

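That's a research project in its own right, but if you go that way, look at pairing term frequency-inverse document frequency (TF-IDF) with a support vector classifier (SVC), and don't let anyone talk you into a neural network unless you have lots of good training data :)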
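To make the TF-IDF + SVC idea concrete, here is a minimal, self-contained sketch under loud assumptions: the training lines and labels are made up, the tokenizer just splits on whitespace, and the "SVC" is a linear SVM with no bias term, trained by Pegasos-style subgradient descent in plain Python rather than any library's implementation. It shows the shape of the pipeline (TF-IDF vectors in, +1/-1 body-vs-not labels out), not a tuned model.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive tokenizer: lowercase and split on whitespace
    return text.lower().split()


class TfidfLinearSVM:
    """Toy TF-IDF features + linear support vector classifier (no bias term).

    Labels: +1 = body paragraph, -1 = anything else (headers, footers, ...).
    Training minimizes the L2-regularized hinge loss with Pegasos-style
    stochastic subgradient steps.
    """

    def __init__(self, lam=0.01, epochs=50):
        self.lam = lam        # regularization strength
        self.epochs = epochs  # passes over the training data

    def _vectorize(self, text):
        # Term frequency times IDF; words never seen in training are ignored
        counts = Counter(t for t in tokenize(text) if t in self.idf)
        total = sum(counts.values()) or 1
        return {t: (c / total) * self.idf[t] for t, c in counts.items()}

    def _score(self, x):
        return sum(self.w.get(t, 0.0) * v for t, v in x.items())

    def fit(self, texts, labels):
        n = len(texts)
        df = Counter(t for text in texts for t in set(tokenize(text)))
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        vectors = [self._vectorize(text) for text in texts]
        self.w = {}
        step = 0
        for _ in range(self.epochs):
            for x, y in zip(vectors, labels):
                step += 1
                eta = 1.0 / (self.lam * step)  # Pegasos step size
                margin = y * self._score(x)
                # Shrink all weights: gradient step for the L2 regularizer
                for t in self.w:
                    self.w[t] *= 1 - eta * self.lam
                # Hinge-loss subgradient step for examples inside the margin
                if margin < 1:
                    for t, v in x.items():
                        self.w[t] = self.w.get(t, 0.0) + eta * y * v
        return self

    def predict(self, text):
        return 1 if self._score(self._vectorize(text)) > 0 else -1


# Hypothetical hand-labelled lines taken from the example email
train_texts = [
    "Non-monotonic reasoning is an attempt to develop reasoning systems",
    "We study all these approaches in a more abstract fashion",
    "There have been many models suggested for non-monotonic reasoning",
    "Type: cmu.cs.scs",
    "Host: Stephen D. Brookes",
    "Dates: 6-Feb-92 Time: 3:30",
]
labels = [1, 1, 1, -1, -1, -1]

clf = TfidfLinearSVM().fit(train_texts, labels)
print(clf.predict("reasoning systems for abstract models"))  # 1
print(clf.predict("Host: Marian D'Amico"))                   # -1
```

In practice you would more likely use scikit-learn, e.g. `make_pipeline(TfidfVectorizer(), LinearSVC())`, trained on lines you have labelled by hand from your own emails.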
