Regular expression takes too long inside a loop

for i in range(len(holdList)):
    foundTerm = re.findall(r"\b" + self._searchTerm + r"\b", holdList[i][5], flags=re.IGNORECASE)
    # count the occurrence
    storyLen = len(foundTerm)
    holdList[i] += (storyLen,)
    if foundTerm:
        # Stores each found word as a list of strings
        # etc
        holdList[i] += (self.sentences_to_quote(holdList[i][5]), )

2 answers

User

Answer 1 · edited 2024-09-30 14:21:53

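re.sub is meant for replacing the strings that match a regular expression. The task here is only to find out whether a match exists, so re.search should perform well: it stops at the first match and returns it.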
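A minimal sketch of that existence check, assuming only a yes/no answer is needed (counting occurrences, as in the question, would still need findall). The helper name contains_term and the sample strings are made up for illustration, and re.escape is added on the assumption that the search term is literal text rather than a pattern.

```python
import re


def contains_term(term, text):
    """Return True if `term` occurs in `text` as a whole word, ignoring case."""
    # re.escape assumes the term is literal text, not a regex pattern.
    pattern = r"\b" + re.escape(term) + r"\b"
    # re.search stops at the first match instead of collecting all of them.
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(contains_term("cat", "The Cat sat on the mat."))   # True
print(contains_term("cat", "Concatenate the strings."))  # False
```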

User

Answer 2 · edited 2024-09-30 14:21:53

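I'm not sure whether your self._searchTerm consists of phrases or single words, but in general you will get better results with sets and dicts than with regex. In this case you don't need the regex machinery at all, because you only need to count or match whole words. For example, to search for a word in a sentence, you can easily replace the regex with: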

search_sentence = set(sent_tokenize.tokenize(...))
if self._search_term in search_sentence:
    # yay

(I made your code PEP8-compliant.)

If you are worried about capitalization, convert everything to lowercase:

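search_sentence = set(w.lower() for w in sent_tokenize.tokenize(...))
if self._search_term.lower() in search_sentence:
    # yay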
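For a version that runs as-is, here is a rough sketch of the same set lookup. It assumes the search term is a single word and uses a plain whitespace split with punctuation stripping as a stand-in for the tokenizer above; words_in, story and search_term are hypothetical names.

```python
import string


def words_in(text):
    """Split text on whitespace and return the set of lowercased words."""
    # Strip surrounding punctuation so "mat." and "mat" compare equal.
    stripped = (w.strip(string.punctuation).lower() for w in text.split())
    return {w for w in stripped if w}


story = "The Cat sat on the mat. The cat slept."
search_term = "CAT"

# Set membership is a hash lookup, so no regex scan is needed.
print(search_term.lower() in words_in(story))  # True
```

If the same story is searched for many terms, building the set once and reusing it is where most of the saving would come from.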

You can also use collections.Counter or collections.defaultdict(int) to count the occurrences of words.
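As an illustration of the counting approach, the sketch below uses collections.Counter. The tokenization (whitespace split plus punctuation stripping) and the name count_words are assumptions made for this example, not part of the original code.

```python
import string
from collections import Counter


def count_words(text):
    """Count lowercased words in text, ignoring surrounding punctuation."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return Counter(w for w in words if w)


counts = count_words("The cat saw a cat. The CAT ran.")
print(counts["cat"])  # 3
print(counts["dog"])  # 0 (a Counter returns 0 for missing keys)
```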

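If you must use regex because you want to match words that follow a particular pattern rather than whole words, then I suggest compiling the pattern once and passing the compiled pattern to your other methods, for example: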

self.search_pattern = re.compile(r"\b{term}\b".format(term=self._search_term), re.I)
found_term = self.search_pattern.findall(hold_list[i][5])
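Put together, the compile-once idea might look like the sketch below. The function name count_term_per_story and the sample stories are hypothetical, and re.escape is included on the assumption that the term is literal text; drop it if the term is itself meant to be a pattern.

```python
import re


def count_term_per_story(term, stories):
    """Return the number of whole-word, case-insensitive matches per story."""
    # Compile once, outside the loop, and reuse the compiled pattern.
    pattern = re.compile(r"\b{term}\b".format(term=re.escape(term)), re.I)
    return [len(pattern.findall(story)) for story in stories]


stories = ["The cat saw another cat.", "No felines here.", "CAT!"]
print(count_term_per_story("cat", stories))  # [2, 0, 1]
```

Python's re module already caches recently compiled patterns, so the gain from compiling explicitly may be modest; it mainly avoids rebuilding the pattern string and the cache lookup on every iteration.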
