Performant matching of multi-word substrings in a string in Python

Published 2024-10-03 21:25:38


I am working on a project and have not found any useful resources on how to match a multi-word substring against a string.

For example: substring = "I can be found in this string" inside string = "Now, I can be found in this string example"

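I cannot use the .find() method or regular expressions, and to complicate things further, the edge cases include:

"reflexion mirror" does not match "'reflexion mirror'", but does match "(reflexion mirror)"

"maley" does not match "o'maley"

"luminate" matches "'''luminate"

"luminate" matches "luminate__"

"george" does not match "georges"

Whenever such characters appear at the ends of a string, as in "__hello world__" or "''hello world''", they do not interfere with matching "hello" or "world"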
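One possible reading of the edge cases above can be sketched as a boundary check. It rests on assumptions that the question does not state outright: letters and digits are word characters, a lone apostrophe counts as a letter, and underscores, brackets, whitespace, and runs of two or more apostrophes act as delimiters. The helper names `is_word_char` and `is_whole_match` are hypothetical.

```python
def is_word_char(text: str, i: int) -> bool:
    """Return True if text[i] counts as part of a word (assumed rule)."""
    if i < 0 or i >= len(text):
        return False  # the start or end of the text acts as a delimiter
    ch = text[i]
    if ch.isalnum():
        return True
    if ch == "'":
        # Assumption: a lone apostrophe is a letter, a run of two or more is not.
        prev_is_quote = i > 0 and text[i - 1] == "'"
        next_is_quote = i + 1 < len(text) and text[i + 1] == "'"
        return not (prev_is_quote or next_is_quote)
    return False  # whitespace, underscores, brackets, other punctuation


def is_whole_match(text: str, start: int, length: int) -> bool:
    """Check that text[start:start + length] is not glued to a word character."""
    before = is_word_char(text, start - 1)
    after = is_word_char(text, start + length)
    return not before and not after


# "maley" starts at index 2 of "o'maley" and is rejected,
# because the lone apostrophe before it is treated as a letter.
print(is_whole_match("o'maley", 2, 5))      # False
print(is_whole_match("'''luminate", 3, 8))  # True
```

Under this rule a candidate position reported by any search routine is accepted only if both neighbours are delimiters, so the check costs constant time per candidate.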

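I used Boyer-Moore to find the substring, which works except for these seemingly conflicting edge cases. Oh, and I forgot to mention that the solution should emphasize performance in terms of time complexity.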
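For reference, a search that avoids both .find() and regular expressions can be sketched with the Horspool simplification of Boyer-Moore. This is not necessarily the variant used in the project; the function name `horspool_search` is hypothetical, and only the bad-character table is used.

```python
from typing import Iterator


def horspool_search(needle: str, haystack: str) -> Iterator[int]:
    """Yield every start index of needle in haystack, overlaps included."""
    m, n = len(needle), len(haystack)
    if m == 0 or m > n:
        return
    # Bad-character table: for each character except the last one, the distance
    # from its rightmost occurrence to the end of the needle.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        # Compare right to left, as in Boyer-Moore.
        j = m - 1
        while j >= 0 and haystack[pos + j] == needle[j]:
            j -= 1
        if j < 0:
            yield pos
        # Shift according to the haystack character aligned with the needle's end.
        pos += shift.get(haystack[pos + m - 1], m)


text = "Now, I can be found in this string example".lower()
sub = "I can be found in this string".lower()
print(list(horspool_search(sub, text)))  # [5]
```

Lowercasing both inputs first gives a case-insensitive search. Horspool is typically sublinear on natural-language text, although its worst case remains O(n·m).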

I preprocess both the string and the substring with word.translate({ord(c): None for c in string.whitespace}).lower(), which gives the following result:
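A small sketch of what this preprocessing does may help when reasoning about the edge cases. Removing all whitespace also removes the boundaries between words, which the edge cases seem to depend on. The alternative `collapse_whitespace` is a hypothetical helper, not something from the project, and it assumes that keeping a single space between words is acceptable.

```python
import string


def strip_whitespace(s: str) -> str:
    """The preprocessing from the question: drop all whitespace, then lowercase."""
    return s.translate({ord(c): None for c in string.whitespace}).lower()


def collapse_whitespace(s: str) -> str:
    """Hypothetical alternative: keep one space between words, then lowercase."""
    return " ".join(s.split()).lower()


sample = "My name is\nGEORGES"
print(strip_whitespace(sample))     # mynameisgeorges
print(collapse_whitespace(sample))  # my name is georges
```

With the first form, a search for "is" can no longer tell that it stood alone as a word; with the second, the space on each side is still available to a boundary check.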

"asuggestionboxentryfrombobcarterdearanonymous,i'mnotquitesureiunderstandtheconceptofthis'anonymous'suggestionbox.ifnoonereadswhatwewrite,thenhowwillanythingeverchangebutinthespiritofgoodwill,i'vedecidedtooffermytwocents,andhopefullykevinwon'tstealit(ha,ha).iwouldreallyliketoseemorevarietiesofcoffeeinthecoffeemachineinthebreakroom.'milkandsugar','blackwithsugar','extrasugar'and'creamandsugar'don'toffermuchdiversity.also,theselectionofdrinksseemsheavilyweightedinfavorof'sugar'.whatifwedon'twantanysugar?"

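Any ideas on how to account for these edge cases?

Thanks

Edit

Note that ' is to be treated as a character.

Here are the unit tests in which I collected the edge cases: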

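import unittest

# count_occurrences_in_text is the function under test; it is assumed to be
# defined or imported elsewhere.


class TestCountoccurrencesInText(unittest.TestCase):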
    def test_count_occurrences_in_text(self):
        """
        Test the count_occurrences_in_text function
        """
        text = """Georges is my name and I like python. Oh ! your name is georges? And you like Python!
Yes is is true, I like PYTHON
and my name is GEORGES"""
        # test with a little text.
        self.assertEqual(3, count_occurrences_in_text("Georges", text))
        self.assertEqual(3, count_occurrences_in_text("GEORGES", text))
        self.assertEqual(3, count_occurrences_in_text("georges", text))
        self.assertEqual(0, count_occurrences_in_text("george", text))
        self.assertEqual(3, count_occurrences_in_text("python", text))
        self.assertEqual(3, count_occurrences_in_text("PYTHON", text))
        self.assertEqual(2, count_occurrences_in_text("I", text))
        self.assertEqual(0, count_occurrences_in_text("n", text))
        self.assertEqual(0, count_occurrences_in_text("reflexion mirror", "I am a senior citizen and I live in the Fun-Plex 'Reflexion Mirror' in Sopchoppy, Florida"))
        self.assertEqual(1, count_occurrences_in_text("Linguist", "'''Linguist Specialist Found Dead on Laboratory Floor'''"))

