不是其他字符串的子字符串的返回字符串是否可以在小于O（n^2）的时间内返回？

3条回答

网友

1楼 · 编辑于 2024-06-26 17:44:52

记忆有问题吗？你可以求助于久经考验的…崔！你知道吗

建立后缀树！你知道吗

根据你的意见['abc','abcd','ab','def','efgd']

我们会有一棵树

              _
            / | \
           a  e  d
          /   |   \
         b*   f    e
        /     |     \
       c*     g      f*
      /       |
     d*       d*

利用对所述树的DFS（深度优先搜索）搜索，您将定位最深的叶abcd、efgd和def

树遍历是非常直接的，并且您的时间复杂度是O(n*m).，这比您之前的O(n^2)时间有了很大的改进。你知道吗

使用这种方法，添加新的键变得很简单，但仍然很容易找到唯一的键。你知道吗

考虑添加键deg

你的新树大概

              _
            / | \
           a  e  d
          /   |   \
         b*   f    e
        /     |   / \
       c*     g  g*   f*
      /       |
     d*       d*

对于这个新的树，执行DFS搜索以获得不是其他树前缀的唯一键仍然是一个简单的问题。你知道吗

from typing import List


class Trie(object):
    class Leaf(object):
        def __init__(self, data, is_key):
            self.data = data
            self.is_key = is_key
            self.children = []

        def __str__(self):
            return "{}{}".format(self.data, "*" if self.is_key else "")

    def __init__(self, keys):
        self.root = Trie.Leaf('', False)
        for key in keys:
            self.add_key(key)

    def add_key(self, key):
        self._add(key, self.root.children)

    def has_suffix(self, suffix):
        leaf = self._find(suffix, self.root.children)

        if not leaf:
            return False

        # This is only a suffix if the returned leaf has children and itself is not a key
        if not leaf.is_key and leaf.children:
            return True

        return False

    def includes_key(self, key):
        leaf = self._find(key, self.root.children)

        if not leaf:
            return False

        return leaf.is_key

    def delete(self, key):
        """
        If the key is present as a unique key as in it does not have any children nor are any of its nodes comprised of
         we should delete all of the nodes up to the root
        If the key is a prefix of another long key in the trie, umark the leaf node
        if the key is present in the trie and contains no children but contains nodes that are keys we should delete all
         nodes up to the first encountered key
        :param key:
        :return:
        """

        if not key:
            raise KeyError

        self._delete(key, self.root.children, None)

    def _delete(self, key, children: List[Leaf], parents: (List[Leaf], None), key_idx=0, parent_key=False):
        if not parents:
            parents = [self.root]

        if key_idx >= len(key):
            return

        key_end = True if len(key) == key_idx + 1 else False
        suffix = key[key_idx]
        for leaf in children:
            if leaf.data == suffix:
                # we have encountered a leaf node that is a key we can't delete these
                # this means our key shares a common branch
                if leaf.is_key:
                    parent_key = True

                if key_end and leaf.children:
                    # We've encountered another key along the way
                    if parent_key:
                        leaf.is_key = False
                    else:
                        # delete all nodes recursively up to the top of the first node that has multiple children
                        self._clean_parents(key, key_idx, parents)
                elif key_end and not leaf.children:
                    # delete all nodes recursively up to the top of the first node that has multiple children
                    self._clean_parents(key, key_idx, parents)

                # Not at the key end so we need to keep traversing the tree down
                parents.append(leaf)
                self._delete(key, leaf.children, parents, key_idx + 1, key_end)

    def _clean_parents(self, key, key_idx, parents):
        stop = False
        while parents and not stop:
            p = parents.pop()

            # Need to stop processing a removal at a branch
            if len(p.children) > 1:
                stop = True

            # Locate our branch and kill its children
            for i in range(len(p.children)):
                if p.children[i].data == key[key_idx]:
                    p.children.pop(i)
                    break
            key_idx -= 1

    def _find(self, key, children: List[Leaf]):
        if not key:
            raise KeyError

        match = False
        if len(key) == 1:
            match = True

        suffix = key[0]
        for leaf in children:
            if leaf.data == suffix and not match:
                return self._find(key[1:], leaf.children)
            elif leaf.data == suffix and match:
                return leaf
        return None

    def _add(self, key, children: List[Leaf]):
        if not key:
            return

        is_key = False
        if len(key) == 1:
            is_key = True

        suffix = key[0]
        for leaf in children:
            if leaf.data == suffix:
                self._add(key[1:], leaf.children)
                break
        else:
            children.append(Trie.Leaf(suffix, is_key))
            self._add(key[1:], children[-1].children)

        return

    @staticmethod
    def _has_children(leaf):
        return bool(leaf.children)


def main():
    keys = ['ba', 'bag', 'a', 'abc', 'abcd', 'abd', 'xyz']
    trie = Trie(keys)
    print(trie.includes_key('ba'))  # True
    print(trie.includes_key('b'))  # False
    print(trie.includes_key('dog'))  # False
    print(trie.has_suffix('b'))  # True
    print(trie.has_suffix('ab'))  # True
    print(trie.has_suffix('abd'))   # False

    trie.delete('abd')  # Should only remove the d
    trie.delete('a')    # should unmark a as a key
    trie.delete('ba')   # should remove the ba trie
    trie.delete('xyz')  # Should remove the entire branch
    trie.delete('bag')  # should only remove the g

    print(trie)

if __name__ == "__main__":
    main()

请注意，上面的trie实现并没有实现DFS搜索；但是，它为您提供了一些令人惊奇的开始工作。你知道吗

网友

2楼 · 编辑于 2024-06-26 17:44:52

弹出第一个元素。遍历剩余的每个元素，看看较短的字符串是否是较长字符串的子字符串。重复。应该是O（n logn）

编辑：实施初稿

def not_substrings(l):
    mask = [True]*len(l)
    for i in range(len(l)):
        if not mask[i]:
            continue
        for j in range(i+1, len(l)):
            if len(l[i]) > len(l[j]):
                if l[j] in l[i]:
                    mask[j] = False
            elif l[j] == l[i]:
                mask[j] = False
                mask[i] = False
            else:
                if l[i] in l[j]:
                    mask[i] = False
        if mask[i]:
            print l[i]

我没有运行此代码，但应该大致正确。我不知道是否有一种不带掩码的方法，或者[True]*len(l)语句的时间复杂度是多少。我没有做过任何严格的分析，但在我看来这是n log n，因为每次迭代只迭代列表的剩余部分，而不是整个列表。你知道吗

网友

3楼 · 编辑于 2024-06-26 17:44:52

使用Aho-Corasick应该允许您获得O(n)的渐进运行时间，代价是增加额外的内存使用，以及更高的固定成本乘数（被big-O符号忽略，但仍然有意义）。算法的复杂度是多个组件的总和，但没有一个组件相乘，因此它应该与所有度量（字符串数、字符串长度、最长字符串等）成线性关系。你知道吗

使用^{}，您将执行一个初始过程来生成一个可以同时扫描所有字符串的自动机：

import ahocorasick

# This code assumes no duplicates in mystrings (which would make them mutually
# substrings). Easy to handle if needed, but simpler to avoid for demonstration

mystrings = ['abc','abcd','ab','def','efgd']

# Build Aho-Corasick automaton, involves O(n) (in combined length of mystrings) work
# Allows us to do single pass scans of a string for all strings in mystrings
# at once
aut = ahocorasick.Automaton()
for s in mystrings:
    # mapping string to itself means we're informed directly of which substring
    # we hit as we scan
    aut.add_word(s, s)
aut.make_automaton()

# Initially, assume all strings are non-substrings
nonsubstrings = set(mystrings)

# Scan each of mystrings for substrings from other mystrings
# This only involves a single pass of each s in mystrings thanks to Aho-Corasick,
# so it's only O(n+m) work, where n is again combined length of mystrings, and
# m is the number of substrings found during the search
for s in mystrings:
    for _, substr in aut.iter(s):
        if substr != s:
           nonsubstrings.discard(substr)

# A slightly more optimized version of the above loop, but admittedly less readable:
# from operator import itemgetter
# getsubstr = itemgetter(1)
# for s in mystrings:
#     nonsubstrings.difference_update(filter(s.__ne__, map(getsubstr, aut.iter(s))))

for nonsub in nonsubstrings:
    print(nonsub)

注意：令人恼火的是，我现在在一台没有编译器的机器上，所以我不能安装pyahocorasick来测试这段代码，但我以前用过它，我相信这应该可以工作，模块化愚蠢的输入错误。你知道吗

相关问题更多 >

编程相关推荐

热门问题

热门文章