<p>这是一个相当有效的方法,虽然它需要O(N)辅助空间,但如果运行次数很小,则不应该很重要:</p>
<pre><code>from itertools import groupby
def ngrams(seq):
stop = len(seq)+1
for n in range(2, stop):
for i in range(stop - n):
yield seq[i:i+n]
def get_combos(seq):
runs = []
for _, g in groupby(enumerate(seq), lambda x:x[1]-x[0]):
run = [a for _, a in g]
for x in run:
yield [x]
if len(run) > 1:
runs.append(run)
for run in reversed(runs):
yield from ngrams(run)
</code></pre>
<p>注意,这使用<a href="https://stackoverflow.com/a/2154437/5014455">this classic approach</a>对连续整数进行分组。它在连续整数组上迭代,“runs”,并生成作为单个元素列表的任何单个整数。如果运行长度超过1,我会将其添加到运行列表中。最后,以相反的方式迭代运行列表,得到“n-grams”,从order2到orderlen(run)。你知道吗</p>
<p>行动中:</p>
<pre><code>>>> A = [0,2,5,6]
>>> B = [5,6,8,9]
>>> C = [6,7,8,9]
>>> list(get_combos(A))
[[0], [2], [5], [6], [5, 6]]
>>> list(get_combos(B))
[[5], [6], [8], [9], [8, 9], [5, 6]]
>>> list(get_combos(C))
[[6], [7], [8], [9], [6, 7], [7, 8], [8, 9], [6, 7, 8], [7, 8, 9], [6, 7, 8, 9]]
</code></pre>
<p/><h3>注意<code>get_combos</code>假设输入是排序的。
<p/><h2>编辑</h2>但是,对于:
<pre><code>>>> D = [6,7,9,12,13,14,20,21,30]
</code></pre>
<p>这将产生:</p>
<pre><code>>>> list(get_combos(D))
[[6], [7], [9], [12], [13], [14], [20], [21], [30], [20, 21], [12, 13], [13, 14], [12, 13, 14], [6, 7]]
</code></pre>
<p>也就是说,3序列在产生后续运行的2序列之前开始。如果要在n+1 len序列之前生成所有n-len序列,请使用以下方法:</p>
<pre><code>from itertools import groupby
def ngrams(seq, max_len):
curr = seq
for n in range(1, max_len + 1):
nxt = []
for run in curr:
run_len = len(run)
if run_len > n:
nxt.append(run)
for i in range(run_len + 1 - n):
yield run[i:i+n]
curr = nxt
def _sub_index(t):
return t[1] - t[0]
def get_consecutive_runs(seq):
grouped = groupby(enumerate(seq), _sub_index)
for _, g in grouped:
yield [a for _, a in g]
def get_combos(seq):
runs = list(get_consecutive_runs(seq))
max_len = max(map(len, runs))
yield from ngrams(runs, max_len)
</code></pre>
<p>结果如下:</p>
<pre><code>>>> list(get_combos(D))
[[6], [7], [9], [12], [13], [14], [20], [21], [30], [6, 7], [12, 13], [13, 14], [20, 21], [12, 13, 14]]
</code></pre>