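<p>If I understand correctly, you want to link different elements of <code>sequences</code>, where two elements are linked when the end of one string matches the beginning of another.</p>
<p>One way to do this with a <code>dict</code> is the following function, <code>match_head_tail()</code>:</p>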
<pre><code>def match_head_tail(items, length=3):
    result = {}
    for x in items:
        # keep every item whose first `length` characters equal the tail of x
        v = [y for y in items if y[:length] == x[-length:]]
        if v:
            result[x] = v
    return result
</code></pre>
<pre><code>sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']
print(match_head_tail(sequences))
# {'AAGTAAA': ['AAATGAT', 'AAAGTTT'], 'AAAGTTT': ['TTTTCCC'], 'AATTCGC': ['CGCTCCC']}
</code></pre>
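<p>If you also want to include the sequences that have no match, you can use the following function, <code>match_head_tail_all()</code>:</p>
<pre><code>def match_head_tail_all(items, length=3):
    # every item becomes a key; items without a match map to an empty list
    return {x: [y for y in items if y[:length] == x[-length:]] for x in items}
</code></pre>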
<pre><code>sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']
print(match_head_tail_all(sequences))
# {'AAGTAAA': ['AAATGAT', 'AAAGTTT'], 'AAATGAT': [], 'AAAGTTT': ['TTTTCCC'], 'TTTTCCC': [], 'AATTCGC': ['CGCTCCC'], 'CGCTCCC': []}
</code></pre>
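<p>Note that both functions above would pair a sequence with itself whenever its own head equals its own tail (for instance a made-up sequence such as <code>'AAATAAA'</code>). If that is not what you want, a minimal sketch of a variant that skips self-matches could look like the following. The name <code>match_head_tail_no_self()</code> and the sample data are only illustrative; positions are compared rather than values, so duplicated sequences can still match each other.</p>

```python
def match_head_tail_no_self(items, length=3):
    """Like match_head_tail(), but never pairs an element with itself."""
    result = {}
    for i, x in enumerate(items):
        # compare positions (i != j), not values, so duplicates still match
        v = [
            y for j, y in enumerate(items)
            if i != j and y[:length] == x[-length:]]
        if v:
            result[x] = v
    return result


# illustrative data: 'AAATAAA' starts and ends with 'AAA'
sequences = ['AAATAAA', 'AAAGTTT', 'TTTTCCC']
print(match_head_tail_no_self(sequences))
# {'AAATAAA': ['AAAGTTT'], 'AAAGTTT': ['TTTTCCC']}
```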
<hr/>
<h2>Edit 1</h2>
<p>If you really need the indexes, combine the above with <code>enumerate()</code> to get them, for example:</p>
<pre><code>def match_head_tail_all_indexes(items, length=3):
    # same as match_head_tail_all(), but keys and values are positions in items
    return {
        i: [j for j, y in enumerate(items) if y[:length] == x[-length:]]
        for i, x in enumerate(items)}


sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']
print(match_head_tail_all_indexes(sequences))
# {0: [1, 2], 1: [], 2: [3], 3: [], 4: [5], 5: []}
</code></pre>
<hr/>
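<h2>Edit 2</h2>
<p>If your input contains many sequences with the same ending, you may want to consider some caching mechanism to improve computational efficiency (at the expense of memory efficiency), for example:</p>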
<pre><code>def match_head_tail_cached(items, length=3, caching=True):
    result = {}
    cached = {} if caching else None
    for x in items:
        tail = x[-length:]
        if caching and tail in cached:
            v = cached[tail]
        else:
            v = [y for y in items if y[:length] == tail]
            if caching:
                # remember the matches so an identical tail is not searched again
                cached[tail] = v
        if v:
            result[x] = v
    return result


sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']
print(match_head_tail_cached(sequences))
# {'AAGTAAA': ['AAATGAT', 'AAAGTTT'], 'AAAGTTT': ['TTTTCCC'], 'AATTCGC': ['CGCTCCC']}
</code></pre>
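<p>A related option, sketched here as an alternative rather than as part of the approach above, is to group the sequences by their head once and then look up each tail in that table, so the list of items is not rescanned for every element. The name <code>match_head_tail_indexed()</code> is made up for this illustration; <code>collections.defaultdict</code> is from the standard library. As with caching, keys that share a tail end up sharing the same list object.</p>

```python
from collections import defaultdict


def match_head_tail_indexed(items, length=3):
    """Same result as match_head_tail(), using a head lookup table."""
    # one pass: group the sequences by their first `length` characters
    by_head = defaultdict(list)
    for y in items:
        by_head[y[:length]].append(y)
    # second pass: look up each tail instead of rescanning `items`
    result = {}
    for x in items:
        v = by_head.get(x[-length:])  # .get() avoids creating empty entries
        if v:
            result[x] = v
    return result


sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']
print(match_head_tail_indexed(sequences))
# {'AAGTAAA': ['AAATGAT', 'AAAGTTT'], 'AAAGTTT': ['TTTTCCC'], 'AATTCGC': ['CGCTCCC']}
```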
<hr/>
<h2>Edit 3</h2>
<p>All of this can also be implemented using only <code>list</code>s, for example:</p>
<pre><code>def match_head_tail_list(items, length=3):
    result = []
    for x in items:
        v = [y for y in items if y[:length] == x[-length:]]
        if v:
            # one [sequence, matches] pair per sequence that has matches
            result.append([x, v])
    return result


sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']
print(match_head_tail_list(sequences))
# [['AAGTAAA', ['AAATGAT', 'AAAGTTT']], ['AAAGTTT', ['TTTTCCC']], ['AATTCGC', ['CGCTCCC']]]
</code></pre>
<p>Or, with even less nesting:</p>
<pre><code>def match_head_tail_flat(items, length=3):
    result = []
    for x in items:
        for y in items:
            if y[:length] == x[-length:]:
                # one flat [tail_sequence, head_sequence] pair per match
                result.append([x, y])
    return result


sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']
print(match_head_tail_flat(sequences))
# [['AAGTAAA', 'AAATGAT'], ['AAGTAAA', 'AAAGTTT'], ['AAAGTTT', 'TTTTCCC'], ['AATTCGC', 'CGCTCCC']]
</code></pre>