A two-state machine over a list

Published 2024-09-27 22:19:18


Consider the following function:

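def search(seq, start, end):
    state = 0   # 0: looking for start, 1: collecting until end
    ret = []    # completed sublists
    aux = []    # sublist currently being collected
    for i in seq:
        if state == 0:
            if i == start:
                aux = [i]
                state = 1
        elif state == 1:
            aux.append(i)
            if i == end:
                ret.append(aux)
                state = 0
    return ret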

The search() function is a very basic two-state machine that returns the sublists delimited by start and end. For example:

DNA = ['CGC','UUC','GCU','UUG','GAA','AAU','UUG','UGU','GUU','UUU','UGU',
       'GGC','UGC','UCG','CUG','CUC','AAA','UUG','UUC','GCU','GCU','UUU',
       'UGU','GUC','CUG','GCU','GCU','UUU','AUU','AUU','AAU','CGC','UGC',
       'UUG','GCG','GUU','CUG','UUA','CGC','UGC','UUG','GGC','UUG','UUG',
       'UGG','CUU','UGG','UUG','UUU','GUA','UAU','UGA','GCU','GUU','CUU',
       'UGG','CUU','UGG','AAU','UUU','GUU','UAU','UAG','GCU','GCU','CUU',
       'GUU','GUU','GUU','GCU','UGU','UGU','AAU','GUU','GGC']


print(search(DNA, start='AAU', end='GUU'))

Output:

[['AAU', 'UUG', 'UGU', 'GUU'], ['AAU', 'CGC', 'UGC', 'UUG', 'GCG', 'GUU'], ['AAU', 'UUU', 'GUU'], ['AAU', 'GUU']]

Is it possible to write an equivalent function using a list comprehension?


2 Answers

I'm not sure a comprehension is the right tool for this task. You can, however, write a very Pythonic generator:

def search(seq, start, end):
    ret = []
    for i in seq:
        if i == start or ret:
            ret.append(i)
        if i == end and ret:
            yield ret
            ret = []

>>> list(search(DNA, start='AAU', end='GUU'))
[['AAU', 'UUG', 'UGU', 'GUU'],
 ['AAU', 'CGC', 'UGC', 'UUG', 'GCG', 'GUU'],
 ['AAU', 'UUU', 'GUU'],
 ['AAU', 'GUU']]

If you really want a comprehension, you can use some trickery with takewhile and dropwhile:

from itertools import takewhile as t, dropwhile as d
start, end = 'AAU', 'GUU'
it = iter(DNA)
[x+[end] for x in (list(t(lambda i: i!=end, d(lambda i: i!=start, it))) for _ in range(DNA.count(end))) if x]

It's ugly, and I'm sure it has some problems :) A start that occurs after the last end, for example...
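The caveat about a start after the last end can be reproduced with a small sketch. The input `sample` and the function names `search_gen` and `search_comp` are hypothetical, chosen only for this illustration. The comprehension makes one pass per occurrence of end, so when some end does not close a segment, a leftover pass can pick up an unclosed start and append end to it anyway:

```python
from itertools import dropwhile, takewhile

def search_gen(seq, start, end):
    """The generator from above: collect items from start through end."""
    ret = []
    for i in seq:
        if i == start or ret:
            ret.append(i)
        if i == end and ret:
            yield ret
            ret = []

def search_comp(seq, start, end):
    """The takewhile/dropwhile comprehension from above, wrapped in a function."""
    it = iter(seq)
    return [x + [end]
            for x in (list(takewhile(lambda i: i != end,
                                     dropwhile(lambda i: i != start, it)))
                      for _ in range(seq.count(end)))
            if x]

# Hypothetical input: one stray 'GUU' at the front, and a final 'AAU'
# that is never closed by a 'GUU'.
sample = ['GUU', 'AAU', 'UUG', 'GUU', 'CGC', 'AAU', 'UGC']

gen_result = list(search_gen(sample, 'AAU', 'GUU'))
comp_result = search_comp(sample, 'AAU', 'GUU')
print(gen_result)   # [['AAU', 'UUG', 'GUU']]
print(comp_result)  # [['AAU', 'UUG', 'GUU'], ['AAU', 'UGC', 'GUU']]
```

The second sublist in comp_result never appears in the input as a delimited run; the generator simply discards the unclosed start.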

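I think what you want is a sublist of the list, given a start value and an end value that occur in it. First, you can narrow the search space as follows: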

start_index = seq.index(start)
end_index = seq.index(end)
seq[start_index:end_index+1]
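One thing to keep in mind with the snippet above: list.index returns the first occurrence, so the first end may sit before the first start, in which case the slice is empty. A small sketch with a made-up list (`toy`, 'S' and 'E' are hypothetical stand-ins for the sequence and its delimiters):

```python
# Hypothetical toy sequence: an end marker 'E' appears before the first start 'S'.
toy = ['E', 'x', 'S', 'y', 'E', 'z']

start_index = toy.index('S')            # 2
end_index = toy.index('E')              # 0: the first 'E' precedes the start
naive = toy[start_index:end_index + 1]  # toy[2:1], an empty slice

# Passing a start offset to list.index skips any end that precedes the start.
end_after_start = toy.index('E', start_index + 1)  # 4
fixed = toy[start_index:end_after_start + 1]

print(naive)  # []
print(fixed)  # ['S', 'y', 'E']
```

This is why any loop built on these two calls has to either compare the two indices or search for the end only from the start position onward.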

Then you can iterate over the remaining space, looking for more starts and ends. Since there are no overlapping sequences, you can try:

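def search(seq, start, end):
    pos = 0
    while True:
        try:
            start_index = seq.index(start, pos)
            # search for the end only after the start, so an earlier end is never picked up
            end_index = seq.index(end, start_index + 1)
        except ValueError:
            return  # no further start, or a start with no end after it
        yield seq[start_index:end_index+1]
        pos = end_index + 1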



DNA = ['CGC','UUC','GCU','UUG','GAA','AAU','UUG','UGU','GUU','UUU','UGU',    
       'GGC','UGC','UCG','CUG','CUC','AAA','UUG','UUC','GCU','GCU','UUU',    
       'UGU','GUC','CUG','GCU','GCU','UUU','AUU','AUU','AAU','CGC','UGC',    
       'UUG','GCG','GUU','CUG','UUA','CGC','UGC','UUG','GGC','UUG','UUG',    
       'UGG','CUU','UGG','UUG','UUU','GUA','UAU','UGA','GCU','GUU','CUU',    
       'UGG','CUU','UGG','AAU','UUU','GUU','UAU','UAG','GCU','GCU','CUU',    
       'GUU','GUU','GUU','GCU','UGU','UGU','AAU','GUU','GGC']

start='AAU'
end='GUU'

list(search(DNA, start='AAU', end='GUU'))

Alternatively (although it is completely "unpythonic" in every sense), you can use numpy.searchsorted on the indices of the starts and ends:

import numpy as np
import pandas as pd

DNA = ['CGC','UUC','GCU','UUG','GAA','AAU','UUG','UGU','GUU','UUU','UGU',    
       'GGC','UGC','UCG','CUG','CUC','AAA','UUG','UUC','GCU','GCU','UUU',    
       'UGU','GUC','CUG','GCU','GCU','UUU','AUU','AUU','AAU','CGC','UGC',    
       'UUG','GCG','GUU','CUG','UUA','CGC','UGC','UUG','GGC','UUG','UUG',    
       'UGG','CUU','UGG','UUG','UUU','GUA','UAU','UGA','GCU','GUU','CUU',    
       'UGG','CUU','UGG','AAU','UUU','GUU','UAU','UAG','GCU','GCU','CUU',    
       'GUU','GUU','GUU','GCU','UGU','UGU','AAU','GUU','GGC']

start='AAU'
end='GUU'

arr = pd.Series(DNA)
start_indices = arr[arr == start].index
end_indices = arr[arr == end].index
for start_idx, end_idx in np.column_stack((start_indices, end_indices[np.searchsorted(end_indices, start_indices, side='right')])):
    print(DNA[start_idx:end_idx+1])

The numpy answer comes from "Given 2 list of integers how to find the non-overlapping ranges?"
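The searchsorted lookup above indexes end_indices with whatever position comes back, so a start that has no end after it would index one past the last element. A minimal sketch of the same idea with a guard, using NumPy only; the helper name `search_np` and the input `sample` are hypothetical, and this is one possible variant rather than the linked answer's code:

```python
import numpy as np

def search_np(seq, start, end):
    """Pair each start with the first end that follows it (hypothetical helper)."""
    arr = np.asarray(seq)
    start_indices = np.flatnonzero(arr == start)
    end_indices = np.flatnonzero(arr == end)
    # Position in end_indices of the first end strictly after each start.
    pos = np.searchsorted(end_indices, start_indices, side='right')
    # A start with no later end gets pos == len(end_indices); drop it
    # instead of letting the lookup raise an IndexError.
    valid = pos < len(end_indices)
    return [list(seq[s:e + 1])
            for s, e in zip(start_indices[valid], end_indices[pos[valid]])]

# Hypothetical input: the final 'AAU' has no 'GUU' after it.
sample = ['GUU', 'AAU', 'UUG', 'GUU', 'CGC', 'AAU', 'UGC']
result = search_np(sample, 'AAU', 'GUU')
print(result)  # [['AAU', 'UUG', 'GUU']]
```

Like the loop above, this pairs every start with the next end, so two starts in a row would share one end; it matches the state machine only when segments do not overlap.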
