Can this function be optimized for speed?

def makeGsList(sentences,org):
    gs_list1=[]
    gs_list2=[]
    for s in sentences:
        if s.startswith(tuple(StartWords)):
            s = s.lower()
            if org=='m':
                gs_list1 = [k for k in m_words if k in s]
            if org=='h':
                gs_list1 = [k for k in h_words if k in s]
            for gs_element in gs_list1:
                gs_list2.append(gs_element)
    gs_list3 = list(set(gs_list2))
    return gs_list3

StartWords = [
    '!Series_title',
    '!Series_summary',
    '!Series_overall_design',
    '!Sample_title',
    '!Sample_source_name_ch1',
    '!Sample_characteristics_ch1',
]

sentences = [
    u'!Series_title\t"Transcript profiles of DCs of PLOSL patients show abnormalities in pathways of actin bundling and immune response"\n',
    u'!Series_summary\t"This study was aimed to identify pathways associated with loss-of-function of the DAP12/TREM2 receptor complex and thus gain insight into pathogenesis of PLOSL (polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy). Transcript profiles of PLOSL patients\' DCs showed differential expression of genes involved in actin bundling and immune response, but also for the stability of myelin and bone remodeling."\n',
    u'!Series_summary\t"Keywords: PLOSL patient samples vs. control samples"\n',
    u'!Series_overall_design\t"Transcript profiles of in vitro differentiated DCs of three controls and five PLOSL patients were analyzed."\n',
    u'!Series_type\t"Expression profiling by array"\n',
    u'!Sample_title\t"potilas_DC_A"\t"potilas_DC_B"\t"potilas_DC_C"\t"kontrolli_DC_A"\t"kontrolli_DC_C"\t"kontrolli_DC_D"\t"potilas_DC_E"\t"potilas_DC_D"\n',
    u'!Sample_characteristics_ch1\t"in vitro differentiated DCs"\t"in vitro differentiated DCs"\t"in vitro differentiated DCs"\t"in vitro differentiated DCs"\t"in vitro differentiated DCs"\t"in vitro differentiated DCs"\t"in vitro differentiated DCs"\t"in vitro differentiated DCs"\n',
    u'!Sample_description\t"DAP12mut"\t"DAP12mut"\t"DAP12mut"\t"control"\t"control"\t"control"\t"TREM2mut"\t"TREM2mut"\n',
]

h_words = [
    'pp1665',
    'glycerophosphodiester phosphodiesterase domain containing 5',
    'gde2',
    'PLOSL patients',
    'actin bundling',
    'glycerophosphodiester phosphodiesterase 2',
    'glycerophosphodiester phosphodiesterase domain-containing protein 5',
]

3 answers

User

#1 · edited 2024-10-04 11:32:16

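Don't use global variables for m_words and h_words.
Move the if statements outside the for loop.
Convert StartWords to a tuple once, rather than calling tuple(StartWords) on every iteration.
Use a programmatically built regular expression instead of a list.
Precompile everything you can.
Extend the list directly instead of iterating over it to append() each element.
Use a set from the start instead of a list.
Use a set comprehension instead of an explicit for loop.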

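import re

# Build one alternation per word list, once, outside the function.
m_reg = re.compile("|".join(re.escape(w) for w in m_words))
h_reg = re.compile("|".join(re.escape(w) for w in h_words))

def make_gs_list(sentences, start_words, m_reg, h_reg, org):
    # start_words must be a tuple: str.startswith() does not accept a list.
    if org == 'm':
        reg = m_reg
    elif org == 'h':
        reg = h_reg
    else:
        raise ValueError("org must be 'm' or 'h'")

    # Collect every match from each line that starts with a start word.
    matched = {w for s in sentences if s.startswith(start_words)
                 for w in reg.findall(s.lower())}

    return matched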
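
One caveat with the regex approach above: the lines are lowercased before matching, so a mixed-case entry such as 'PLOSL patients' can never match (the same is true of the original substring test). A minimal sketch of one way to handle this, assuming it is acceptable to lowercase the word list and to return the lowercased forms; build_word_regex and make_gs_set are hypothetical helper names, and the data is a shortened version of the question's sample:

```python
import re

def build_word_regex(words):
    # Lowercase the words because the lines are lowercased before matching.
    # Sort longest first so a longer phrase wins over a shorter one that
    # is a prefix of it (alternation tries branches left to right).
    unique = sorted({w.lower() for w in words}, key=len, reverse=True)
    return re.compile("|".join(re.escape(w) for w in unique))

def make_gs_set(sentences, start_words, reg):
    start_words = tuple(start_words)  # convert once, not per line
    return {w
            for s in sentences if s.startswith(start_words)
            for w in reg.findall(s.lower())}

start_words = ['!Series_title', '!Series_summary']
h_words = ['gde2', 'PLOSL patients', 'actin bundling']
sentences = [
    '!Series_title\t"Transcript profiles of DCs of PLOSL patients show '
    'abnormalities in pathways of actin bundling"\n',
    '!Series_type\t"Expression profiling by array"\n',
]

result = make_gs_set(sentences, start_words, build_word_regex(h_words))
print(sorted(result))  # ['actin bundling', 'plosl patients']
```

Note that findall returns non-overlapping matches, so two list entries that overlap in the text will not both be reported; the original substring loop would report both.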

User

#2 · edited 2024-10-04 11:32:16

I would try this:

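import re

# Optionally change these regexes.
# First token of a line, e.g. "!Series_title" (starts with "!", may contain "_" and digits).
FIRST_WORD_RE = re.compile(r"^!\w+")
# Single lowercase tokens only, so multi-word entries in the word lists never match.
LOWER_WORD_RE = re.compile(r"[a-z0-9]+")
m_or_h_words = {'m': set(m_words), 'h': set(h_words)}
startwords_set = set(StartWords)

def makeGsList(sentences, org):
    words = m_or_h_words[org]
    gs_set2 = set()
    for s in sentences:
        mo = FIRST_WORD_RE.match(s)
        if mo and mo.group(0) in startwords_set:
            # Intersect the line's tokens with the word set.
            gs_set2 |= set(LOWER_WORD_RE.findall(s.lower())) & words
    return list(gs_set2)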
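
Whether the set-intersection version is actually faster depends on the size of the word lists and the length of the lines, so it is worth measuring. A rough sketch of such a measurement with timeit, using small made-up data (START_WORDS, WORDS and SENTENCES are illustrative, not the real lists); the two versions agree here only because every word is a single token:

```python
import re
import timeit

# Illustrative data, not the real word lists from the question.
START_WORDS = ('!Series_title', '!Series_summary')
WORDS = ['gde2', 'actin', 'pp1665']
SENTENCES = ['!Series_title\t"actin bundling in gde2 cells"\n',
             '!Series_type\t"array"\n'] * 500

def substring_version(sentences, words):
    # Same idea as the question: test every word against every line.
    found = set()
    for s in sentences:
        if s.startswith(START_WORDS):
            s = s.lower()
            found.update(k for k in words if k in s)
    return found

TOKEN_RE = re.compile(r"[a-z0-9]+")

def token_version(sentences, word_set):
    # Tokenize each line once, then intersect with the word set.
    found = set()
    for s in sentences:
        if s.startswith(START_WORDS):
            found |= set(TOKEN_RE.findall(s.lower())) & word_set
    return found

word_set = set(WORDS)
# Check that both versions agree on this data before timing them.
assert substring_version(SENTENCES, WORDS) == token_version(SENTENCES, word_set)

t_sub = timeit.timeit(lambda: substring_version(SENTENCES, WORDS), number=20)
t_tok = timeit.timeit(lambda: token_version(SENTENCES, word_set), number=20)
print(f"substring: {t_sub:.4f}s  tokens: {t_tok:.4f}s")
```

The two functions are not equivalent in general: the substring test also matches inside longer words and across word boundaries, while the token version only matches whole single tokens.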

User

#3 · edited 2024-10-04 11:32:16

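I think you could tokenize your sentences as a first crack at this problem.

So you would do:

# A regex would be better here than split; split is used only for illustration.
sentences = tuple(s.split() for s in sentences)

With no argument, split() breaks on any whitespace, including the tabs in this data. Then, instead of using startswith, put your StartWords in a set:

sw_set = {w for w in StartWords}

Then, when you iterate over your sentences, do:

if s[0] in sw_set:  # continue with the rest of your logic

I think this is where you are taking the biggest performance hit.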
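
Put together, the tokenizing idea might look like the following sketch. The data is a shortened version of the question's sample, and the names tokenized and selected are illustrative:

```python
start_words = ['!Series_title', '!Series_summary', '!Sample_title']
sentences = [
    '!Series_title\t"Transcript profiles of DCs of PLOSL patients"\n',
    '!Series_type\t"Expression profiling by array"\n',
    '!Sample_title\t"potilas_DC_A"\t"potilas_DC_B"\n',
]

# A set gives constant-time membership tests on the first token.
sw_set = set(start_words)

# split() with no argument splits on any whitespace, tabs included.
tokenized = tuple(s.split() for s in sentences)

# Keep only the lines whose first token is one of the start words;
# the "tokens and" guard skips empty lines.
selected = [tokens for tokens in tokenized if tokens and tokens[0] in sw_set]

for tokens in selected:
    print(tokens[0])
```

Unlike startswith, this compares the whole first token, so a line beginning with '!Series_title_extra' would not be selected.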
