Keeping a substring from a long string in Python?

612407518| Streptomyces sp. MJ635-86F5 DNA, cremimycin biosynthetic gene cluster, complete sequence
84617315| Streptomyces achromogenes subsp. rubradiris complete rubradirin biosynthetic gene cluster, strain NRRL 3061
345134845| Streptomyces sp. SN-593 DNA, reveromycin biosynthetic gene cluster, complete sequence
323700993| Streptomyces autulyticus strain CGMCC 0516 geldanamycin polyketide biosynthetic gene cluster, complete sequence
15823967| Streptomyces avermitilis oligomycin biosynthetic gene cluster
1408941746| Streptomyces sp. strain OUC6819 rdm biosynthetic gene cluster, complete sequence
315937014| Uncultured organism CA37 glycopeptide biosynthetic gene cluster, complete sequence
29122977| Streptomyces cinnamonensis polyether antibiotic monensin biosynthetic gene cluster, partial sequence
257129259| Moorea producens 19L curacin A biosynthetic gene cluster, partial sequence
166159347| Streptomyces sahachiroi azinomycin B biosynthetic gene cluster, partial sequence

with open("test.txt") as f:
    for line in f:
        (id, name) = line.strip().split('|')
        term_list = name.split()
        term_index = term_list.index('biosynthetic')
        term = term_list[int(term_index)-1]
        header = id + '|' + term
        print(header)

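3 Answers

User

#1 · Edited 2024-10-02 22:38:00

An answer that does not use regular expressions. It raises ValueError if the header is not in the expected format (that is, it always contains " biosynthetic gene cluster", always has a | ending the id, and always has a space before the wanted word).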

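# Everything up to and including the "|" separator
id = header[:header.index("|")+1]
# Position where " biosynthetic gene cluster" starts
end = header.index(" biosynthetic gene cluster")
# The word between the last preceding space and that position
word = header[header[:end].rindex(" ")+1:end]
new_title = id + word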
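
As a rough sketch of how the snippet above might be used, the following wraps the same logic in a function and shows the ValueError on a malformed header. The function name `shorten_header` and the malformed sample string are assumptions made for illustration; the well-formed sample is taken from the question's data.

```python
def shorten_header(header):
    """Return 'id|word', where word is the one right before
    ' biosynthetic gene cluster'.

    Raises ValueError if the '|' or the marker phrase is missing,
    or if no space precedes the wanted word.
    """
    prefix = header[:header.index("|") + 1]
    end = header.index(" biosynthetic gene cluster")
    word = header[header[:end].rindex(" ") + 1:end]
    return prefix + word


good = "15823967| Streptomyces avermitilis oligomycin biosynthetic gene cluster"
print(shorten_header(good))  # 15823967|oligomycin

try:
    # Hypothetical malformed header: the marker phrase is absent
    shorten_header("15823967| no marker phrase here")
except ValueError as exc:
    print("skipped malformed header:", exc)
```

Note that for a header such as "curacin A biosynthetic gene cluster" this returns only "A", since it keeps a single word.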

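User

#2 · Edited 2024-10-02 22:38:00

A simpler solution than regex would be:

Split the string on "|" and assign the two components to the variables id and s.
Split s into words.
Find the position of "biosynthetic" in the resulting list.
Confirm that it is followed by "gene" and "cluster".
Print id, followed by the word that precedes "biosynthetic".

I have deliberately not written the code. If you give it a try and edit your attempt into the question, others will probably respond and tell you how to get it working (assuming you cannot manage it yourself).

Good luck!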
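
For reference, the steps listed in this answer could be sketched roughly as follows. This is one possible reading, not the answerer's own code: the function name `header_from_steps` is hypothetical, and stripping a trailing comma from "cluster," is an assumption based on the sample data in the question.

```python
def header_from_steps(line):
    # Step 1: split on "|" into the id and the description
    # (unpacking raises ValueError if there is no "|")
    seq_id, s = line.strip().split("|", 1)

    # Step 2: split the description into words
    words = s.split()

    # Step 3: find the position of "biosynthetic"
    # (list.index raises ValueError if it is absent)
    i = words.index("biosynthetic")

    # Step 4: confirm it is followed by "gene" and "cluster";
    # the sample data often has "cluster," so drop a trailing comma
    following = [w.rstrip(",") for w in words[i + 1:i + 3]]
    if following != ["gene", "cluster"]:
        raise ValueError("'biosynthetic' is not followed by 'gene cluster'")
    if i == 0:
        raise ValueError("no word precedes 'biosynthetic'")

    # Step 5: id, then the word that precedes "biosynthetic"
    return seq_id + "|" + words[i - 1]


sample = ("612407518| Streptomyces sp. MJ635-86F5 DNA, cremimycin "
          "biosynthetic gene cluster, complete sequence")
print(header_from_steps(sample))  # 612407518|cremimycin
```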

User

#3 · Edited 2024-10-02 22:38:00

You can use Python's str.split() method to get the number up to the pipe delimiter.

To grab the word next to a certain string, you may want to use a negative lookahead.
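
A minimal sketch of this idea is shown below. It swaps in a positive lookahead, `(?=...)`, rather than a negative one, because the wanted word sits immediately before " biosynthetic" and must be followed by it. The names `WORD_BEFORE` and `shorten_with_regex` are hypothetical, and returning None for a non-matching line is an arbitrary choice.

```python
import re

# A run of non-space characters that is immediately followed by
# " biosynthetic"; the lookahead checks the text without consuming it.
WORD_BEFORE = re.compile(r"\S+(?= biosynthetic\b)")


def shorten_with_regex(line):
    # str.split() gives the number up to the pipe delimiter
    seq_id = line.split("|")[0]
    match = WORD_BEFORE.search(line)
    if match is None:
        return None  # no " biosynthetic" in this line
    return seq_id + "|" + match.group(0)


sample = ("29122977| Streptomyces cinnamonensis polyether antibiotic "
          "monensin biosynthetic gene cluster, partial sequence")
print(shorten_with_regex(sample))  # 29122977|monensin
```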
