Regular expression to find certain bases in a sequence

3 answers

User

Answer 1 · edited 2024-09-30 12:32:36

import re
print(re.sub("[^ACTGNU]", "", fastA_string))

You will probably get a lot of different answers to this.

Or do it without re.
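A minimal sketch of what a regex-free version could look like, assuming the goal is the same as the `re.sub` call above (keep only the characters A, C, T, G, N and U). The helper name `keep_bases` and the sample input are hypothetical and are not taken from the original answer.

```python
# Characters that count as valid bases (same set as in the regex above).
ALLOWED = set("ACTGNU")

def keep_bases(sequence):
    """Return `sequence` with every character outside ALLOWED dropped."""
    return "".join(ch for ch in sequence if ch in ALLOWED)

# Hypothetical input, used only for illustration.
fastA_string = "AC-TG nn\nGU"
print(keep_bases(fastA_string))  # ACTGGU
```

Membership tests against a set are constant time, so this stays linear in the length of the sequence.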

User

Answer 2 · edited 2024-09-30 12:32:36

You need to use a character set.

re.findall(r"[ATGCUN]", self.fastAsequence)

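Your code looks for the literal text "A,T,G,C,U,N" and outputs every occurrence of it. A character set in a regex lets you search for "any one of the following: A, T, G, C, U, N" rather than "exactly the following: A,T,G,C,U,N".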
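To make the difference concrete, here is a small sketch that runs both patterns over the same string. The sample sequence is hypothetical; only the two patterns come from the discussion above.

```python
import re

# Hypothetical sequence containing two unwanted characters ("X" and "-").
sequence = "AXTGC-UN"

# Literal pattern: matches only the exact text "A,T,G,C,U,N", commas included.
literal_matches = re.findall(r"A,T,G,C,U,N", sequence)

# Character set: matches any single character that is one of A, T, G, C, U, N.
set_matches = re.findall(r"[ATGCUN]", sequence)

print(literal_matches)       # []
print("".join(set_matches))  # ATGCUN
```

`re.findall` returns a list of single-character matches for the character set, so joining them gives back the filtered sequence.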

User

Answer 3 · edited 2024-09-30 12:32:36

I would avoid regular expressions entirely. You can use str.translate to remove the unwanted characters.

from string import ascii_letters

removechars = ''.join(set(ascii_letters) - set('ACTGNU'))

newFastA = self.fastAsequence.translate(None, removechars)


If you also want to remove whitespace, you can add string.whitespace to removechars.

Side note: the above only works in Python 2. Python 3 needs one extra step:

from string import ascii_letters, punctuation, whitespace

# This example also removes whitespace and punctuation
removechars = ''.join(set(ascii_letters + punctuation + whitespace) - set('ACTGNU'))

trans = str.maketrans('', '', removechars)

dna.translate(trans)
Out[11]: 'ACTAGAGAUACCACGGNUGNUGNU'
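For reference, a self-contained Python 3 sketch of the same approach with the input defined explicitly. The function name `clean_sequence` and the sample string are hypothetical. Note that digits are not in the removal set here, so they would pass through unchanged; adding `string.digits` to it should handle that if needed.

```python
from string import ascii_letters, punctuation, whitespace

KEEP = set("ACTGNU")

# Every letter, punctuation mark and whitespace character that is not a base.
REMOVE = "".join(set(ascii_letters + punctuation + whitespace) - KEEP)

# With three arguments, str.maketrans maps each character of the third
# argument to None, so translate() deletes it.
TRANS = str.maketrans("", "", REMOVE)

def clean_sequence(sequence):
    """Return `sequence` with every character listed in REMOVE deleted."""
    return sequence.translate(TRANS)

# Hypothetical input, used only for illustration.
dna = "ACT agx; GNU\n"
print(clean_sequence(dna))  # ACTGNU
```

Building the translation table once and reusing it avoids recomputing the removal set for every sequence.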
