How to use Python regular expressions to separate each BLAST result and store the results in a list for further analysis

2 answers

User

Answer 1 · edited 2024-09-29 17:51:05

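I finally found a way to split the large file into smaller chunks, so that I can process each query result individually with Python regular expressions. Here is my code: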

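#!/usr/bin/python3
import re

# Read the whole BLAST report into memory
with open("/path/file_name.txt", "r") as file:
    inter = file.read()

# Capture the text between each "Query= lcl" marker and the next
# "Effective search space" line; re.S lets "." match newlines
lst = re.findall(r'(?<=Query= lcl)(.*?)(?=Effective search space)', inter, flags=re.S)
print(lst)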
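For anyone who wants to try this without a real report at hand, here is a minimal sketch of the same pattern applied to an in-memory string. The `sample` text is a made-up, heavily shortened stand-in for BLAST output, and `split_blast_queries` is a hypothetical helper name, not something from the original post.

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
QUERY_BLOCK = re.compile(
    r"(?<=Query= lcl)(.*?)(?=Effective search space)", flags=re.S
)


def split_blast_queries(text):
    """Return the text between each 'Query= lcl' marker and the next
    'Effective search space' line, with outer whitespace stripped."""
    return [block.strip() for block in QUERY_BLOCK.findall(text)]


# Hypothetical, heavily shortened stand-in for a BLAST text report.
sample = """BLASTN 2.2.31+

Query= lcl|seq1
Length=120
hit_A  2e-30

Effective search space used: 1000

Query= lcl|seq2
Length=95
***** No hits found *****

Effective search space used: 2000
"""

blocks = split_blast_queries(sample)
for block in blocks:
    # The first line of each block is whatever followed "Query= lcl".
    print(block.splitlines()[0])
```

Each list element is one query's result block, so later analysis can loop over `blocks` and apply further regular expressions to one query at a time.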

Thank you all for your help.

User

Answer 2 · edited 2024-09-29 17:51:05

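To get the result you want, replace the re.findall() call with re.split(), editing that line as follows:

# Pass flags by keyword: the third positional argument of re.split() is maxsplit.
# Splitting on a zero-width lookahead requires Python 3.7 or later.
lst = re.split(r'^(?=Query= )', inter, flags=re.MULTILINE)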

For details on re.split(), see this section of the documentation:

https://docs.python.org/2/library/re.html
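One detail worth checking against that documentation: the function is defined as re.split(pattern, string, maxsplit=0, flags=0), so a flag such as re.DOTALL passed as the third positional argument is read as maxsplit (its integer value is 16) rather than as a flag. The sketch below passes flags by keyword and splits just before each line that starts with "Query= ". The `sample` text is a made-up stand-in for a BLAST report, and splitting on a zero-width match assumes Python 3.7 or later.

```python
import re

# Hypothetical, heavily shortened stand-in for a BLAST text report.
sample = (
    "BLASTN header\n"
    "Query= lcl|seq1\nhit_A  2e-30\n"
    "Query= lcl|seq2\nNo hits found\n"
)

# Split just before every line that starts with "Query= ".
# Flags are passed by keyword because the third positional
# parameter of re.split() is maxsplit, not flags.
parts = re.split(r"^(?=Query= )", sample, flags=re.MULTILINE)

# Everything before the first query is the report header.
header, records = parts[0], parts[1:]
print(len(records))
```

Because the lookahead consumes no characters, each element of `records` still begins with its own "Query= " line, which keeps the query name attached to its results.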

Also, you may want to consider using the (now deprecated) plain-text BLAST parser in Biopython:

http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc96

The plain text BLAST parser is located in Bio.Blast.NCBIStandalone.
As with the XML parser, we need to have a handle object that we can pass to the parser. The handle must implement the readline() method and do this properly. The common ways to get such a handle are to either use the provided blastall or blastpgp functions to run the local blast, or to run a local blast via the command line, and then do something like the following:
result_handle = open("my_file_of_blast_output.txt")
Now that we've got a handle (which we'll call result_handle), we are ready to parse it. This can be done with the following code:

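>>> from Bio.Blast import NCBIStandalone
>>> blast_parser = NCBIStandalone.BlastParser()
>>> blast_record = blast_parser.parse(result_handle)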

This will parse the BLAST report into a Blast Record class (either a Blast or a PSIBlast record, depending on what you are parsing) so that you can extract the information from it. In our case, let’s just print out a quick summary of all of the alignments greater than some threshold value.

>>> E_VALUE_THRESH = 0.04
>>> for alignment in blast_record.alignments: 
...     for hsp in alignment.hsps: 
...         if hsp.expect < E_VALUE_THRESH: 
...             print('****Alignment****') 
...             print('sequence:', alignment.title) 
...             print('length:', alignment.length)
...             print('e value:', hsp.expect) 
...             print(hsp.query[0:75] + '...') 
...             print(hsp.match[0:75] + '...') 
...             print(hsp.sbjct[0:75] + '...')

If you also read the section 7.3 on parsing BLAST XML output, you’ll notice that the above code is identical to what is found in that section. Once you parse something into a record class you can deal with it independent of the format of the original BLAST info you were parsing. Pretty snazzy!
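To illustrate the point about working with records independently of the input format, here is a small standard-library sketch. The `Hsp` and `Alignment` classes are hypothetical stand-ins that only mirror the attribute names used in the loop above (`title`, `length`, `hsps`, `expect`); they are not Biopython classes, and the data is made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hsp:
    """Hypothetical stand-in for one high-scoring pair."""
    expect: float


@dataclass
class Alignment:
    """Hypothetical stand-in for one alignment (hit)."""
    title: str
    length: int
    hsps: List[Hsp] = field(default_factory=list)


def significant_titles(alignments, e_value_thresh=0.04):
    """Return titles of alignments with at least one HSP below the threshold."""
    return [
        a.title
        for a in alignments
        if any(h.expect < e_value_thresh for h in a.hsps)
    ]


# Made-up data; a real run would take these from a parsed BLAST record.
alignments = [
    Alignment("hit_A", 120, [Hsp(2e-30), Hsp(0.5)]),
    Alignment("hit_B", 95, [Hsp(0.2)]),
]
print(significant_titles(alignments))
```

The filtering function never looks at where the records came from, so the same code would apply whether they were built from plain-text or XML output.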
