使用Python来减少表格形式的BLAST输出文件中的命中数

2条回答

网友

1楼 · 编辑于 2024-09-30 16:42:31

此一次性版本按标题在文件中出现的顺序打印每个标题的前10个：

import sys
NUM_TO_PRINT=10     # good practice - use names rather than raw numbers

blastfile = open(sys.argv[1],"r")

titles={};  # an empty dictionary.
    # This will map titles to counts of how many times a line with that title
    # has been printed.

for line in blastfile:
    title = line.split()[0];    # assuming the title is space-delimited, and that the line is not empty
    num_printed = titles.get(title, 0);     # 0 is the default
    if num_printed<NUM_TO_PRINT:
        print line,    # comma because _line_ already has a newline - without the comma, you get a blank line after every printed line
        num_printed += 1
        titles[title] = num_printed     # save where we are

网友

2楼 · 编辑于 2024-09-30 16:42:31

这里的问题似乎是您在文件对象中迭代了两次。在Python中，文件对象的工作方式非常类似于遍历每一行的指针。如果不向后移动指针，它就没有可读取的内容。在

您需要做的是使用.seek函数将该指针移回起始位置。例如，假设你有一个file_to_read.txt和你的python_script.py。在

文件_阅读.txt

Hello! My name is Bob and I can't think of anything to
put in this file so I'm blabbering on about nonsense
in hopes that you won't realise that this text is not
important but the code in the actually file, though I
think that you wouldn't mind reading this long file.

Python_脚本.py

^{pr2}$

如果要运行这段代码（并且没有出现有关目录的错误），则只需打印一次file_to_read.txt。要解决这个问题，您只需在阅读之间添加一个f.seek(0, 0)。例如：

f = open("file_to_read.txt", "r")
for line in f: print line
f.seek(0, 0)
for lien in f: print line

现在，回到您的上下文，您可以看到这是如何应用于您的代码的：

import sys
# Here is your reading of file
blastfile = open(sys.argv[1],"r")
column1list = []
# Here is the first time you read the file
for line in blastfile:
    b = line.split()[0]
    column1list.append(b)
# Add a line to move back to the start before the
# next reading
blastfile.seek(0, 0)

uniqcolumn1 = list(set(column1list))

for val in uniqcolumn1:
    # Move the counter inside to refresh it after every iteration
    counter = 0
    # Here is the second time you read your file
    for line in blastfile:
        while counter <= 10:
            if line.startswith(val):
                print line
                counter += 1
    # Since you are going to read the file the next iteration,
    # .seek the file
    blastfile.seek(0, 0)

编辑

这是代码的后半部分，已修复。你也可以这样做：

for val in uniqcolumn1:
    # Move the counter in
    counter = 0
    # Move the while loop out
    while counter <= 10:
        for line in blastfile:
            if line.startswith(val):
                print line,
                counter += 1
    blastfile.seek(0, 0)

这样做的好处是for循环提前终止，它不会读取整个文件。在

另一种方法是：

for val in uniqcolumn1:
    # Move counter in
    counter = 0
    # Remove while statement
    for line in blastfile:
        # Add additional condition to if statement
        if line.startswith(val) and counter <= 10:
            print line,
            counter += 1
        elif counter > 10:
            break
    blastfile.seek(0, 0)

这样做的好处是看起来更简单。在

相关问题更多 >

编程相关推荐

热门问题

热门文章

使用Python来减少表格形式的BLAST输出文件中的命中数

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >