Subsetting a sequence in Python using a list of numbers

# Function collects sequences and writes to a file
def gen_insertion_seq(index, seq, gene):
    output = open("%s_insertion_seq.txt" % gene, 'w')
    indices = index.read()
    sequence = seq.read()
    for i in indices:
        site = sequence[i-9:i+15]
        output.write(site + '\n')

# Open index files
shaker_index = open("212_index.txt")
kir2_index = open("214_index.txt")
asic1a_index = open("216_index.txt")
nachra7_index = open("252_index.txt")

# Open sequence files
shaker_seq = open("212_seq.txt")
kir2_seq = open("214_seq.txt")
asic1a_seq = open("216_seq.txt")
nachra7_seq = open("252_seq.txt")

# Call function on index and sequence files - should output list of generated sequences for insertion sites.
# Must hand check first couple
gen_insertion_seq(shaker_index, shaker_seq, 'shaker')

2 Answers

User

#1 · Edited 2024-05-18 12:33:27

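Try passing 'shaker' with double quotes instead: "shaker". Alternatively, use str(gene) inside your function.

OK, I just realized this is Python, so I don't think the quoting style matters.

Or use open("{}_insertion_seq.txt".format(gene), 'w')

If the failure happens at the write, change output.write(site + '\n') to output.write(str(site) + '\n')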

User

#2 · Edited 2024-05-18 12:33:27

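The error in your code occurs because read does not do what you expect. Called without parameters, it reads the entire file into a single string. You then iterate over the characters of that string rather than over the numbers in the file. The TypeError happens when you evaluate '1' - 9 in the index into sequence.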

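Your intuition to convert the iteration value to int is basically correct. However, since you are still iterating over characters, you would get int('1'), int('3'), int('1'), int('2'), followed by a ValueError from int('\n'). read reads in the whole file as-is, newlines and all.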
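Both failures can be reproduced without any files. The sketch below is only an illustration: the string contents is a made-up stand-in for what index.read() would return for a one-line index file containing 1312.

```python
# Hypothetical result of index.read() for a file holding the single line "1312".
contents = "1312\n"

# Iterating over a string yields single characters, not numbers.
chars = list(contents)
print(chars)  # ['1', '3', '1', '2', '\n']

# Subtracting an int from a one-character string raises TypeError.
type_error = None
try:
    chars[0] - 9
except TypeError as exc:
    type_error = exc
    print("TypeError:", exc)

# Converting each character works for the digits, then fails on the newline.
converted = []
value_error = None
try:
    for ch in contents:
        converted.append(int(ch))
except ValueError as exc:
    value_error = exc
    print("ValueError:", exc)

print(converted)  # [1, 3, 1, 2]
```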

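Fortunately, a file object is iterable over the lines of the file. This means you can do something like for line in file: ..., and line will take on the string value of each index to parse. Each line keeps its trailing newline, but int ignores leading and trailing whitespace, so you can pass the line straight to int without further modification.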
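A small sketch of line iteration, using io.StringIO as a stand-in for an open text file (the three numbers are made-up sample data). Note that each line still carries its trailing newline, and that int accepts it because it ignores surrounding whitespace.

```python
import io

# Hypothetical index file with one number per line.
# io.StringIO iterates line by line, like a file opened in text mode.
index_file = io.StringIO("13\n120\n7\n")

lines = list(index_file)  # iteration yields one line at a time
print(lines)              # ['13\n', '120\n', '7\n']

# int() ignores leading and trailing whitespace, including the newline.
indices = [int(line) for line in lines]
print(indices)            # [13, 120, 7]
```

A completely blank line would still raise ValueError, so a file with empty lines needs them filtered out first.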

You can make a number of other improvements to your code, including the corrections needed to make it work.

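As @accumulation suggested, open files in a with block to ensure that they are cleaned up properly if the program crashes (for example, because of an I/O error). The block also closes the file automatically when it ends, which is something you currently do not do at all (but should).
Conceptually, there is no need to pass file objects around at all. You only use each one in a single place. I would even expand on that and suggest writing a small function that parses each type of file into a usable format, and passing that around instead.
In Python, files are iterable line by line. This is especially handy for the index file, which is a very line-oriented format. You do not need to do a full read at all, and you can save a few steps compared with @maximitartanko's comment.
You can use str.join on the lines of a file to combine a sequence that is split across line breaks. Each line keeps its trailing newline, so strip it before joining.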

Combining all of these suggestions, you can do the following:

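def read_indices(fname):
    # One integer per line; int() ignores the trailing newline.
    with open(fname, 'r') as file:
        return [int(index) for index in file]

def read_sequence(fname):
    # Strip each line ending so the sequence becomes one continuous string.
    with open(fname, 'r') as file:
        return ''.join(line.strip() for line in file)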

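Because a file is an iterable of strings (its lines), you can use it in list comprehensions and in operations such as string joins. The rest of your code now looks cleaner: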

def gen_insertion_seq(index, seq, gene):
    indices = read_indices(index)
    sequence = read_sequence(seq)
    with open("%s_insertion_seq.txt" % gene, 'w') as output:
        for i in indices:
            site = sequence[i-9:i+15]
            output.write(site + '\n')

gen_insertion_seq('212_index.txt', '212_seq.txt', 'shaker')
gen_insertion_seq('214_index.txt', '214_seq.txt', 'kir2')
gen_insertion_seq('216_index.txt', '216_seq.txt', 'asic1a')
gen_insertion_seq('252_index.txt', '252_seq.txt', 'nachra7')
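One thing worth checking by hand in the slice sequence[i-9:i+15]: when an index is smaller than 9, the start becomes negative, Python counts it from the end of the string, and the slice is usually empty rather than a shortened window. The sketch below assumes you would prefer the window to be clamped at the start of the sequence. insertion_window is a hypothetical helper, not part of the original code, and whether clamping is the right behavior depends on your data.

```python
def insertion_window(sequence, i, before=9, after=15):
    """Return sequence[i-before:i+after], clamping the start at 0.

    Hypothetical helper: the defaults mirror the 9 and 15 used above.
    """
    start = max(i - before, 0)
    return sequence[start:i + after]

seq = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # made-up sample sequence

# Unclamped: 5 - 9 = -4 counts from the end, so the slice is empty.
print(repr(seq[5 - 9:5 + 15]))    # ''

# Clamped: the window starts at the beginning of the sequence.
print(insertion_window(seq, 5))   # ABCDEFGHIJKLMNOPQRST

# Indices of 9 or more behave exactly like the plain slice.
print(insertion_window(seq, 10))  # BCDEFGHIJKLMNOPQRSTUVWXY
```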

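Your main function is now easier to understand because it focuses only on the sequences, not on things like I/O and parsing. You also no longer have a bunch of open file handles floating around, waiting for an error to happen. The file operations are all self-contained, away from the real task.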

If you have sequences (in the Python sense) of file IDs and gene names, you can simplify the calls to your function further with a loop:

# file_id is used instead of id to avoid shadowing the built-in id() function.
for file_id, gene in zip((212, 214, 216, 252), ('shaker', 'kir2', 'asic1a', 'nachra7')):
    gen_insertion_seq('%d_index.txt' % file_id, '%d_seq.txt' % file_id, gene)

Also, the I/O section of the Python tutorial is very good. You may be especially interested in the section on files.
