Answers to the question: Python split, string as delimi



1 answer

Anonymous, 1 day ago


Just split the string on <code>'N'</code>, then drop every piece that is empty or consists only of a newline. Like this:

<pre><code>#!/usr/bin/env python

DNAstring = '''AAACAACAGGGTACAAAGAGTCACGCTTATCCTGTTGATACT
TCTCAATGGGCAGTACATATCATCTCTNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNAAAACGTGTGCATGAACAAAAAA
CGTAGCAGATCGTGACTGGCTATTGTATTGTGTCAATTTCGCTTCGTCAC
TAAATCAACGGACATGTGTTGC'''

#Keep only the pieces that hold sequence data
sequences = [u for u in DNAstring.split('N') if u and u != '\n']

for i, seq in enumerate(sequences):
    print(i)
    print(seq.replace('\n', '') + '\n')
</code></pre>

For the sample string this prints two sequences, numbered 0 and 1. The snippet also uses <code>.replace('\n', '')</code> to strip the newlines inside each sequence.

<hr/>

Here are some programs you may find useful.

First, a line buffer class. You initialize it with a file name and a line width. You can then feed it strings of arbitrary length and it saves them to a text file line by line, with every line (except possibly the last) having the given width. You can use this class in other programs to keep your output tidy.

Save this file as <code>linebuffer.py</code> somewhere on your Python path; the simplest way is to save it wherever you keep your Python programs and make that the current directory when you run them.

linebuffer.py

<pre><code>#! /usr/bin/env python

''' Text output buffer

    Write fixed width lines to a text file

    Written by PM 2Ring 2015.03.23
'''

class LineBuffer(object):
    ''' Text output buffer

        Write fixed width lines to file fname
    '''
    def __init__(self, fname, width):
        self.fh = open(fname, 'wt')
        self.width = width
        self.buff = []
        self.bufflen = 0

    def write(self, data):
        ''' Write a string to the buffer '''
        self.buff.append(data)
        self.bufflen += len(data)
        if self.bufflen >= self.width:
            self._save()

    def _save(self):
        ''' Write the buffer to the file '''
        buff = ''.join(self.buff)

        #Split buff into lines
        lines = []
        while len(buff) >= self.width:
            lines.append(buff[:self.width])
            buff = buff[self.width:]

        #Add an empty line so we get a trailing newline
        lines.append('')
        self.fh.write('\n'.join(lines))

        #Keep the leftover partial line for the next write
        self.buff = [buff]
        self.bufflen = len(buff)

    def close(self):
        ''' Flush the buffer & close the file '''
        if self.bufflen > 0:
            self.fh.write(''.join(self.buff) + '\n')
        self.fh.close()


def testLB():
    alpha = 'abcdefghijklmnopqrstuvwxyz'
    fname = 'linebuffer_test.txt'
    lb = LineBuffer(fname, 27)
    for _ in range(30):
        lb.write(alpha)
    lb.write(' bye.')
    lb.close()


if __name__ == '__main__':
    testLB()
</code></pre>

Here is a program that generates random DNA sequences in the form you describe in your question. It uses <code>linebuffer.py</code> to handle the output. I wrote it so that I could test my DNA sequence splitter properly.

Random DNA generator

<pre><code>#! /usr/bin/env python

''' Make random DNA sequences

    Sequences consist of random subsequences of the letters 'ACGT'
    as well as short sequences of 'N', of random length up to 200.
    Exactly 1000 'N's separate sequence blocks.
    All sequences may contain newline chars

    Takes approx 3 seconds per megabyte generated and saved
    on a 2GHz CPU single core machine.

    Written by PM 2Ring 2015.03.23
'''

import sys
import random
from linebuffer import LineBuffer

#Set seed to None to seed randomizer from system time
random.seed(37)

#Output line width
linewidth = 120

#Subsequence base length ranges
minsub, maxsub = 15, 300

#Subsequences per sequence ranges
minseq, maxseq = 5, 50

#random 'N' sequence ranges
minn, maxn = 5, 200

#Probability that a random 'N' sequence occurs after a subsequence
randn = 0.2

#Sequence separator
nsepblock = 'N' * 1000

def main():
    #Get number of sequences from the command line
    numsequences = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    outname = 'DNA_sequence.txt'

    lb = LineBuffer(outname, linewidth)
    for i in range(numsequences):
        #Write the 1000*'N' separator between sequences
        if i > 0:
            lb.write(nsepblock)

        for j in range(random.randint(minseq, maxseq)):
            #Possibly make a short run of 'N's in the sequence
            if j > 0 and random.random() < randn:
                lb.write(''.join('N' * random.randint(minn, maxn)))

            #Create a single subsequence
            r = range(random.randint(minsub, maxsub))
            lb.write(''.join([random.choice('ACGT') for _ in r]))
    lb.close()


if __name__ == '__main__':
    main()
</code></pre>

Finally, here is a program that splits the random DNA sequences. Again, it uses <code>linebuffer.py</code> to handle the output.

DNA拆分器0.py (DNA splitter)

<pre><code>#! /usr/bin/env python

''' Split DNA sequences and save to separate files

    Sequences consist of random subsequences of the letters 'ACGT'
    as well as short sequences of 'N', of random length up to 200.
    Exactly 1000 'N's separate sequence blocks.
    All sequences may contain newline chars

    Written by PM 2Ring 2015.03.23
'''

import sys
from linebuffer import LineBuffer

#Output line width
linewidth = 120

#Sequence separator
nsepblock = 'N' * 1000

def main():
    iname = 'DNA_sequence.txt'
    outbase = 'contig'

    with open(iname, 'rt') as f:
        data = f.read()

    #Remove all newlines
    data = data.replace('\n', '')

    sequences = data.split(nsepblock)

    #Save each sequence to a series of files
    for i, seq in enumerate(sequences, 1):
        outname = '%s%05d' % (outbase, i)
        print(outname)

        #Write sequence data, with line breaks
        lb = LineBuffer(outname, linewidth)
        lb.write(seq)
        lb.close()


if __name__ == '__main__':
    main()
</code></pre>
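To try the same idea without touching any files, the splitter's steps (strip newlines, split on the separator block, re-wrap at a fixed width) can be sketched in memory. This is only a minimal sketch, not part of the answer's programs: the helper names `split_contigs` and `wrap` and the `sep_len` parameter are hypothetical, and the filter that drops empty pieces is an added assumption.

```python
def split_contigs(text, sep_len=1000):
    """Split DNA text on runs of exactly sep_len 'N's (hypothetical helper).

    Newlines are removed first, so a separator block that was wrapped
    across several lines is still recognised.
    """
    data = text.replace('\n', '')
    # Assumption: empty pieces (e.g. from a leading separator) are dropped.
    return [seq for seq in data.split('N' * sep_len) if seq]


def wrap(seq, width=120):
    """Re-insert line breaks every `width` characters (hypothetical helper)."""
    return '\n'.join(seq[i:i + width] for i in range(0, len(seq), width))


if __name__ == '__main__':
    # Tiny sample: a 5-'N' separator split across a line break,
    # plus a short 2-'N' run that must stay inside the first sequence.
    sample = 'ACGTAC\nGTNNAC' + 'NNN\nNN' + 'TTGACA\nGG'
    for number, contig in enumerate(split_contigs(sample, sep_len=5), 1):
        print('contig%05d' % number)
        print(wrap(contig, width=5))
```

Short runs of 'N' inside a sequence survive because only a full separator block of `sep_len` characters triggers a split, which matches the format the generator script produces.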
