Subsetting data and counting the lines in each file



I am trying to subset the data in one file into two separate files and to count the number of lines in each of them. The input file, test.txt:

<pre><code>ID,MARK1,MARK2
sire1,AA,BB
dam2,AB,AA
sire3,AB,-
dam1,AA,BB
IND4,BB,AB
IND5,BB,AA
</code></pre>

One output file should be:

<pre><code>ID,MARK1,MARK2
sire1,AA,BB
dam2,AB,AA
sire3,AB,-
dam1,AA,BB
</code></pre>

and the other:

<pre><code>ID,MARK1,MARK2
IND4,BB,AB
IND5,BB,AA
</code></pre>

Here is my code:

<pre><code>import re

def file_len(filename):
    with open(filename, mode = 'r', buffering = 1) as f:
        for i, line in enumerate(f):
            pass
    return i

inputfile = open("test.txt", 'r')
outputfile_f1 = open("f1.txt", 'w')
outputfile_f2 = open("f2.txt", 'w')

matchlines = inputfile.readlines()
outputfile_f1.write(matchlines[0]) #add the header to the "f1.txt"
for line in matchlines:
    if re.match("sire*", line):
        outputfile_f1.write(line)
    elif re.match("dam*", line):
        outputfile_f1.write(line)
    else:
        outputfile_f2.write(line)
print 'the number of individuals in f1 is:', file_len(outputfile_f1)
print 'the number of individuals in f2 is:', file_len(outputfile_f2)
inputfile.close()
outputfile_f1.close()
outputfile_f2.close()
</code></pre>

The code splits the file correctly, but I particularly dislike the way I add the header to the new file, and I wonder whether there is a better way to do it. Also, the line-counting function looks fine to me, but when I run it I get this error:

<pre><code>Traceback (most recent call last):
  File "./subset_individuals_based_on_ID.py", line 28, in <module>
    print 'the number of individuals in f1 is:', file_len(outputfile_f1)
  File "./subset_individuals_based_on_ID.py", line 7, in file_len
    with open(filename, mode = 'r', buffering = 1) as f:
TypeError: coercing to Unicode: need string or buffer, file found
</code></pre>

So I searched this site via Google and added <code>buffering = 1</code> (it was not in the code originally), but that still did not solve the problem.

Any help improving the code and clearing the error would be much appreciated.
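The traceback points at the likely cause: `file_len` calls `open(filename, ...)`, but it is being passed `outputfile_f1`, which is already an open file object rather than a path string, so `buffering` has nothing to do with it. Two further things seem worth noting. `file_len` returns the last zero-based index from `enumerate`, which is one less than the line count, and it fails on an empty file because `i` is never assigned. Also, `"sire*"` as a regex means "sir" followed by zero or more "e", so a plain prefix test is probably what was intended. Below is one possible sketch, not the only way to do it: it is written for Python 3 (the question uses Python 2 `print` statements), and the function name `split_by_prefix` and its parameters are made up for illustration. It reads the header once, writes it to both outputs explicitly, and counts rows while writing so the output files never need to be reopened.

```python
def split_by_prefix(in_path, f1_path, f2_path, prefixes=("sire", "dam")):
    """Split in_path into two files, copying the header line to both.

    Rows whose first field starts with one of `prefixes` go to f1_path,
    all other rows go to f2_path. Returns (rows_in_f1, rows_in_f2),
    with the header line excluded from both counts.
    """
    n_f1 = 0
    n_f2 = 0
    with open(in_path) as src, open(f1_path, "w") as f1, open(f2_path, "w") as f2:
        header = next(src, None)  # None if the input file is empty
        if header is None:
            return (0, 0)
        f1.write(header)
        f2.write(header)
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            # str.startswith accepts a tuple of candidate prefixes
            if line.startswith(prefixes):
                f1.write(line)
                n_f1 += 1
            else:
                f2.write(line)
                n_f2 += 1
    return (n_f1, n_f2)


def file_len(path):
    """Count the lines in the file at `path` (a filename, not a file object)."""
    with open(path) as f:
        return sum(1 for _ in f)
```

With the sample data from the question, `split_by_prefix("test.txt", "f1.txt", "f2.txt")` should return the two individual counts directly. If a separate count is still wanted, call `file_len("f1.txt")` with the filename, and only after the output file has been closed so that everything has been flushed to disk; that count includes the header line.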
