Counting line occurrences and dividing by the total number of lines (unix/python)




1 answer

Anonymous, 1 day ago


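Note that <code>uniq</code> only counts adjacent repeated lines, so it has to be preceded by <code>sort</code> in order to take all lines of the file into account. As a replacement for <code>sort | uniq -c</code>, the following code using <a href="https://docs.python.org/3/library/collections.html#collections.Counter" rel="noreferrer">collections.Counter</a> is more efficient, because it never has to sort the whole input (only the distinct lines are sorted for output):

<pre><code>from collections import Counter

with open('test.in') as inf:
    # Count every distinct line, then sort the (line, count) pairs by line
    counts = sorted(Counter(line.strip('\r\n') for line in inf).items())

# float() keeps the division exact on Python 2 as well
total_lines = float(sum(i[1] for i in counts))

for line, freq in counts:
    print("{}\t{:.4f}".format(line, freq / total_lines))
</code></pre>

This script prints the output you describe for the input given in your question.

<hr/>

However, if you really only want to merge consecutive lines, as a bare <code>uniq -c</code> does, note that any solution using <code>Counter</code> will give the output shown in your question, whereas your <code>uniq -c</code> approach will <em>not</em>. The output of <code>uniq -c</code> will be:

<pre><code>      1 english&lt;tab&gt;walawala
      2 foo bar&lt;tab&gt;laa war
      2 hello world&lt;tab&gt;walo lorl
      1 foo bar&lt;tab&gt;laa war
</code></pre>

and not

<pre><code>      1 english&lt;tab&gt;walawala
      3 foo bar&lt;tab&gt;laa war
      2 hello world&lt;tab&gt;walo lorl
</code></pre>

If this is the behaviour you want, you can use <a href="https://docs.python.org/3/library/itertools.html#itertools.groupby" rel="noreferrer"><code>itertools.groupby</code></a>:

<pre><code>from itertools import groupby

with open('test.in') as inf:
    # groupby only merges runs of consecutive equal lines
    grouper = groupby(line.strip('\r\n') for line in inf)
    # Materialise (line, run length) pairs while the file is still open
    items = [(k, sum(1 for j in i)) for (k, i) in grouper]

total_lines = float(sum(i[1] for i in items))

for line, freq in items:
    print("{}\t{:.4f}".format(line, freq / total_lines))
</code></pre>

The difference is that, given a <code>test.in</code> with the contents you specified, the <code>uniq</code> pipeline without <code>sort</code> will <em>not</em> produce the output from your example. Instead you will get:

<pre><code>english&lt;tab&gt;walawala&lt;tab&gt;0.1667
foo bar&lt;tab&gt;laa war&lt;tab&gt;0.3333
hello world&lt;tab&gt;walo lorl&lt;tab&gt;0.3333
foo bar&lt;tab&gt;laa war&lt;tab&gt;0.1667
</code></pre>

Since this is not what your example asks for, you probably cannot solve the problem with <code>uniq</code> unless you <code>sort</code> first. In that case, fall back on my first example, which should be faster than the Unix pipeline because it avoids sorting every input line.

<hr/>

By the way, both scripts work the same way in all Python versions later than 2.6.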
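To compare the two behaviours side by side without creating a file, here is a minimal sketch. The names <code>SAMPLE</code>, <code>global_frequencies</code> and <code>run_frequencies</code> are my own illustrative choices, and the six sample lines are an assumption reconstructed from the <code>uniq -c</code> output above, not copied from the question's actual input.

```python
from collections import Counter
from itertools import groupby

# Assumed sample data: line order reconstructed from the uniq -c output above.
SAMPLE = [
    "english\twalawala",
    "foo bar\tlaa war",
    "foo bar\tlaa war",
    "hello world\twalo lorl",
    "hello world\twalo lorl",
    "foo bar\tlaa war",
]


def global_frequencies(lines):
    """Relative frequency of each distinct line (like sort | uniq -c)."""
    lines = list(lines)
    counts = Counter(lines)
    total = float(len(lines))
    # Only the distinct lines are sorted, not the whole input
    return [(line, n / total) for line, n in sorted(counts.items())]


def run_frequencies(lines):
    """Relative frequency of each run of consecutive equal lines (like bare uniq -c)."""
    runs = [(key, sum(1 for _ in group)) for key, group in groupby(lines)]
    total = float(sum(n for _, n in runs))
    return [(line, n / total) for line, n in runs]


if __name__ == "__main__":
    for name, func in (("Counter", global_frequencies), ("groupby", run_frequencies)):
        print(name)
        for line, freq in func(SAMPLE):
            print("{}\t{:.4f}".format(line, freq))
```

With this sample, the <code>Counter</code> variant reports <code>foo bar</code> once with a frequency of 0.5000, while the <code>groupby</code> variant reports it twice (0.3333 and 0.1667) because the third occurrence is not adjacent to the first two.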
