Answers to the question: Comparing two files and printing the output using awk


1 answer

Anonymous, 1 day ago

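<blockquote> NOTE: This answer is accurate for a previous version of the question. Please check <a href="https://stackoverflow.com/posts/29908010/revisions">the question's revision history</a> for details. </blockquote> <hr/>

If you design a process like this in awk, the basic issue to consider is that comparing two files requires loading the significant parts of one of them into memory. If you can make sure the amount of memory used does not push you into swap, you are ahead of the game. :)

So, assuming <code>queryfile</code> is small and <code>hitsfile</code> is large, you want something like this:

<pre><code>$ awk '
  # First, store every line of our first file in an array. Simply mentioning
  # an array element is sufficient; you do not need to assign anything.
  NR == FNR { a[$0]; next; }

  # Second, walk through any remaining data (second file, third, etc),
  # comparing it to elements in the array we stored in the section above.
  # If the condition here is true, the default action is to print the line.
  $0 in a
' queryfile hitsfile
</code></pre>

This can obviously be shortened to a one-liner. You already know how to do that.

The end result is that every line of the second file is printed if it also appears in the first file. By extension, only lines that appear in both files are printed.

With the sample data you provided in the question, the output I get looks identical to queryfile, because every entry in queryfile appears exactly once in hitsfile.

If this is not the result you are looking for, please provide a more detailed explanation, and perhaps sample output of what you want, <a href="https://stackoverflow.com/posts/29908010/edit">in your question</a>.

Alternative:

You may not need awk at all; <code>fgrep</code> with the <code>-x</code> and <code>-f</code> options, reading its patterns from <code>queryfile</code>, can do the same job.

The <code>fgrep</code> command is equivalent to <code>grep -F</code>, which compares fixed strings rather than regular expressions. The <code>-x</code> option tells grep to consider only whole lines, effectively anchoring the match at the beginning and end of the line, like the regex <code>^...$</code>. The <code>-f</code> option says that the list of strings to match should be taken from the named file, in this case <code>queryfile</code>.

The end result is that compiled C code runs the search instead of an awk script. I will leave the benchmarking to you, since you have the large files, but I would be curious to know the performance difference.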
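As a rough illustration of the two approaches described above, here is a minimal sketch that runs both on tiny made-up files and shows they select the same lines. The file contents and the temporary directory are assumptions for demonstration only; they are not the data from the question. The grep call uses <code>grep -F</code> with <code>-x</code> and <code>-f</code>, which the answer describes as equivalent to <code>fgrep</code> with the same options.

```shell
#!/bin/sh
# Hypothetical sample data; the real queryfile and hitsfile come from the question.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmpdir/queryfile"
printf 'beta\ndelta\nalpha\nalphabet\n' > "$tmpdir/hitsfile"

# awk approach: remember every line of queryfile as an array key,
# then print each hitsfile line that exists as a key.
awk_out=$(awk 'NR == FNR { a[$0]; next } $0 in a' "$tmpdir/queryfile" "$tmpdir/hitsfile")

# grep approach: -F fixed strings, -x whole-line matches only,
# -f read the list of strings from queryfile.
grep_out=$(grep -Fxf "$tmpdir/queryfile" "$tmpdir/hitsfile")

printf 'awk:\n%s\n' "$awk_out"
printf 'grep:\n%s\n' "$grep_out"

# Clean up the temporary files.
rm -rf "$tmpdir"
```

Both commands print lines in the order they occur in hitsfile, and the whole-line comparison means a line such as "alphabet" is not matched by the query "alpha".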
