Using awk to compare two files and print the output

2 answers

User

Answer 1 · edited 2024-10-16 20:47:08

NOTE: This answer is accurate for a previous version of the question. Please check the question's revision history for details.

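If you are designing a process like this in awk, the basic problem to consider is that comparing two files requires a significant portion of one of them to be loaded into memory. If you can be sure that the amount of memory used will not push the system into swap, you will come out ahead. :)

So... assuming queryfile is small and hitsfile is large, you want something like this: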

$ awk '

  # First, store every line of our first file in an array.  Simply mentioning
  # an array element is sufficient, you do not need to assign anything.
  # (No apostrophes in these comments: one would end the single-quoted script.)

  NR == FNR {
    a[$0];
    next;
  }

  # Second, walk through any remaining data (second file, third, etc),
  # comparing it to elements in the array we stored in the section above.
  # If the condition here is true, the default action is to print the line.

  $0 in a

' queryfile hitsfile

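This can obviously be shortened to a one-liner. You already know how to do that.

The end result is that each line of the second file is printed if it also appears in the first file. By extension, only lines that appear in both files are printed.

Using the sample data you provided in your question, I get output that looks identical to queryfile, because every entry in queryfile appears once in hitsfile.

If this is not the result you are looking for, please provide a more detailed explanation, and perhaps the sample output you expect, in your question.

Alternative:

You may not need awk at all.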
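As a minimal sketch of that behaviour, here is the one-line form run against two small files. The file contents are hypothetical, made up for illustration, and are not the data from the question.

```shell
# Hypothetical sample data, written to a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'chr1 1000' 'chr2 2000' > "$tmpdir/queryfile"
printf '%s\n' 'chr1 1000' 'chr3 3000' 'chr2 2000' 'chr1 1000 extra' > "$tmpdir/hitsfile"

# One-line form of the script above: remember every line of the first file,
# then print each line of the second file that was remembered.
common=$(awk 'NR==FNR { a[$0]; next } $0 in a' "$tmpdir/queryfile" "$tmpdir/hitsfile")
printf '%s\n' "$common"

rm -rf "$tmpdir"
```

Only whole-line matches are printed, in hitsfile order, so 'chr1 1000 extra' is skipped. One caveat: if the first file is empty, NR == FNR also holds for the second file, so nothing is printed.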

$ fgrep -xf queryfile hitsfile

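The fgrep command is equivalent to grep -F, which compares fixed strings rather than regular expressions. The -x option tells grep to consider only whole lines, effectively anchoring each match at the start and end of the line, like the regex ^...$. The -f option says that the list of strings to match should be read from the named file, in this case queryfile.

The end result is that the search runs in C code rather than in an awk script. I will leave the benchmarking to you, since you have the large files, but I would be interested to know the performance difference.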
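To see what each of the three options contributes, here is a small sketch using grep -F (the equivalent form mentioned above). The file contents are hypothetical and were chosen only to exercise each option.

```shell
# Hypothetical sample data, written to a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'a.c' 'foo' > "$tmpdir/queryfile"
printf '%s\n' 'a.c' 'abc' 'foo bar' 'foo' > "$tmpdir/hitsfile"

# -F : patterns are fixed strings, so "a.c" does not match "abc"
# -x : the whole line must match, so "foo" does not match "foo bar"
# -f : read the patterns, one per line, from queryfile
matches=$(grep -F -x -f "$tmpdir/queryfile" "$tmpdir/hitsfile")
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

Without -F the pattern a.c would also match abc, and without -x the pattern foo would also match foo bar.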

User

Answer 2 · edited 2024-10-16 20:47:08

$ awk 'NR==FNR{a[$1,$2]=$4;next} ($1,$2) in a{print $0, a[$1,$2]}' queryfile hitsfile
chr1 1000 1005 0.5 BDSD
chr1 1010 1015 0.4 SKK1
chr2 1015 1015 0.1 AVPR
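For readers who want to see how that one-liner works, here is the same program spread over several lines with comments. The two input files are hypothetical, invented so the example is self-contained. The sketch assumes that column 4 of queryfile holds a name and that hitsfile lines carry a score, which may not match the question's actual layout.

```shell
# Hypothetical sample data, written to a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'chr1 1000 1005 BDSD' 'chr9 5000 5005 XYZ1' > "$tmpdir/queryfile"
printf '%s\n' 'chr1 1000 1005 0.5' 'chr7 100 105 0.9' > "$tmpdir/hitsfile"

joined=$(awk '
  # First file: index column 4 by the pair (column 1, column 2).
  NR == FNR { a[$1,$2] = $4; next }

  # Later files: if the same pair was stored, print the line plus the stored value.
  ($1,$2) in a { print $0, a[$1,$2] }
' "$tmpdir/queryfile" "$tmpdir/hitsfile")
printf '%s\n' "$joined"

rm -rf "$tmpdir"
```

Unlike the whole-line comparison in the first answer, this matches on the first two columns only and appends a field from queryfile to each matching hitsfile line.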
