<p>With GNU sed:</p>
<pre><code>$ sed -r ':a;N;s/^([^ ]*)( .*)\n\1(.*)$/\1\2 |\3/;ta;P;D' infile
TRINITY_GG_428_c0_g1_i1_orf1 PF13499.1 EF_hand_5 | PF00036.27 efhand | PF13405.1 EF_hand_4 | PF13833.1 EF_hand_6 | PF13202.1 EF_hand_3
TRINITY_GG_429_c0_g1_i1_orf1 PF00156.22 Pribosyltran
TRINITY_GG_431_c5_g1_i1_orf1 PF00475.13 IGPD
TRINITY_GG_461_c0_g1_i1_orf1 PF01208.12 URO-D | PF12876.2 Cellulase-like
</code></pre>
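<p>The core of the script is the substitution. It checks whether the two lines in the pattern space start with the same string (everything up to the first space). If they do, it joins them, removes that string from the second line, and replaces the newline with a pipe. When a join succeeds, <code>ta</code> jumps back to <code>:a</code> so that <code>N</code> can append the next line. Otherwise <code>P</code> prints the first line of the pattern space, and <code>D</code> deletes it and restarts the cycle with whatever remains.</p>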
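<p>To see the substitution on its own, without the loop, here is a minimal sketch that runs just <code>N</code> plus the <code>s</code> command on two-line inputs. It assumes GNU sed, and <code>join_pair</code> is a hypothetical helper name used only for illustration.</p>

```shell
# join_pair: hypothetical helper; runs only N plus the core substitution (GNU sed, -r = ERE).
join_pair() {
  # N appends line 2; s/// joins the lines only if they share the first field.
  sed -r 'N;s/^([^ ]*)( .*)\n\1(.*)$/\1\2 |\3/'
}

printf 'ID a\nID b\n' | join_pair   # same first field: prints "ID a | b"
printf 'ID a\nXY b\n' | join_pair   # different first field: both lines printed unchanged
```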
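<p>Split up, the same GNU sed program can be written one command per line with comments. This is a sketch reconstructed for illustration, not the author's original layout. The comments sit on their own lines because sed would otherwise read text after <code>:a</code> or <code>ta</code> as part of the label name. <code>join_first_field</code> is a hypothetical wrapper name.</p>

```shell
# join_first_field: hypothetical wrapper around the GNU sed loop, one command per line.
join_first_field() {
  sed -r '
    # Label marking the top of the loop.
    :a
    # Append the next input line to the pattern space.
    N
    # If both lines start with the same first field, drop the repeated
    # field from the second line and join the two lines with " |".
    s/^([^ ]*)( .*)\n\1(.*)$/\1\2 |\3/
    # If that substitution succeeded, loop back and append another line.
    ta
    # Otherwise print the first line of the pattern space ...
    P
    # ... then delete it and restart the cycle with the remainder.
    D
  ' "$@"
}

printf 'A x1\nA x2\nA x3\nB y1\n' | join_first_field
```

<p>For that input, the three <code>A</code> lines collapse into <code>A x1 | x2 | x3</code> and <code>B y1</code> passes through unchanged.</p>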
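<p>To do this with BSD sed, we have to split the commands around the labels, because BSD sed reads a label up to the end of the line (so <code>:a;N</code> would define a label named <code>a;N</code>). We also use the <code>-E</code> flag instead of <code>-r</code>:</p>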
<pre><code>sed -E -e ':a' -e 'N;s/^([^ ]*)( .*)\n\1(.*)$/\1\2 |\3/;ta' -e 'P;D' infile
</code></pre>
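<p>GNU sed also accepts <code>-E</code>, and it treats each <code>-e</code> expression as a separate line. So the split form above should also run unchanged under GNU sed, which makes it the more portable choice. The sketch below simply wraps that exact invocation. <code>join_first_field_split</code> is a hypothetical name, and only GNU behaviour is assumed when checking it.</p>

```shell
# join_first_field_split: hypothetical wrapper around the split -e form shown above.
# Each -e ends a line, so the labels ":a" and "ta" end cleanly in both BSD and GNU sed.
join_first_field_split() {
  sed -E -e ':a' -e 'N;s/^([^ ]*)( .*)\n\1(.*)$/\1\2 |\3/;ta' -e 'P;D' "$@"
}

printf 'A x1\nA x2\nB y1\n' | join_first_field_split
```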
<p>For good measure, here is a closer look at the substitution:</p>
<pre><code>s/ # Start substitution
^ # Anchor at start of pattern space
([^ ]*) # Match and capture non-space characters (group #1)
( .*) # Capture a space and the rest of the first line (group #2)
\n # Match newline
\1 # Start of second line: match first capture group
(.*) # Capture rest of second line, including its leading space (group #3)
$ # Anchor at end of pattern space
/ # Delimiter for substitution
\1\2 |\3 # Replacement: groups 1 and 2, a space, a pipe, then group 3
/ # End of substitution
</code></pre>
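<p>One way to check this breakdown is to swap the replacement for one that brackets each group, so you can see exactly what the captures hold. This is a sketch for GNU sed, and <code>show_groups</code> is a hypothetical helper name.</p>

```shell
# show_groups: hypothetical helper; same pattern, but the replacement
# prints each capture group inside brackets (GNU sed).
show_groups() {
  sed -r 'N;s/^([^ ]*)( .*)\n\1(.*)$/[\1][\2][\3]/'
}

printf 'ID PF1 a\nID PF2 b\n' | show_groups   # prints "[ID][ PF1 a][ PF2 b]"
printf 'ID a\nIDX b\n' | show_groups          # prints "[ID][ a][X b]"
```

<p>The second call shows that, because <code>\1</code> is followed by <code>(.*)</code> rather than <code>( .*)</code>, the second line only has to <em>start with</em> the first field. With hypothetical IDs such as <code>..._orf1</code> and <code>..._orf10</code>, the second line would be joined onto the first. Writing <code>\n\1( .*)</code> in the pattern requires a space right after the repeated field and avoids this. The replacement <code>\1\2 |\3</code> still works unchanged.</p>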