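<p>This <code>perl</code> script builds a hash that you should be able to work with. For convenience, I use <a href="https://metacpan.org/pod/List::MoreUtils" rel="nofollow"><code>List::MoreUtils</code></a> for <code>uniq</code> and <a href="https://metacpan.org/pod/Data::Printer" rel="nofollow"><code>Data::Printer</code></a> to dump the data structure:</p>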
<pre><code>#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);
use DDP;    # short alias for Data::Printer; exports p()
my %paper ;
my @categories;
# Each DATA record is tab-separated: a paper ID followed by its categories.
while (<DATA>) {
    chomp;
    my ($id, @cats) = split /\t/;
    $paper{$id} = { map { $_ => 1 } @cats };    # mark the categories this paper has
    push @categories, @cats;
}
@categories = uniq @categories;
# Fill in 0 for every category a paper does not have.
for my $id (keys %paper) {
    for my $category (@categories) {
        $paper{$id}{$category} //= 0;
    }
}
p %paper ;
__DATA__
19801464	Animals	Biodiversity	Computational Biology/methods	DNA
19696045	Environmental Microbiology	Computational Biology/methods	Software
</code></pre>
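<p><strong>Output</strong></p>
<p>The dump shows a hash of hashes: each paper ID maps to a hash that has every category as a key, with the value 1 if the paper belongs to that category and 0 otherwise.</p>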
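<p>Producing the output you want from there will probably need <code>printf</code> to format the rows properly. The following may be good enough for your purposes:</p>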
<pre><code>print "\t", (join " ", @categories);
for (keys %paper) {
    print "\n", $_, "\t\t";
    for my $category (@categories) {
        print $paper{$_}{$category}, " " x 17;
    }
}
</code></pre>
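<p>If the columns need to line up exactly, one option is to make each column as wide as its category name with <code>sprintf</code>. The following is a minimal sketch, not part of the original script: the <code>table_row</code> helper and the shortened sample data are made up for illustration, and the first column is assumed to need 10 characters.</p>

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample data, shaped like the %paper hash built above.
my @categories = ('Animals', 'Biodiversity', 'Computational Biology/methods', 'DNA');
my %paper = (
    19801464 => { 'Animals' => 1, 'Biodiversity' => 1, 'Computational Biology/methods' => 1, 'DNA' => 1 },
    19696045 => { 'Animals' => 0, 'Biodiversity' => 0, 'Computational Biology/methods' => 1, 'DNA' => 0 },
);

# Build one table row as a string: a left-justified 10-character label,
# then one right-justified cell per category, as wide as the category name.
sub table_row {
    my ($label, @cells) = @_;
    my $row = sprintf '%-10s', $label;
    for my $i (0 .. $#categories) {
        $row .= sprintf ' %*s', length($categories[$i]), $cells[$i];
    }
    return $row;
}

# Header row first, then one row per paper; the hash slice keeps the
# values in the same order as the header.
print table_row('', @categories), "\n";
for my $id (sort keys %paper) {
    print table_row($id, @{ $paper{$id} }{@categories}), "\n";
}
```

<p>Taking the values with a hash slice over <code>@categories</code>, rather than with <code>values</code>, keeps each 0 or 1 under the right heading, because the order of hash keys is not guaranteed.</p>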
<hr/>
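<p><strong>Edit</strong></p>
<p>Some ways to format the output... (we use <code>x</code> to repeat the format field once per element of the <code>@categories</code> array, so that the number of fields matches the number of categories):</p>
<p>Using <code>format</code>:</p>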
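<pre><code>our $num;    # package variable, so the format can see it under "use strict"
# Build the format source: one numeric field per category
# (~~@categories forces scalar context, i.e. the element count).
my $format_line = 'format STDOUT =' . "\n"
    . '@# ' x ~~@categories . "\n"
    . '@{ $paper{$num} }{@categories}' . "\n"    # hash slice: values in @categories order
    . '.' . "\n";
for $num (keys %paper) {
    print $num;
    no warnings 'redefine';
    eval $format_line;
    write;
}
</code></pre>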
<p>Using <code>printf</code>:</p>
<pre><code>print " " x 9, join(" ", @categories), "\n";
for my $num (keys %paper) {
    print $num;
    # a hash slice keeps the values in the same order as the header
    printf "%19d", $_ for @{ $paper{$num} }{@categories};
    print "\n";
}
</code></pre>
<p>Using <code>form</code>:</p>
<pre><code>use Perl6::Form;
for my $num (keys %paper) {
    print form
        "{<<<<<<<<}" . "{>}" x ~~@categories,
        $num, @{ $paper{$num} }{@categories};
}
</code></pre>
<hr/>
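<p>Depending on what you plan to do with the data, you may be able to do the rest of the analysis in Perl as well, so the exact format of the printed output may not be a priority until later in your workflow. See <a href="http://www.bioperl.org" rel="nofollow">BioPerl</a> for ideas.</p>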
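<p>As one small illustration of analysing the hash directly, the sketch below counts how many papers fall into each category. It is only an example under stated assumptions: the <code>papers_per_category</code> helper and the shortened sample data are hypothetical and do not come from the script above.</p>

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample data, shaped like the %paper hash built earlier.
my %paper = (
    19801464 => { 'Animals' => 1, 'Computational Biology/methods' => 1, 'Software' => 0 },
    19696045 => { 'Animals' => 0, 'Computational Biology/methods' => 1, 'Software' => 1 },
);

# Sum the 0/1 flags per category across all papers.
sub papers_per_category {
    my ($papers) = @_;
    my %count;
    for my $flags (values %$papers) {
        $count{$_} += $flags->{$_} for keys %$flags;
    }
    return %count;
}

my %count = papers_per_category(\%paper);
printf "%-30s %d\n", $_, $count{$_} for sort keys %count;
```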