<p>一个基本攻击:</p>
<pre><code>Keep your representation as it is for now.
Initialize a dictionary with the COGs as keys; each value is an initial count of 0.
Now start building your list of enzyme coverage sets (ecs_list), one ecs at a time. Do this by starting at the front of the gene list and working your way to the end, considering all combinations.
Write a recursive routine to solve the remaining COGs in the enzyme. Something like this:
def pick_a_gene(gene_list, cog_list, solution_set, cog_count_dict):
pick the first gene in the list that is in at least one cog in the list.
let the rest of the list be remaining_gene_list.
add the gene to the solution set.
for each of the gene's cogs:
increment the cog's count in cog_count_dict
remove the cog from cog_list (if it's still there).
add the gene to the solution set.
is there anything left in the cog_list?
yes:
pick_a_gene(remaining_gene_list, cog_list, solution_set, cog_count_dict)
no: # we have a solution: check it for minimality
from every non-zero entry in cog_count_dict, subtract 1. This gives us a list of excess coverage.
while the excess list is not empty:
pick the next gene in the solution set, starting from the *end* (if none, break the loop)
if the gene's cogs are all covered by the excess:
remove the gene from the solution set.
decrement the excess count of each of its cogs.
The remaining set of genes is an ECS; add it to ecs_list
</code></pre>
<p>这对你有用吗?我相信它正确地覆盖了最小集,给出了一个表现良好的例子。请注意,从高端开始,当我们检查最小值时,会防范这样的情况:</p>
<pre><code>gene1: cog1, cog5
gene2: cog2, cog5
gene3: cog3
gene4: cog1, cog2, cog4
enzyme: cog1 - cog5
</code></pre>
<p>我们可以看到我们需要gene3、gene4和<em>gene1或gene2。如果我们从低端淘汰掉gene1,就永远找不到解决方案。如果我们从高端开始,我们将消除gene2,但在主循环的稍后过程中找到解决方案。你知道吗</p>
<p>有可能构造一个这样一种情况,即这种情况存在三方冲突。在这种情况下,我们必须在最小值检查中编写一个额外的循环才能找到所有的值。不过,我想你的数据对我们来说并不是那么糟糕。你知道吗</p>