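<p>Using a regular expression, you can split on <code>&lt;=&gt;</code> or <code>+</code> to get the individual compounds, each still prefixed by its coefficient.</p>
<p>Once they are separated, use <code>lstrip</code> to remove the leading coefficient (including forms such as <code>(n+1)</code>) and <code>strip</code> to remove any trailing whitespace. Note that <code>lstrip</code> removes any leading characters that appear in the set you pass it, not a fixed prefix.</p>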
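<pre><code>import re

str1 = 'Polyphosphate + n H2O <=> (n+1) Oligophosphate'
str2 = '16 ATP + 16 H2O + 8 Reduced ferredoxin <=> 8 e- + 16 Orthophosphate + 16 ADP + 8 Oxidized ferredoxin'

# Split on " + " or " <=> "; the surrounding spaces keep "(n+1)" in one piece.
# lstrip() then removes the leading coefficient characters (the set includes
# "0", otherwise a coefficient such as "10" would leave a stray "0"), and
# strip() removes any remaining whitespace.
coefficient_chars = " 0123456789n()+"
res1 = [i.lstrip(coefficient_chars).strip() for i in re.split(r" \+ | <=> ", str1)]
res2 = [i.lstrip(coefficient_chars).strip() for i in re.split(r" \+ | <=> ", str2)]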
print(res1) # ['Polyphosphate', 'H2O', 'Oligophosphate']
print(res2) # ['ATP', 'H2O', 'Reduced ferredoxin', 'e-', 'Orthophosphate', 'ADP', 'Oxidized ferredoxin']
</code></pre>
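<p>Because <code>lstrip</code> works on a character set, it keeps removing leading characters for as long as each one is in the set. A small sketch showing this, including why the set needs <code>0</code> and where the approach starts to remove too much (the variable names are only illustrative):</p>

```python
# str.lstrip(chars) removes leading characters found anywhere in `chars`,
# in any order, and stops at the first character that is not in the set.
with_zero = " 0123456789n()+"
without_zero = " 123456789n()+"

print("(n+1) Oligophosphate".lstrip(with_zero))  # Oligophosphate
print("10 ATP".lstrip(with_zero))                # ATP
print("10 ATP".lstrip(without_zero))             # 0 ATP  (stops at the "0")

# Caveat: the same set also eats characters that belong to the name itself,
# e.g. a compound whose name starts with a digit.
print("5-Aminolevulinate".lstrip(with_zero))     # -Aminolevulinate
```

<p>That last case is what the regex-based solution that follows is meant to handle.</p>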
<hr/>
<p>To address your updated requirement:</p>
<blockquote>
<p>In some compound, it may also exist the number or some other char, for example, '5-Aminolevulinate' or '(+)-Bisdechlorogeodin'</p>
</blockquote>
<p>Here is another, slightly less clean solution, with an additional, more complex example:</p>
<pre><code>import re
str1 = 'Polyphosphate + n H2O <=> (n+1) Oligophosphate'
str2 = '16 ATP + 16 H2O + 8 Reduced ferredoxin <=> 8 e- + 16 Orthophosphate + 16 ADP + 8 Oxidized ferredoxin'
str3 = '5-Aminolevulinate + 8 Reduced ferredoxin <=> 8 e- + 16 Orthophosphate + (+)-Bisdechlorogeodin + (n+1) Oligophosphate'
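# For each term, keep only the text after the last "non-lowercase character
# followed by a space" (where the coefficient usually ends), then drop a
# leading symbolic "n ", as in "n H2O".
res1 = [re.split(r"[^a-z] ", i)[-1].lstrip("n ").strip() for i in re.split(r" \+ | <=> ", str1)]
res2 = [re.split(r"[^a-z] ", i)[-1].lstrip("n ").strip() for i in re.split(r" \+ | <=> ", str2)]
res3 = [re.split(r"[^a-z] ", i)[-1].lstrip("n ").strip() for i in re.split(r" \+ | <=> ", str3)]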
print(res1) # ['Polyphosphate', 'H2O', 'Oligophosphate']
print(res2) # ['ATP', 'H2O', 'Reduced ferredoxin', 'e-', 'Orthophosphate', 'ADP', 'Oxidized ferredoxin']
print(res3) # ['5-Aminolevulinate', 'Reduced ferredoxin', 'e-', 'Orthophosphate', '(+)-Bisdechlorogeodin', 'Oligophosphate']
</code></pre>
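<p>To see what the <code>re.split(r"[^a-z] ", i)[-1]</code> step does, here is a small sketch that prints the split for a few individual terms. The last term comes from the follow-up requirement further down and shows the case this pattern cannot handle: a compound name that itself contains <code>) </code>.</p>

```python
import re

# Pattern from the second solution: any character that is not a lowercase
# letter, immediately followed by a space.
pattern = r"[^a-z] "

print(re.split(pattern, "16 Orthophosphate"))     # ['1', 'Orthophosphate']
print(re.split(pattern, "(n+1) Oligophosphate"))  # ['(n+1', 'Oligophosphate']
print(re.split(pattern, "n H2O"))                 # ['n H2O'] ("n" is lowercase; lstrip("n ") handles it)
print(re.split(pattern, "5-Aminolevulinate"))     # ['5-Aminolevulinate'] (no space, nothing split off)

# A name that itself contains ") " is cut in the middle, so [-1] is wrong:
print(re.split(pattern, "(n+1) P1,P4-Bis(5'-guanosyl) tetraphosphate"))
# ['(n+1', "P1,P4-Bis(5'-guanosyl", 'tetraphosphate']
```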
<hr/>
<p>To handle your now-deleted comment, and to cover further possible requirements, you can do the following:</p>
<blockquote>
<p>During the experiment, there exist new compounds, for example ''2 GTP <=> Diphosphate + P1,P4-Bis(5'-guanosyl) tetraphosphate'', the compound is 'P1,P4-Bis(5'-guanosyl) tetraphosphate'</p>
</blockquote>
<pre><code>import re
str1 = 'Polyphosphate + n H2O <=> (n+1) Oligophosphate'
str2 = '16 ATP + 16 H2O + 8 Reduced ferredoxin <=> 8 e- + 16 Orthophosphate + 16 ADP + 8 Oxidized ferredoxin'
str3 = '5-Aminolevulinate + 8 Reduced ferredoxin <=> 8 e- + 16 Orthophosphate + (+)-Bisdechlorogeodin + (n+1) Oligophosphate'
str4 = '2 GTP <=> Diphosphate + 8 e- + 16 Orthophosphate + 12 (+)-Bisdechlorogeodin + (n+1) P1,P4-Bis(5\'-guanosyl) tetraphosphate'
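# Same idea, but a ")" may now sit between the coefficient and the space, as
# in "(n+1) ", while a ")" preceded by a lowercase letter (as in "guanosyl) ")
# does not match, so names containing ") " stay in one piece.
res1 = [re.split(r"[^a-z\)]\)? ", i)[-1].lstrip("n ").strip() for i in re.split(r" \+ | <=> ", str1)]
res2 = [re.split(r"[^a-z\)]\)? ", i)[-1].lstrip("n ").strip() for i in re.split(r" \+ | <=> ", str2)]
res3 = [re.split(r"[^a-z\)]\)? ", i)[-1].lstrip("n ").strip() for i in re.split(r" \+ | <=> ", str3)]
res4 = [re.split(r"[^a-z\)]\)? ", i)[-1].lstrip("n ").strip() for i in re.split(r" \+ | <=> ", str4)]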
print(res1) # ['Polyphosphate', 'H2O', 'Oligophosphate']
print(res2) # ['ATP', 'H2O', 'Reduced ferredoxin', 'e-', 'Orthophosphate', 'ADP', 'Oxidized ferredoxin']
print(res3) # ['5-Aminolevulinate', 'Reduced ferredoxin', 'e-', 'Orthophosphate', '(+)-Bisdechlorogeodin', 'Oligophosphate']
print(res4) # ['GTP', 'Diphosphate', 'e-', 'Orthophosphate', '(+)-Bisdechlorogeodin', "P1,P4-Bis(5'-guanosyl) tetraphosphate"]
</code></pre>
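<p>(Note: I added some arbitrary extra terms to the equations to try to make sure the code produces the correct result in more cases. I have not necessarily caught every edge case, but it works for the given examples.)</p>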
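<p>If the special cases keep growing, a different technique may be easier to maintain: instead of guessing where the coefficient ends, remove only a coefficient at the very start of each term with an anchored regular expression and leave the rest of the term untouched. This is a sketch of an alternative, not the approach used above. It assumes a coefficient is always an integer, the symbol <code>n</code>, or a form such as <code>(n+1)</code>, followed by whitespace; the names <code>COEFFICIENT</code>, <code>SEPARATOR</code> and <code>compounds</code> are hypothetical.</p>

```python
import re

# Hypothetical alternative, not the approach used in the answer above.
# Assumption: a coefficient is an integer ("16"), the symbol "n", or a
# parenthesised form such as "(n+1)" / "(n-1)", followed by whitespace,
# and it only ever appears at the very start of a term.
COEFFICIENT = re.compile(r"^(?:\d+|n|\(n[+-]\d+\))\s+")
# Terms are separated by " + " or " <=> " (with surrounding spaces).
SEPARATOR = re.compile(r" \+ | <=> ")


def compounds(equation):
    """Return the compound names of `equation` in order of appearance."""
    return [COEFFICIENT.sub("", term).strip() for term in SEPARATOR.split(equation)]


str4 = ("2 GTP <=> Diphosphate + 8 e- + 16 Orthophosphate + "
        "12 (+)-Bisdechlorogeodin + (n+1) P1,P4-Bis(5'-guanosyl) tetraphosphate")
print(compounds(str4))
# ['GTP', 'Diphosphate', 'e-', 'Orthophosphate', '(+)-Bisdechlorogeodin', "P1,P4-Bis(5'-guanosyl) tetraphosphate"]
```

<p>Because the pattern is anchored with <code>^</code> and requires whitespace after the coefficient, digits, parentheses and spaces inside a name (as in <code>5-Aminolevulinate</code>) are never removed. The trade-off is that a coefficient written in a form the pattern does not list (for example <code>2n</code> or <code>0.5</code>) stays attached to the compound name, so the pattern would need extending for such cases.</p>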