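<p>It looks like you are trying to do pattern-based text manipulation, which is a good fit for regular expressions. It is hard to generalize from a single example: the more precisely you describe the transformation, the easier it is to write a regular expression that does what you need. The Python documentation on regular expressions is a useful reference: <a href="https://docs.python.org/3/library/re.html" rel="nofollow noreferrer">https://docs.python.org/3/library/re.html</a></p>
<p>If I had to infer a pattern from your example and description, I would write the following regular expression:</p>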
<pre class="lang-py prettyprint-override"><code>import re
myre = re.compile(
r'([A-Za-z]+_[\d]+)' # This will match "scaffold_356" in the first group
r'_[\d]+-[\d]+_\+_' # This will match "_1-1000_+_" ungrouped
r'(_[A-Za-z]{3})' # This will match _Gen and put it in the second group
r'[A-Za-z]*' # This will match any additional letters, ungrouped
r'(_[A-Za-z]{3})' # This will match _spe and put it in the third group
)
</code></pre>
<p>If you try this regular expression, you can see that its groups capture exactly the parts you want to assemble into the final result:</p>
<pre class="lang-py prettyprint-override"><code>matches = myre.match('scaffold_356_1-1000_+__Genus_species')
print(''.join(matches.groups())) # prints scaffold_356_Gen_spe
</code></pre>
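<p>Of course, this regular expression only works for one very specific pattern, and it is unforgiving if the input does not follow that pattern exactly: <code>match</code> returns <code>None</code>, and any attempt to use the result then raises an exception.</p>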
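<p>One way to cope with that strictness is to wrap the match in a small helper that checks for <code>None</code> before joining the groups. The following is only a minimal sketch: the names <code>NAME_RE</code> and <code>shorten_name</code> are hypothetical, the pattern is repeated so the snippet runs on its own, and returning <code>None</code> for non-matching input is an assumption about what you would want.</p>

```python
import re

# Same pattern as in the answer, repeated so this sketch is self-contained.
NAME_RE = re.compile(
    r'([A-Za-z]+_\d+)'   # e.g. "scaffold_356" (group 1)
    r'_\d+-\d+_\+_'      # e.g. "_1-1000_+_" (ungrouped)
    r'(_[A-Za-z]{3})'    # e.g. "_Gen" (group 2)
    r'[A-Za-z]*'         # remaining letters of the first word (ungrouped)
    r'(_[A-Za-z]{3})'    # e.g. "_spe" (group 3)
)


def shorten_name(name):
    """Return the shortened name, or None if `name` does not fit the pattern."""
    m = NAME_RE.match(name)
    if m is None:
        # Assumption: callers prefer None over an exception for bad input.
        return None
    return ''.join(m.groups())


print(shorten_name('scaffold_356_1-1000_+__Genus_species'))  # scaffold_356_Gen_spe
print(shorten_name('scaffold_356_1-1000_-__Genus_species'))  # None ("-" where "+" is required)
```

<p>If some of your real inputs come back as <code>None</code>, that is a sign the pattern needs to be loosened for them, for example by allowing <code>[+-]</code> in place of <code>\+</code>.</p>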