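<p>This code counts how many <code>GC</code> or <code>CG</code> pairs occur in each string, converts that count to a percentage of the string's length (each pair covers 2 characters), and keeps the strings whose percentage falls between 30% and 50% in the output list.</p>
<p>I also print the percentage computed for each test case, for your reference.</p>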
<p><strong>Code</strong>:</p>
<pre class="lang-py prettyprint-override"><code>import regex as re  # third-party package; the standard-library re module also provides finditer

siRNAs = ['GUUUCCCTTTG', 'GCTTTUGCTUT', 'GCTUGCUTGCU', 'CGTUCGUTCGU', 'GCTUCGUTCGU', 'CGCGTUUTCGU', 'GCGCTUUTGCU',
          'GCGCGCGCTUUTGCU', 'GCGCGCGCCGCGCGTUUTGCU']

def get_count(mstring, sub1, sub2):
    # (start, end) spans of every non-overlapping match of each pattern
    idxs1 = [(m.start(), m.end()) for m in re.finditer(sub1, mstring)]
    idxs2 = [(m.start(), m.end()) for m in re.finditer(sub2, mstring)]
    count = len(idxs1)
    for i2 in idxs2:
        # Skip a sub2 match that starts inside a sub1 match
        if any(i1[0] <= i2[0] < i1[1] for i1 in idxs1):
            continue
        count += 1
    return count

for x in siRNAs:
    # Each counted pair covers 2 characters, hence the factor of 2
    print('siRNA:', x, 'percentage:', get_count(x, "GC", "CG") * 2 / len(x) * 100, '%')

output = [x for x in siRNAs if 30 <= get_count(x, "GC", "CG") * 2 / len(x) * 100 <= 50]
print('output:', output)
</code></pre>
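<p>The percentage expression is written out twice above. As a minimal sketch, assuming you would rather compute it in one place, the same logic can be wrapped in helper functions. The names <code>count_pairs</code>, <code>pair_percentage</code> and <code>filter_by_percentage</code> are hypothetical and not part of the code above, and this version swaps in the standard-library <code>re</code> module instead of <code>regex</code>:</p>

```python
import re


def count_pairs(seq, sub1="GC", sub2="CG"):
    """Count sub1 matches plus sub2 matches that do not start inside a sub1 match."""
    spans1 = [m.span() for m in re.finditer(sub1, seq)]
    count = len(spans1)
    for m in re.finditer(sub2, seq):
        start = m.start()
        # Same rule as get_count: only the start of the sub2 match is checked
        if not any(s <= start < e for s, e in spans1):
            count += 1
    return count


def pair_percentage(seq):
    """Percentage of the sequence covered by counted pairs (2 characters each)."""
    if not seq:
        return 0.0  # assumption: treat an empty sequence as 0% rather than dividing by zero
    return count_pairs(seq) * 2 / len(seq) * 100


def filter_by_percentage(seqs, low=30, high=50):
    """Keep sequences whose pair percentage lies in [low, high]."""
    return [s for s in seqs if low <= pair_percentage(s) <= high]


if __name__ == "__main__":
    siRNAs = ['GUUUCCCTTTG', 'GCTTTUGCTUT', 'GCTUGCUTGCU', 'CGTUCGUTCGU']
    print(filter_by_percentage(siRNAs))
```

<p>Keeping the thresholds as parameters should make it easier to try a different range without touching the counting logic.</p>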
<p><strong>Input</strong>:</p>
<pre><code>['GUUUCCCTTTG', 'GCTTTUGCTUT', 'GCTUGCUTGCU', 'CGTUCGUTCGU', 'GCTUCGUTCGU', 'CGCGTUUTCGU', 'GCGCTUUTGCU', 'GCGCGCGCTUUTGCU', 'GCGCGCGCCGCGCGTUUTGCU']
</code></pre>
<p><strong>Output</strong>:</p>
<pre><code>siRNA: GUUUCCCTTTG percentage: 0.0 %
siRNA: GCTTTUGCTUT percentage: 36.36363636363637 %
siRNA: GCTUGCUTGCU percentage: 54.54545454545454 %
siRNA: CGTUCGUTCGU percentage: 54.54545454545454 %
siRNA: GCTUCGUTCGU percentage: 54.54545454545454 %
siRNA: CGCGTUUTCGU percentage: 54.54545454545454 %
siRNA: GCGCTUUTGCU percentage: 54.54545454545454 %
siRNA: GCGCGCGCTUUTGCU percentage: 66.66666666666666 %
siRNA: GCGCGCGCCGCGCGTUUTGCU percentage: 76.19047619047619 %
output: ['GCTTTUGCTUT']
</code></pre>
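<p>To see where a figure such as 54.5% for <code>CGCGTUUTCGU</code> comes from, it may help to look at the match spans directly. The sketch below is only an illustration using the standard-library <code>re</code> module; the variable names are made up for this example. A <code>CG</code> match is skipped only when its start index falls inside a <code>GC</code> match:</p>

```python
import re

seq = "CGCGTUUTCGU"

# Non-overlapping (start, end) spans for each pattern
gc_spans = [m.span() for m in re.finditer("GC", seq)]
cg_spans = [m.span() for m in re.finditer("CG", seq)]

# A CG match is skipped when its start lies inside some GC span
skipped = [cg for cg in cg_spans if any(s <= cg[0] < e for s, e in gc_spans)]
counted = [cg for cg in cg_spans if cg not in skipped]

total = len(gc_spans) + len(counted)
percentage = total * 2 / len(seq) * 100

print("GC spans:", gc_spans)    # [(1, 3)]
print("CG spans:", cg_spans)    # [(0, 2), (2, 4), (8, 10)]
print("skipped CG:", skipped)   # [(2, 4)]
print("total pairs:", total)    # 3
print("percentage:", percentage)
```

<p>Here the <code>CG</code> at index 0 is still counted even though its second character overlaps the <code>GC</code> at index 1, because only the start position is tested.</p>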