<p>I have data consisting of many sentences. Taking the sentence below as an example, I want to split it into two sub-sentences:</p>
<blockquote>
<p>Both whole plasma and the d < 1.006 g/ml density fraction of plasma
from 2/2 mice show this broad beta-migration pattern (Fig. 1 B)
|T:**1SP3E3| ; |I:**1SP3E3| |L:**1SP3E3| in contrast, 3/3 plasma shows
virtually no lipid staining at the beta-position. |T:**1SN3E3|
|I:**1SN3E3| |L:**1SN3E3|</p>
</blockquote>
<p>should be split into:</p>
<blockquote>
<p>Both whole plasma and the d < 1.006 g/ml density fraction of plasma
from 2/2 mice show this broad beta-migration pattern (Fig. 1 B)</p>
</blockquote>
<p>and</p>
<blockquote>
<p>in contrast, 3/3 plasma shows virtually no lipid staining at the
beta-position.</p>
</blockquote>
<p>My code is:</p>
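<pre><code>import re

newData = []
for item in Data:
    # Split on runs of |...| tags; item[0] holds the sentence text
    test2 = re.split(r" (?:\|.*?\| ?)+", item[0])
    test2 = test2[:-1]  # drop the trailing empty string after the last tag
    for tx in test2:
        newData.append(tx)
print(len(newData))
print(newData)
</code></pre>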
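<p>However, the result contains 3 items, one of which is <code>;</code>. I checked the original sentence and found that the <code>;</code> comes from <code>|T:**1SP3E3| ; |I:**1SP3E3|</code>, so I need to remove this <code>;</code> from the result. I changed the code to:</p>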
<pre><code>test2= re.split(r" (?:\|.*?\| ?;?)+", item[0])
</code></pre>
<p>But I still don't get the correct result. Can anyone help? Thanks.</p>
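<p>For reference, a minimal sketch of one way the split might be done. In the second pattern, <code>;?</code> consumes the semicolon, but the space that follows it stops the repetition, so an empty string is left between two separate matches. The sketch below instead lets whitespace and <code>;</code> appear between tags inside a single match. The names <code>TAG_RUN</code> and <code>split_on_tags</code> are hypothetical, and it assumes the sentence text itself never contains a <code>|</code> and that a <code>;</code> directly after a tag is always noise:</p>

```python
import re

# A run of |...| tags, where consecutive tags may be separated by
# whitespace and/or a stray ';'. Leading whitespace is consumed too.
TAG_RUN = re.compile(r"\s*(?:\|[^|]*\|[\s;]*)+")


def split_on_tags(sentence):
    """Split a sentence on runs of |...| tags and drop empty pieces."""
    return [part.strip() for part in TAG_RUN.split(sentence) if part.strip()]


sentence = (
    "Both whole plasma and the d < 1.006 g/ml density fraction of plasma "
    "from 2/2 mice show this broad beta-migration pattern (Fig. 1 B) "
    "|T:**1SP3E3| ; |I:**1SP3E3| |L:**1SP3E3| in contrast, 3/3 plasma shows "
    "virtually no lipid staining at the beta-position. |T:**1SN3E3| "
    "|I:**1SN3E3| |L:**1SN3E3|"
)

parts = split_on_tags(sentence)
print(len(parts))
for part in parts:
    print(part)
```

<p>Filtering out empty pieces also removes the need for the <code>[:-1]</code> slice, since the empty string after the final tag run is dropped the same way.</p>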