<p>In this case, a simple regular expression does the job well.</p>
<pre><code>In [1]: text = '''<item rdf:about="http://gh.bmj.com/cgi/content/short/4/4/e001065?rss=1">
...: <title>
...: <![CDATA[
...: Use of routinely collected electronic healthcare data for postlicensure vaccine safety signal det
...: ection: a systematic review
...: ]]>
...: </title>
...: <link>...'''
In [2]: import re
In [3]: re.findall('<dc:identifier>(info:doi.*?)</dc:identifier>', text)
Out[3]: ['info:doi/10.1136/bmjgh-2018-001065']
</code></pre>
<p>If the text inside the tags contains newlines, you can strip them first:</p>
<pre><code>text = text.replace('\n', '')
</code></pre>
<p>In this case, however, that does not appear to be necessary.</p>
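<p>For completeness, here is a minimal sketch of why newlines matter for this pattern, and of one possible alternative to stripping them from the whole text first: passing <code>re.DOTALL</code> so that <code>.</code> also matches line breaks. The <code>sample</code> string is a hypothetical fragment made up for illustration. Only the DOI value is taken from the output above, and the line break inside the element is contrived.</p>

```python
import re

# Hypothetical feed fragment (an assumption, not the original feed):
# the DOI value is deliberately split across two lines.
sample = """<item>
<dc:identifier>info:doi/10.1136/
bmjgh-2018-001065</dc:identifier>
</item>"""

pattern = r"<dc:identifier>(info:doi.*?)</dc:identifier>"

# By default '.' does not match '\n', so a value that spans lines is missed.
no_flag = re.findall(pattern, sample)

# With re.DOTALL, '.' also matches newlines; remove them from each capture afterwards.
with_flag = [
    match.replace("\n", "")
    for match in re.findall(pattern, sample, flags=re.DOTALL)
]

print(no_flag)
print(with_flag)
```

<p>Compared with calling <code>text.replace('\n', '')</code> up front, this leaves the rest of the text untouched and only cleans the captured values. Which of the two is preferable depends on how the rest of the text is used.</p>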