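<p>You can download the RSS file from the URL with <code>urllib.request.urlretrieve</code>, then use <a href="https://docs.python.org/3.7/library/xml.dom.minidom.html" rel="nofollow noreferrer">minidom</a> to first remove the unwanted <strong>dc:identifier</strong> elements (the ones whose value starts with <code>hwp:</code>). After that, you can use feedparser to access the value you need.</p>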
<pre class="lang-py prettyprint-override"><code>from urllib import request
from xml.dom import minidom

import feedparser

# Download the feed to a local file.
request.urlretrieve("https://gh.bmj.com/rss/recent.xml", "recent.xml")

# Remove every dc:identifier element whose text starts with "hwp:".
xmldoc = minidom.parse("recent.xml")
for item in xmldoc.getElementsByTagName("dc:identifier"):
    if item.firstChild is not None and item.firstChild.nodeValue.startswith("hwp:"):
        item.parentNode.removeChild(item)

# Write the cleaned feed to a new file; the with block closes it for us.
with open("recent_modified.xml", "w", encoding="utf-8") as file_handle:
    xmldoc.writexml(file_handle)

# Parse the cleaned feed and print the remaining dc:identifier of each entry.
d = feedparser.parse("recent_modified.xml")
for item in d.entries:
    print(item.dc_identifier)
</code></pre>
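<p>If you would rather not write intermediate files, the same filtering step can probably be done in memory. The following is only a minimal sketch using the standard library: the helper name <code>strip_hwp_identifiers</code> and the tiny sample feed are made up for illustration and are not taken from the real BMJ feed.</p>

```python
from xml.dom import minidom


def strip_hwp_identifiers(xml_text: str) -> str:
    """Return xml_text with every dc:identifier starting with 'hwp:' removed.

    Hypothetical helper, named here only for illustration.
    """
    doc = minidom.parseString(xml_text)
    for node in doc.getElementsByTagName("dc:identifier"):
        child = node.firstChild
        # Skip empty elements; only drop identifiers with the hwp: prefix.
        if child is not None and child.nodeValue and child.nodeValue.startswith("hwp:"):
            node.parentNode.removeChild(node)
    return doc.toxml()


# Made-up sample: one item carrying two dc:identifier elements.
SAMPLE = (
    '<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<channel><item>"
    "<dc:identifier>hwp:master-id:example</dc:identifier>"
    "<dc:identifier>10.1136/example</dc:identifier>"
    "</item></channel></rss>"
)

cleaned = strip_hwp_identifiers(SAMPLE)
print(cleaned)
```

<p>The returned string can then be passed to <code>feedparser.parse</code>, which accepts a string as well as a file path or URL, so the <code>recent_modified.xml</code> file is not strictly needed.</p>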