How to skip <p><h2><a……> and get d

import urllib
from xml.etree.ElementTree import parse

# Download the RSS feed and parse it
u = urllib.urlopen('http://planet.python.org/rss20.xml')
doc = parse(u)

# Extract and output tags of interest
for item in doc.iterfind('channel/item'):
    # title = item.findtext('title')
    # date = item.findtext('pubDate')
    # link = item.findtext('link')
    des = item.findtext('description')
    # print(title)
    # print(date)
    # print(link)
    print(des)
    print()

1 Answer

User

#1 · Posted 2024-09-30 22:22:46

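Try using BeautifulSoup to parse the HTML content. If you only need the text, something like the following will do. If you need specific information from the HTML content, you can parse the HTML further.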

from urllib.request import urlopen  # Python 3 location of urlopen
from xml.etree.ElementTree import parse
from bs4 import BeautifulSoup as bs

# Download the RSS feed and parse it
u = urlopen('http://planet.python.org/rss20.xml')
doc = parse(u)

# Extract and output tags of interest
for item in doc.iterfind('channel/item'):
    des = item.findtext('description')
    if des:
        # Name the parser explicitly so bs4 does not have to guess one
        soup = bs(des, 'html.parser')
        # get_text() drops the <p>, <h2>, <a> markup and keeps only the text
        text = soup.get_text()
        print(text)
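
If BeautifulSoup is not installed, roughly the same result can be had from the standard library. The sketch below is an alternative to the answer above, not the answer's own method: it uses `html.parser.HTMLParser` to collect the visible text and also the `href` of each `<a>` tag, as one example of pulling out specific information. The class name `TextAndLinks`, the helper `extract`, and the sample string are all made up for illustration; a real feed's description may be shaped differently.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Collect visible text and link targets from an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []  # text pieces found between tags
        self.links = []   # href values of <a> tags

    def handle_starttag(self, tag, attrs):
        # Tags themselves are skipped; only the href of <a> is kept.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Keep non-empty text, trimmed of surrounding whitespace.
        if data.strip():
            self.chunks.append(data.strip())


def extract(fragment):
    """Return (plain_text, links) for an HTML fragment."""
    parser = TextAndLinks()
    parser.feed(fragment)
    parser.close()
    return " ".join(parser.chunks), parser.links


# Hypothetical description body, made up for illustration.
sample = ('<h2><a href="https://example.com/post">Release notes</a></h2>'
          '<p>Version 1.0 is out.</p>')
text, links = extract(sample)
print(text)
print(links)
```

In the loop above, `extract(des)` could stand in for the `bs(...)` and `get_text()` calls. BeautifulSoup is generally more forgiving of malformed markup, so this is only a fallback when adding a dependency is not an option.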
