BeautifulSoup: accessing <li> elements from a <ul> that has no id

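# Python 3 version: urllib.request replaces Python 2's urllib2
import urllib.request
from bs4 import BeautifulSoup

hdr = {'User-Agent': 'Mozilla/5.0'}
site = "http://en.wikipedia.org/wiki/" + "january" + "_" + "1"
req = urllib.request.Request(site, headers=hdr)
page = urllib.request.urlopen(req)
soup = BeautifulSoup(page, "html.parser")
print(soup)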

2 answers

User

Answer 1 · edited 2024-05-11 13:22:43

Find the Births section heading:

section = soup.find('span', id='Births').parent

Then find the next unordered list after it and collect its items:

births = section.find_next('ul').find_all('li')
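A minimal, self-contained sketch of this approach, assuming markup shaped like the older Wikipedia layout (a `<span id="Births">` inside an `<h2>` heading, followed by an id-less `<ul>`). The HTML string is a hypothetical stand-in, since the live page's markup may have changed; it just lets the two steps run offline.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the older Wikipedia layout: the <ul> lists
# have no id, but each heading contains a <span> with an id.
HTML = """
<h2><span class="mw-headline" id="Events">Events</span></h2>
<ul><li>Some event</li></ul>
<h2><span class="mw-headline" id="Births">Births</span></h2>
<ul>
  <li>871 – Zwentibold</li>
  <li>1431 – Pope Alexander VI</li>
</ul>
<h2><span class="mw-headline" id="Deaths">Deaths</span></h2>
<ul><li>Some death</li></ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Step 1: the span with id="Births"; its parent is the <h2> heading.
section = soup.find("span", id="Births").parent

# Step 2: find_next searches forward in document order, so the first <ul>
# after the Births heading is the Births list, not the Events one before it.
births = section.find_next("ul").find_all("li")

for li in births:
    print(li.get_text(strip=True))
```

Anchoring on the heading's id is what makes this work: the `<ul>` itself has nothing to select on, but its position right after a uniquely identified heading does.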

User

Answer 2 · edited 2024-05-11 13:22:43

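The idea is to get the span with id "Births", find the next sibling of its parent (that is, the ul), and iterate over its li elements. Here is a complete example using requests (the choice of HTTP library is incidental here):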
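from bs4 import BeautifulSoup as Soup
import requests

response = requests.get("http://en.wikipedia.org/wiki/January_1")
soup = Soup(response.content, "html.parser")

births_span = soup.find("span", {"id": "Births"})
# The span sits inside the section heading; the list is the heading's next <ul> sibling.
# Passing "ul" makes sure only a <ul> sibling is returned.
births_ul = births_span.parent.find_next_sibling("ul")
# find_all("li") only returns tags, so no isinstance check is needed.
for item in births_ul.find_all("li"):
    print(item.text)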
Prints:
871 – Zwentibold, Frankish son of Arnulf of Carinthia (d. 900)
1431 – Pope Alexander VI (d. 1503)
1449 – Lorenzo de' Medici, Italian politician (d. 1492)
1467 – Sigismund I the Old, Polish king (d. 1548)
1484 – Huldrych Zwingli, Swiss pastor and theologian (d. 1531)
1511 – Henry, Duke of Cornwall (d. 1511)
1516 – Margaret Leijonhufvud, Swedish wife of Gustav I of Sweden (d. 1551)
...
Hope that helps.
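The two answers step from the heading to the list differently: `find_next("ul")` searches everything after the heading in document order, including nested tags, while `find_next_sibling(...)` only looks at the heading's later siblings. The sketch below uses hypothetical markup (not taken from Wikipedia) in which the Births list is wrapped in a `<div>`, to show how the sibling search can quietly land on a later section's list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the Births <ul> is wrapped in a <div>, while the
# Deaths <ul> is a direct sibling of its heading.
HTML = """
<h2><span id="Births">Births</span></h2>
<div class="columns">
  <ul><li>871 – Zwentibold</li></ul>
</div>
<h2><span id="Deaths">Deaths</span></h2>
<ul><li>Some death</li></ul>
"""

soup = BeautifulSoup(HTML, "html.parser")
heading = soup.find("span", id="Births").parent

# find_next walks forward in document order and descends into the <div>,
# so it reaches the Births list first.
via_find_next = heading.find_next("ul")

# find_next_sibling only checks the heading's siblings; the first sibling
# <ul> here is the one under the Deaths heading.
via_sibling = heading.find_next_sibling("ul")

print([li.get_text() for li in via_find_next.find_all("li")])
print([li.get_text() for li in via_sibling.find_all("li")])
```

Neither method is wrong in general; which one fits depends on whether the list is a direct sibling of the heading in the markup you actually get back.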
