BeautifulSoup in Python fails to retrieve attributes on some Wikipedia pages

Posted 2024-10-01 04:47:42


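The code below does get the right coordinates for link1, but for link2 the same code returns the coordinates of a different country. Why? (It worked fine a year ago.)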

import urllib2
from bs4 import BeautifulSoup as BS

link1 = 'http://en.wikipedia.org/wiki/Ethiopia'
link2 = 'http://en.wikipedia.org/wiki/Russia'

# Send a browser-like User-Agent with the request
identification = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
requestToServer2 = urllib2.Request(link2, headers=identification)
responseFromServerInHTML2 = urllib2.urlopen(requestToServer2)

subSoup = BS(responseFromServerInHTML2, 'lxml')
# Take the text of the first <span class="geo"> on the page
coords = subSoup.find_all("span", {'class': 'geo'})[0].string

print coords

1 answer
User
#1 · Posted 2024-10-01 04:47:42

The page contains two elements with the geo class, and find_all returns both of them:

import requests
from bs4 import BeautifulSoup

link2 = 'http://en.wikipedia.org/wiki/Russia'
r = requests.get(link2)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(r.content, 'lxml')
coords = soup.find_all("span", {'class': 'geo'})
print(coords)

[<span class="geo">60; 90</span>, <span class="geo">55.750; 37.617</span>]

If you need the second element, just index it:

print(coords[1].text)
55.750; 37.617
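
If numeric values are needed rather than text, the "latitude; longitude" string shown above can be split into floats. This is only a minimal sketch: parse_geo is a hypothetical helper name, and it assumes the span text always has exactly two numbers separated by a semicolon, as in the output above.

```python
def parse_geo(text):
    """Split a 'lat; lon' string into a (latitude, longitude) tuple of floats.

    Hypothetical helper; assumes exactly one ';' separating two numbers.
    """
    lat_text, lon_text = text.split(";")
    # float() ignores the surrounding whitespace left over from the split
    return float(lat_text), float(lon_text)


print(parse_geo("55.750; 37.617"))  # (55.75, 37.617)
```

A string with a different layout (for example, no semicolon) would raise a ValueError here, which may be preferable to silently returning wrong coordinates.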

You can also use the colspan attribute to locate the second one:

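import requests
from bs4 import BeautifulSoup

r = requests.get(link2)
soup = BeautifulSoup(r.content, 'lxml')
# Find the first <td colspan="2">, then the next geo span that follows it
print(soup.find("td", {'colspan': "2"}).find_next("span", {'class': 'geo'}).text)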
 55.750; 37.617
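
The two lookups can be tried offline on a small inline snippet, which makes it easier to see why index [0] picks up the wrong span. The markup below is a hypothetical, heavily simplified stand-in and not Wikipedia's actual HTML; the function names are illustrative too, and the stdlib html.parser backend is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page structure differs.
SAMPLE_HTML = """
<div><span class="geo">60; 90</span></div>
<table>
  <tr><td colspan="2">Capital <span class="geo">55.750; 37.617</span></td></tr>
</table>
"""


def geo_strings(html):
    """Return the text of every <span class="geo">, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [span.get_text(strip=True) for span in soup.find_all("span", class_="geo")]


def geo_after_colspan(html):
    """Return the first geo span found after the first <td colspan="2">, or None."""
    soup = BeautifulSoup(html, "html.parser")
    cell = soup.find("td", {"colspan": "2"})
    if cell is None:
        return None
    span = cell.find_next("span", class_="geo")
    return span.get_text(strip=True) if span is not None else None


print(geo_strings(SAMPLE_HTML))
print(geo_after_colspan(SAMPLE_HTML))
```

Checking for None before chaining calls avoids an AttributeError if the page layout changes and the cell or span is no longer present.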
