How to extract the infobox vcard from Wikipedia using the Python wikipedia library

from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

site = "http://en.wikipedia.org/wiki/Aldi"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)
page = urlopen(req)
soup = BeautifulSoup(page, 'html.parser')
print(soup)

2 Answers

User

#1 · Edited 2024-09-25 10:18:58

My solution:

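from urllib.parse import quote
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup as bs

query = 'Albert Einstein'
# Wikipedia titles use underscores instead of spaces
url = 'https://en.wikipedia.org/wiki/' + quote(query.replace(' ', '_'))

def infobox():
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    soup = bs(urlopen(req), 'html.parser')
    # matches a table whose classes include both "infobox" and "vcard", in any order
    table = soup.select_one('table.infobox.vcard')
    for tr in table.find_all('tr'):
        print(tr.text)

infobox()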

User

#2 · Edited 2024-09-25 10:18:58

A BeautifulSoup-based solution:

from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

site = "http://en.wikipedia.org/wiki/Aldi"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)
page = urlopen(req)
soup = BeautifulSoup(page.read(), 'html.parser')
table = soup.find('table', class_='infobox vcard')
result = {}
exceptional_row_count = 0
for tr in table.find_all('tr'):
    th, td = tr.find('th'), tr.find('td')
    if th is not None and td is not None:
        result[th.text.strip()] = td.text.strip()
    else:
        # rows without a th/td pair (e.g. the first row, which holds the logo) fall here
        exceptional_row_count += 1
if exceptional_row_count > 1:
    print('WARNING ExceptionalRow>1:', table)
print(result)

Tested on http://en.wikipedia.org/wiki/Aldi, but not thoroughly tested on other wiki pages.
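Since other pages were not tested, here is a minimal sketch of how the same row-by-row parsing could be wrapped in a function and tried offline. The function name `infobox_to_dict` and the `SAMPLE_HTML` markup are made up for illustration and are not taken from any real page. It assumes the infobox table carries both the `infobox` and `vcard` classes, possibly alongside others, and that each useful row has one `th` and one `td`.

```python
from bs4 import BeautifulSoup


def infobox_to_dict(html):
    """Return {header: value} for the first infobox vcard table, or {} if none is found."""
    soup = BeautifulSoup(html, "html.parser")
    # A CSS class selector matches regardless of class order or extra classes
    table = soup.select_one("table.infobox.vcard")
    if table is None:
        return {}
    result = {}
    for tr in table.find_all("tr"):
        th, td = tr.find("th"), tr.find("td")
        # Skip rows that lack a th/td pair (logo rows, section headers, ...)
        if th is not None and td is not None:
            result[th.get_text(" ", strip=True)] = td.get_text(" ", strip=True)
    return result


# Hypothetical sample markup standing in for a downloaded page
SAMPLE_HTML = """
<table class="infobox biography vcard">
  <tr><td colspan="2">logo</td></tr>
  <tr><th>Founded</th><td>1946</td></tr>
  <tr><th>Headquarters</th><td>Essen, <a href="#">Germany</a></td></tr>
</table>
"""

print(infobox_to_dict(SAMPLE_HTML))
```

To use it on a live page, pass it the bytes returned by `page.read()`. Returning an empty dict when no table matches avoids an `AttributeError` on pages that have no infobox.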
