Problem extracting data from an HTML document with BeautifulSoup

import urllib.request
import time
from bs4 import BeautifulSoup

#Performs a HTTP-'POST' request, passes it to BeautifulSoup and returns the result
def doRequest(request):
    requestResult = urllib.request.urlopen(request)
    soup = BeautifulSoup(requestResult)
    return soup

def getContactInfoFromPage(page):
    name = ''
    straße = ''
    plz = ''
    stadt = ''
    telefon = ''
    mail = ''
    url = ''

    data = [
        #'Name',
        #'Straße',
        #'PLZ',
        #'Stadt',
        #'Telefon',
        #'E-Mail',
        #'Homepage'
    ]

    request = urllib.request.Request("http://www.altenheim-adressen.de/schnellsuche/" + page)
    request.add_header("Content-Type", "application/x-www-form-urlencoded;charset=utf-8")
    request.add_header("User-Agent", "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0")
    soup = doRequest(request)

    #Save Name to data structure
    findeName = soup.findAll('b')
    name = findeName[2]
    name = name.string.split('>')
    data.append(name)

    return soup

soup = getContactInfoFromPage("suche2.cfm?id=267a0749e983c7edfeef43ef8e1c7422")
print(soup.getText())

1 answer

User

Answer 1 · posted 2024-09-29 19:28:39

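You can rely on the field labels: find the cell that holds a label, then take the text of its next sibling.

Wrapping this in a small reusable function makes the approach more transparent and easier to use: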

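def get_field_value(soup, field):
    # Find the label cell whose text is exactly "<field>:", e.g. "Name:"
    field_label = soup.find('td', text=field + ':')
    # The value sits in the next <td> sibling of the label cell
    return field_label.find_next_sibling('td').get_text(strip=True)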

Usage:

print(get_field_value(soup, 'Name'))  # prints 'AWO-Seniorenzentrum Kenten'
print(get_field_value(soup, 'Land'))  # prints 'Deutschland'
