Web scraping a Wikipedia page

1 Answer

User

#1 · Posted 2024-09-29 19:32:21

Is this what you are looking for?

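import requests
from bs4 import BeautifulSoup
# verify=False skips TLS certificate checks; drop it unless your network requires it.
rs = requests.get('https://en.wikipedia.org/wiki/Diglossia', verify=False)
parsed_html = BeautifulSoup(rs.text, 'html.parser')
# First link inside the first paragraph of the page body.
print(parsed_html.body.findAll('p')[0].findAll('a')[0])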

This prints the first link (the first <a> tag) inside the first paragraph of the page.

If you only want the href, index the tag by attribute name:

parsed_html.body.findAll('p')[0].findAll('a')[0]['href']

Update: it seems you want the href that comes after the parentheses, not the one before them. I have written a script for that. Try this:

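import requests
from bs4 import BeautifulSoup

# verify=False skips TLS certificate checks; drop it unless your network requires it.
rs = requests.get('https://en.wikipedia.org/wiki/Diglossia', verify=False)
parsed_html = BeautifulSoup(rs.text, 'html.parser')

# Start at the first paragraph and walk forward through the parse tree.
temp = parsed_html.body.find_all('p')[0]

start_count = 0   # parenthesized groups currently open
started = False   # True once the first '(' has been seen
found = False     # True once the first group has closed again

while temp.next_element and not found:
    temp = temp.next_element
    if '(' in temp:
        start_count += 1
        started = True
    if ')' in temp and started and start_count > 1:
        start_count -= 1
    elif ')' in temp and started and start_count == 1:
        found = True

# href of the first link after the closing parenthesis
print(temp.find_next('a')['href'])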
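
If you want to try the parenthesis-skipping idea without hitting the network, here is a rough sketch that runs on an inline HTML string. It is not the script above: it swaps BeautifulSoup for the standard library's html.parser, and it counts parentheses character by character instead of once per text node. The class name, the function name, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser


class FirstLinkAfterParens(HTMLParser):
    """Record the href of the first <a> after the first balanced (...) group."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # parentheses currently open
        self.closed = False   # True once the first top-level group has closed
        self.href = None      # result, filled in at most once

    def handle_data(self, data):
        if self.closed:
            return
        # Count per character so "(a (b) c)" inside one text node is handled.
        for ch in data:
            if ch == '(':
                self.depth += 1
            elif ch == ')' and self.depth > 0:
                self.depth -= 1
                if self.depth == 0:
                    self.closed = True
                    return

    def handle_starttag(self, tag, attrs):
        # Links seen before the group closes are ignored.
        if tag == 'a' and self.closed and self.href is None:
            self.href = dict(attrs).get('href')


def first_link_after_parens(html):
    """Return the href found by FirstLinkAfterParens, or None if there is none."""
    parser = FirstLinkAfterParens()
    parser.feed(html)
    parser.close()
    return parser.href


# Hypothetical sample markup, loosely shaped like a Wikipedia lead paragraph.
sample = (
    '<p><b>Term</b> (from <a href="/wiki/Inside">a (nested) source</a>) '
    'is a concept in <a href="/wiki/Outside">some field</a>.</p>'
)
print(first_link_after_parens(sample))
```

The link inside the parentheses is skipped and the one after them is returned. To use it on the real page you would pass rs.text in place of sample, but real Wikipedia markup may contain earlier parentheses or links (for example in hatnotes), so treat this as a starting point only.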
