Scraping exact content with Beautiful Soup in Python

Posted 2024-09-30 01:26:54

Using Beautiful Soup in Python, I want to scrape specific text (in <a> tags) and numbers (in <td> tags) from an online sortable table.

http://www.nfl.com/stats/categorystats?archive=false&conference=null&role=OPP&offensiveStatisticCategory=null&defensiveStatisticCategory=INTERCEPTIONS&season=2014&seasonType=REG&tabSeq=2&qualified=false&Submit=Go

I have tried a million times and still can't figure it out.

This is the best I could do:

from bs4 import BeautifulSoup
import urllib2
import requests
import pymongo
import re

soup = BeautifulSoup(urllib2.urlopen('http://www.nfl.com/stats/categorystats?archive=false&conference=null&role=OPP&offensiveStatisticCategory=null&defensiveStatisticCategory=INTERCEPTIONS&season=2014&seasonType=REG&tabSeq=2&qualified=false&Submit=Go').read())

find = soup('a', text="Miami Dolphins")

print find

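I don't know how to find and access the 10th tag after "Miami Dolphins" (the 9th, counting from zero as Python does).

The table markup looks like this: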

^{pr2}$

1 Answer
User
#1 · Posted 2024-09-30 01:26:54

Try this:

import urllib2
from lxml import etree

url = 'http://www.nfl.com/stats/categorystats?archive=false&conference=null&role=OPP&offensiveStatisticCategory=null&defensiveStatisticCategory=INTERCEPTIONS&season=2014&seasonType=REG&tabSeq=2&qualified=false&Submit=Go'
response = urllib2.urlopen(url)

# Use lxml's lenient HTML parser so imperfect markup still parses
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)

# Find the <a> whose text contains the team name, step up to its <td>,
# then take the text of the 10th <td> that follows it in the same row
# (XPath positions are 1-based)
text = tree.xpath('//a[contains(text(),"Miami Dolphins")]/parent::td/following-sibling::td[10]/text()')
if text:
    print text[0].strip()
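
Since the question asked about Beautiful Soup, the same navigation (find the link, step up to its <td>, take the 10th following <td>) can be sketched with bs4 alone. This is only a rough Python 3 illustration, not the only way to do it. The helper names `build_row` and `nth_cell_after` are made up for this example, and `SAMPLE_HTML` is a hypothetical stand-in table with placeholder cell values, so the real page may need a different selector or offset.

```python
from bs4 import BeautifulSoup


def build_row(team, n_cells=12):
    """Build one placeholder table row: a link cell followed by n_cells data cells."""
    cells = "".join(f"<td>{team} col {i}</td>" for i in range(1, n_cells + 1))
    return f'<tr><td><a href="#"> {team} </a></td>{cells}</tr>'


# Hypothetical stand-in for the stats table; the values are placeholders, not real stats.
SAMPLE_HTML = "<table>" + build_row("Miami Dolphins") + build_row("Other Team") + "</table>"


def nth_cell_after(soup, team, n):
    """Return the text of the n-th <td> (1-based) after the team's cell, or None."""
    # Substring match, like XPath contains(), so surrounding whitespace does not matter.
    link = soup.find("a", string=lambda s: s is not None and team in s)
    if link is None:
        return None
    cell = link.find_parent("td")
    if cell is None:
        return None
    siblings = cell.find_next_siblings("td")
    if len(siblings) < n:
        return None
    # Python lists are 0-based, so the 10th sibling is at index 9.
    return siblings[n - 1].get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(nth_cell_after(soup, "Miami Dolphins", 10))
```

To run this against the live page, the fetched HTML would be passed to `BeautifulSoup` in place of `SAMPLE_HTML`. Returning None for a missing team or a row that is too short plays the same role as the `if text:` check in the answer.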
