Scraping the Billboard Hot 100

Posted 2024-10-02 22:24:47


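I've been practicing web scraping in Python (I'm a beginner) and ran into this problem. I'm trying to pull the songs from the Billboard Hot 100 list, but the result is not what I need.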

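Here is the code. As you can see, I store the songs in a dictionary and then print it.

from lxml import html
import requests

page = requests.get('http://www.billboard.com/charts/hot-100')
tree = html.fromstring(page.content)
billboard = {}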

for x in range(1, 51):
    currSongY = '//*[@id="main"]/div[2]/div/div[1]/article[' + str(x) + ']/div[1]/div[4]/div[3]/div/h2/text()'
    currArtistY = '//*[@id="main"]/div[2]/div/div[1]/article[' + str(x) + ']/div[1]/div[4]/div[3]/div/a/text()'

    currSongX = tree.xpath(currSongY)
    currArtistX = tree.xpath(currArtistY)

    if currArtistX == '[]' and currSongX == '[]':
        currSongY = '//*[@id="main"]/div[2]/div/div[1]/article[' + str(x) + ']/div[1]/div[3]/div[3]/div/h2/text()'
        currArtistY = '//*[@id="main"]/div[2]/div/div[1]/article[' + str(x) + ']/div[1]/div[3]/div[3]/div/a/text()'
        currSongX = tree.xpath(currSongY)
        currArtistX = tree.xpath(currArtistY)

        if currArtistX == '[]' and currSongX == '[]':
            currSongY = '//*[@id="main"]/div[2]/div/div[1]/article[' + str(x) + ']/div[1]/div[2]/div[3]/div/h2/text()'
            currArtistY = '//*[@id="main"]/div[2]/div/div[1]/article[' + str(x) + ']/div[1]/div[2]/div[3]/div/a/text()'
            currSongX = tree.xpath(currSongY)
            currArtistX = tree.xpath(currArtistY)

    currSong = str(currSongX)[2:(len(str(currSongX))-2)]
    #currArtist = str(currArtistX)[4:(len(str(currArtistX))-4)]
    currArtist = str(currArtistX).replace("\\n","")
    billboard[x] = (currSong, currArtist)

print(billboard)

The result is as follows:

[output not preserved]

Please help!


Tags: text, div, id, tree, main, article, h2, xpath
1 answer
User
Reply #1 · posted 2024-10-02 22:24:47

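When working through HTML, it is better to let the parser do some of the work for you: build the element tree, then look up tags and attributes in that tree instead of hard-coding long positional XPath expressions.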
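To make the idea concrete without hitting the network, here is a minimal sketch on a small inline snippet. Everything in it is an assumption for illustration: the `SAMPLE` markup is a hypothetical stand-in for the chart rows (the real page will differ), `extract_chart` is a made-up helper name, and it uses the standard library's `xml.etree.ElementTree` rather than lxml, since both expose the same `iter()` and `attrib` interface used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a few chart rows; the real page differs.
SAMPLE = """
<div id="main">
  <article data-songtitle=" Song A ">
    <a class="chart-row__link" href="#">ignored link</a>
    <a class="chart-row__artist" href="#">
      Artist A
    </a>
  </article>
  <article>
    <a class="chart-row__artist" href="#">No title attribute, skipped</a>
  </article>
  <article data-songtitle="Song B">
    <a class="chart-row__artist" href="#">Artist B</a>
  </article>
</div>
"""


def extract_chart(root):
    """Return (song, artist) pairs from <article data-songtitle=...> elements."""
    rows = []
    for article in root.iter("article"):
        song = article.attrib.get("data-songtitle")
        if song is None:
            continue  # not a chart row
        for link in article.iter("a"):
            if link.attrib.get("class") == "chart-row__artist":
                rows.append((song.strip(), (link.text or "").strip()))
                break  # take only the first matching artist link
    return rows


chart = extract_chart(ET.fromstring(SAMPLE.strip()))
print(chart)
```

Because the lookup keys on attribute names rather than on the position of each `div`, it does not care how deeply a row is nested or whether an extra wrapper appears in some rows.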

The following code works for the Billboard Hot 100:

from lxml import etree
import requests

page = requests.get('http://www.billboard.com/charts/hot-100')
# etree.HTML parses the raw bytes and returns the root element directly
root = etree.HTML(page.content)

billboard = []
# Each chart entry is an <article> carrying the title in a data-songtitle attribute
for article in root.iter('article'):
    if 'data-songtitle' in article.attrib:
        currSong = article.attrib['data-songtitle']
        # The artist is the text of the first <a class="chart-row__artist"> inside it
        for item in article.iter('a'):
            if item.attrib.get('class') == 'chart-row__artist':
                currArtist = item.text or ''
                billboard.append((currSong.strip(), currArtist.strip()))
                break

for entry in billboard:
    print(entry)
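A side note on why the fallback branches in the question never ran: `tree.xpath(...)` with a `text()` expression returns a list, and a list never compares equal to the string `'[]'`, so both `if` tests are always false. The sketch below shows the difference using plain lists as stand-ins for XPath results. `first_text` is a hypothetical helper name, and the sample values are made up.

```python
def first_text(nodes):
    """Return the first string of an XPath result list, stripped, or None if empty."""
    return nodes[0].strip() if nodes else None


empty_result = []                    # stand-in for an XPath query that matched nothing
found_result = ["\nSome Artist\n"]   # hypothetical match with surrounding whitespace

print(empty_result == '[]')      # list vs. string: always False
print(not empty_result)          # the usual emptiness test for a list
print(first_text(found_result))  # no need to slice str(list)
```

Indexing the list and calling `strip()` also replaces the `str(currSongX)[2:...]` slicing, which depends on how Python happens to print a list.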

Hope this helps.
