我一直在Python上练习刮削(我是个新手),我遇到了这个问题。我试图从公告牌上的热门100首歌曲列表中获取,结果却不到我所需要的。在
这是密码。如你所见,我把歌曲储存在字典里,然后打印出来。 从lxml导入html 导入请求 页码=请求.get('http://www.billboard.com/charts/hot-100') 树=html.fromstring(第页内容) 广告牌={}
for x in range(1, 51):
currSongY = '//*[@id="main"]/div[2]/div/div[1]/article[' + str(x) + ']/div[1]/div[4]/div[3]/div/h2/text()'
currArtistY = '//*[@id="main"]/div[2]/div/div[1]/article[' + str(x) + ']/div[1]/div[4]/div[3]/div/a/text()'
currSongX = tree.xpath(currSongY)
currArtistX = tree.xpath(currArtistY)
if currArtistX == '[]' and currSongX == '[]':
currSongY = '//*[@id="main"]/div[2]/div/div[1]/article[' + str(x) + ']/div[1]/div[3]/div[3]/div/h2/text()'
currArtistY = '//*[@id="main"]/div[2]/div/div[1]/article[' + str(x) + ']/div[1]/div[3]/div[3]/div/a/text()'
currSongX = tree.xpath(currSongY)
currArtistX = tree.xpath(currArtistY)
if currArtistX == '[]' and currSongX == '[]':
currSongY = '//*[@id="main"]/div[2]/div/div[1]/article[' + str(x) + ']/div[1]/div[2]/div[3]/div/h2/text()'
currArtistY = '//*[@id="main"]/div[2]/div/div[1]/article[' + str(x) + ']/div[1]/div[2]/div[3]/div/a/text()'
currSongX = tree.xpath(currSongY)
currArtistX = tree.xpath(currArtistY)
currSong = str(currSongX)[2:(len(str(currSongX))-2)]
#currArtist = str(currArtistX)[4:(len(str(currArtistX))-4)]
currArtist = str(currArtistX).replace("\\n","")
billboard[x] = (currSong, currArtist)
print (billboard)
结果如下:
^{pr2}$请帮帮忙!!!!!在
在浏览HTML时,最好让解析器为您完成一些工作;生成元素树并在树中查找标记和属性。在
以下代码适用于billboard 100:
希望这有帮助。在
相关问题 更多 >
编程相关推荐