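I am currently using BeautifulSoup to build a DataFrame from data scraped from 1001tracklists.com.
I wrote a script that collects all the track information and builds a DataFrame. When I first finished it, it worked well and returned the DataFrame as expected.
I then modified the script to iterate over lists of URLs and broke it: the DataFrame now comes back empty.
I am not sure what went wrong, and I cannot recover the previous version. I have restarted my kernel several times. I am working in a Jupyter notebook.
My code so far: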
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
import numpy as np
import re
import urllib.request
import matplotlib.pyplot as plt
url_bank = []
url_bank.append(marcel_dettman)
url_bank.append(joy_orbison)
#Marcel_dettman and joy_orbison are 2 lists of urls that I couldn't post here bc they were flagged for spam
djs= []
tracknumbers = []
tracknames = []
artistnames = []
mixnames = []
dates = []
url_scrape = []
for url_list in url_bank:
    for url in url_list:
        count = 0
        headers = {'User-Agent': 'Chrome/51.0.2704.103'}
        page_link = url
        page_response = requests.get(page_link, headers=headers)
        soup = bs(page_response.content, "html.parser")
        title = (page_link[48:-15])
        title = title.replace('-', ' ')
        title = (title[:-1])
        title = title.title()
        title = title.strip('/')
        title = title.strip('K/')
        title = title.strip('1/')
        date = (page_link[-15:-5])
        tracknames_scrape = soup.find_all("div", class_="tlToogleData")
        artistnames_scrape = soup.find_all("meta", itemprop="byArtist")
        for (i, track) in enumerate(tracknames_scrape):
            if track.meta:
                trackname = track.meta['content']
                trackname = trackname.split('- ', 1)[-1]
                tracknames.append(trackname)
                mixnames.append(title)
                dates.append(date)
                djs.append('Joy Orbison')
                url_scrape.append(url)
                count +=1
                tracknumbers.append(count)
            else:
                continue
        for artist in artistnames_scrape:
            artistname = artist["content"]
            artistnames.append(artistname)
df = pd.DataFrame({'DJ Name': djs, 'Date': dates, 'Mix Name': mixnames, 'Track Number': tracknumbers,'Track Name': tracknames, 'Artist Name': artistnames, 'URL':url_scrape})
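Currently the DataFrame comes back empty, with only the column headers. Stack Overflow would not let me post the URL lists, so please see my earlier question (about a different problem) for what the URLs look like.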
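One way to narrow this down might be to check, for a single page, whether the two `find_all` calls from the script match anything at all. The sketch below is only a diagnostic aid, not a fix: `summarize_page` is a hypothetical helper name, and the sample markup is invented purely to exercise the selectors (it is not copied from the real site).

```python
from bs4 import BeautifulSoup


def summarize_page(html):
    """Count the elements that the scraping loop in the question depends on."""
    soup = BeautifulSoup(html, "html.parser")
    track_divs = soup.find_all("div", class_="tlToogleData")
    return {
        # Number of candidate track containers on the page.
        "track_divs": len(track_divs),
        # Only containers with a <meta> child pass the `if track.meta` check.
        "track_divs_with_meta": sum(1 for div in track_divs if div.meta),
        # Number of artist tags matched by the second find_all call.
        "artist_metas": len(soup.find_all("meta", itemprop="byArtist")),
    }


# Hypothetical sample markup, written only for this illustration.
sample_html = """
<div class="tlToogleData">
  <meta itemprop="name" content="Some Artist - Some Track">
  <meta itemprop="byArtist" content="Some Artist">
</div>
<div class="tlToogleData"></div>
"""

summary = summarize_page(sample_html)
print(summary)
```

In the notebook, calling `summarize_page(page_response.text)` for one URL and printing `page_response.status_code` next to it should show whether the response contains the expected markup. If every count is zero, the lists never receive any items, which would be consistent with a DataFrame that has only headers. It is also worth confirming that `marcel_dettman` and `joy_orbison` are non-empty after a kernel restart, since an empty `url_bank` would produce the same result.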