Empty DataFrame returned

Posted 2024-10-03 13:24:53


I'm building a DataFrame with BeautifulSoup that scrapes data from 1001tracklists.com.

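I wrote a script that collects all the track information and builds a DataFrame from it. When I first finished it, it worked perfectly and returned the DataFrame as expected.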

Then I changed the script to iterate through a list of URL lists, and that broke it: the DataFrame now comes back empty.
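The script below loops over `url_bank`, a plain list of two URL lists, and appends the hard-coded string `'Joy Orbison'` for every track, so rows scraped from the `marcel_dettman` URLs would be mislabeled. A minimal sketch of one alternative, assuming the two lists are regrouped into a dict keyed by DJ name (a hypothetical restructuring; `iter_dj_urls` and the placeholder URLs are made up, since the real URLs were not posted), yields `(dj_name, url)` pairs. It also guards against a common nested-loop slip: if one entry were a single URL string instead of a list, `for url in url_list` would iterate over its characters.

```python
from typing import Dict, Iterator, List, Tuple


def iter_dj_urls(url_bank: Dict[str, List[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (dj_name, url) pairs from a {dj_name: [url, ...]} mapping."""
    for dj_name, urls in url_bank.items():
        # A bare string is iterable too: looping over it would yield one
        # character at a time instead of one URL, so reject it explicitly.
        if isinstance(urls, str):
            raise TypeError(f"expected a list of URLs for {dj_name!r}, got a str")
        for url in urls:
            yield dj_name, url


# Hypothetical placeholder URLs; the real lists were not posted.
url_bank = {
    "Marcel Dettman": ["https://example.com/tracklist-1.html"],
    "Joy Orbison": [
        "https://example.com/tracklist-2.html",
        "https://example.com/tracklist-3.html",
    ],
}

pairs = list(iter_dj_urls(url_bank))
for dj_name, url in pairs:
    print(dj_name, url)
```

In the scraping loop, `djs.append(dj_name)` would then take the place of the hard-coded `djs.append('Joy Orbison')`.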

I'm not sure what went wrong, and I can't get back to the previous version. I've restarted my kernel several times. I'm working in a Jupyter notebook.

Here is my code so far:

from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
import numpy as np
import re
import urllib.request
import matplotlib.pyplot as plt

url_bank = []
url_bank.append(marcel_dettman)
url_bank.append(joy_orbison)

# marcel_dettman and joy_orbison are two lists of URLs that I couldn't post here because they were flagged as spam

djs = []
tracknumbers = []
tracknames = []
artistnames = []
mixnames = []
dates = []
url_scrape = []

for url_list in url_bank:
    for url in url_list:
        count = 0
        headers = {'User-Agent': 'Chrome/51.0.2704.103'}
        page_link = url
        page_response = requests.get(page_link, headers=headers)
        soup = bs(page_response.content, "html.parser")
        title = (page_link[48:-15])
        title = title.replace('-', ' ')
        title = (title[:-1])
        title = title.title()
        title = title.strip('/')
        title = title.strip('K/')
        title = title.strip('1/')
        date = (page_link[-15:-5])

        tracknames_scrape = soup.find_all("div", class_="tlToogleData")
        artistnames_scrape = soup.find_all("meta", itemprop="byArtist")
        for (i, track) in enumerate(tracknames_scrape):
            if track.meta:
                trackname = track.meta['content']
                trackname = trackname.split('- ', 1)[-1]
                tracknames.append(trackname)
                mixnames.append(title)
                dates.append(date)
                djs.append('Joy Orbison')
                url_scrape.append(url)
                count += 1
                tracknumbers.append(count)
            else:
                continue
        for artist in artistnames_scrape:
            artistname = artist["content"]
            artistnames.append(artistname)

df = pd.DataFrame({'DJ Name': djs, 'Date': dates, 'Mix Name': mixnames, 'Track Number': tracknumbers,'Track Name': tracknames, 'Artist Name': artistnames, 'URL':url_scrape})
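Because `artistnames` is filled by its own loop over every `byArtist` meta on the page, it can end up a different length from the other lists, and `pd.DataFrame` rejects a dict of lists with unequal lengths. A minimal sketch of building one dict per track instead, so the columns stay aligned by construction, is below. It reuses the selectors from the question and assumes (not verified against the live site) that each track's `byArtist` meta sits inside the same `tlToogleData` div; `parse_tracks` and the sample HTML are hypothetical, and the Date and Mix Name columns are left out for brevity.

```python
from typing import Dict, List, Optional

from bs4 import BeautifulSoup


def parse_tracks(html: str, dj_name: str, url: str) -> List[Dict[str, Optional[object]]]:
    """Return one record per track div that carries track metadata.

    Assumption: the artist <meta itemprop="byArtist"> is nested in the
    same <div class="tlToogleData"> as the track's <meta content="...">.
    """
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for track_div in soup.find_all("div", class_="tlToogleData"):
        if track_div.meta is None:
            continue  # same rule as the question: skip divs without a <meta>
        full_name = track_div.meta.get("content", "")
        artist_meta = track_div.find("meta", itemprop="byArtist")
        records.append({
            "DJ Name": dj_name,
            "Track Number": len(records) + 1,  # counts only kept tracks
            "Track Name": full_name.split("- ", 1)[-1],
            "Artist Name": artist_meta["content"] if artist_meta else None,
            "URL": url,
        })
    return records


# Hypothetical markup shaped after the question's selectors.
sample_html = """
<div class="tlToogleData"><meta content="Artist A - Song One"><meta itemprop="byArtist" content="Artist A"></div>
<div class="tlToogleData"></div>
<div class="tlToogleData"><meta content="Artist B - Song Two"><meta itemprop="byArtist" content="Artist B"></div>
"""

records = parse_tracks(sample_html, "Joy Orbison", "https://example.com/tracklist-2.html")
for record in records:
    print(record)
```

In the scraping loop this could be used as `all_records.extend(parse_tracks(page_response.text, dj_name, url))` (with `all_records` a hypothetical list created before the loop), followed by a single `pd.DataFrame(all_records)` after the loop finishes.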

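Right now the DataFrame comes back empty, with only the column headers. Stack Overflow wouldn't let me post the URL lists; my earlier question about a different problem shows what the URLs look like.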
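A DataFrame with headers but no rows is what `pd.DataFrame` produces when every list in the dict is empty; if only some of the lists were empty, it would raise `ValueError` instead. That suggests no track was appended at all, i.e. `tracknames_scrape` matched nothing (or no matching div contained a `meta`) on every page. The pandas-only sketch below reproduces both outcomes with made-up values; printing `page_response.status_code` and `len(tracknames_scrape)` inside the loop would be one way to check whether the requests are actually returning the expected tracklist pages.

```python
import pandas as pd

# Parallel lists that were never appended to: the situation when
# soup.find_all(...) finds no tracks on any page.
djs, tracknames = [], []
empty_df = pd.DataFrame({"DJ Name": djs, "Track Name": tracknames})
print(empty_df.empty, list(empty_df.columns))  # True ['DJ Name', 'Track Name']

# Parallel lists that drift apart, e.g. artist names collected by a
# separate loop: pandas raises instead of returning an empty frame.
mismatch_error = ""
try:
    pd.DataFrame({"Track Name": ["Song One", "Song Two"], "Artist Name": ["Artist A"]})
except ValueError as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```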

