I am trying to scrape data to build an object that looks like this:
{
    "artist": "Oasis",
    "albums": {
        "Definitely Maybe": [
            "Rock n Roll Star",
            "Shakermaker",
            ...
        ],
        "(What's The Story) Morning Glory": [
            "Hello",
            "Roll With It",
            ...
        ],
        ...
    }
}
The HTML of the page is shown in the edit at the end of this question.
I am currently scraping the song titles like this:
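# soup is the BeautifulSoup object for the artist page
data = []
for div in soup.find_all("div", {"id": "listAlbum"}):
    for a in div.find_all("a"):
        title = a.text.strip()
        # Skip empty anchors such as <a id="1368"></a>.
        # Test truthiness here: `is ""` compares identity, not equality.
        if title:
            data.append(title)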
Likewise, getting the album names is simple:
for div in soup.find_all("div", {"class": "album"}):
    titles = div.find_all("b")
    for t in titles:
        ...
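My question is how to use the two loops above to build an object like the one at the top. How can I make sure that the songs from album X end up under the right album? If every song had an album attribute, this would be clear to me. But given how the HTML is structured, I am a bit lost.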
Edit: the HTML is below.
<div id="listAlbum">
<a id="1368"></a>
<div class="album">album: <b>"Definitely Maybe"</b> (1994)</div>
<a href="../lyrics/oasis/rocknrollstar.html" target="_blank">Rock 'n' Roll Star</a><br>
<a href="../lyrics/oasis/shakermaker.html" target="_blank">Shakermaker</a><br>
<a href="../lyrics/oasis/liveforever.html" target="_blank">Live Forever</a><br>
<a href="../lyrics/oasis/upinthesky.html" target="_blank">Up In The Sky</a><br>
<a href="../lyrics/oasis/columbia.html" target="_blank">Columbia</a><br>
<a href="../lyrics/oasis/supersonic.html" target="_blank">Supersonic</a><br>
<a href="../lyrics/oasis/bringitondown.html" target="_blank">Bring It On Down</a><br>
<a href="../lyrics/oasis/cigarettesalcohol.html" target="_blank">Cigarettes & Alcohol</a><br>
<a href="../lyrics/oasis/digsysdiner.html" target="_blank">Digsy's Diner</a><br>
<a href="../lyrics/oasis/slideaway.html" target="_blank">Slide Away</a><br>
<a href="../lyrics/oasis/marriedwithchildren.html" target="_blank">Married With Children</a><br>
<a href="../lyrics/oasis/sadsong.html" target="_blank">Sad Song</a><br>
<a id="1366"></a>
<div class="album">album: <b>"(What's The Story) Morning Glory"</b> (1995)</div>
<a href="../lyrics/oasis/hello.html" target="_blank">Hello</a><br>
<a href="../lyrics/oasis/rollwithit.html" target="_blank">Roll With It</a><br>
<a href="../lyrics/oasis/wonderwall.html" target="_blank">Wonderwall</a><br>
<a href="../lyrics/oasis/dontlookbackinanger.html" target="_blank">Don't Look Back In Anger</a><br>
<a href="../lyrics/oasis/heynow.html" target="_blank">Hey Now</a><br>
<a href="../lyrics/oasis/somemightsay.html" target="_blank">Some Might Say</a><br>
<a href="../lyrics/oasis/castnoshadow.html" target="_blank">Cast No Shadow</a><br>
<a href="../lyrics/oasis/sheselectric.html" target="_blank">She's Electric</a><br>
<a href="../lyrics/oasis/morningglory.html" target="_blank">Morning Glory</a><br>
<a href="../lyrics/oasis/champagnesupernova.html" target="_blank">Champagne Supernova</a><br>
<a href="../lyrics/oasis/boneheadsbankholiday.html" target="_blank">Bonehead's Bank Holiday</a><br>
You can use ^{} to achieve this.
Code:
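A minimal sketch of one way to group the songs, not necessarily the code this answer originally contained. It assumes the album <div> tags and the song <a> tags are siblings inside div#listAlbum, as in the question's HTML, and walks them in document order while remembering the most recent album. The shortened html string and the names container, albums, current and result are illustrative.

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question (illustrative sample)
html = """
<div id="listAlbum">
<a id="1368"></a>
<div class="album">album: <b>"Definitely Maybe"</b> (1994)</div>
<a href="../lyrics/oasis/rocknrollstar.html" target="_blank">Rock 'n' Roll Star</a><br>
<a href="../lyrics/oasis/shakermaker.html" target="_blank">Shakermaker</a><br>
<a id="1366"></a>
<div class="album">album: <b>"(What's The Story) Morning Glory"</b> (1995)</div>
<a href="../lyrics/oasis/hello.html" target="_blank">Hello</a><br>
<a href="../lyrics/oasis/rollwithit.html" target="_blank">Roll With It</a><br>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", id="listAlbum")

albums = {}
current = None  # song list of the album seen most recently

# find_all with a list of names returns the matching tags in document order
for tag in container.find_all(["div", "a"]):
    if tag.name == "div" and "album" in tag.get("class", []):
        # The album title sits inside <b>, wrapped in double quotes
        album_name = tag.b.text.strip('"')
        current = albums.setdefault(album_name, [])
    elif tag.name == "a":
        title = tag.text.strip()
        # Same filter as in the question: ignore anchors without text
        if title and current is not None:
            current.append(title)

result = {"artist": "Oasis", "albums": albums}
```

Because each song is appended to whichever album heading came last, no per-song album attribute is needed; the order of the tags carries that information.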
Output:
Edit:
After checking the website, I made some changes to the code.
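First, you need to skip the <a id="6910"></a> tags (one sits at the end of each album), because each of them would add a song with an empty name. Second, the text other songs: is not inside a <b> tag, so tag.b is None for that heading and album_name = tag.b.text raises an AttributeError. Making the following changes will do exactly what you need.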
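A hedged sketch of those two changes, not the answer's original code. It assumes the other songs: heading is a <div class="album"> with no <b> child; the sample markup, the acquiesce.html link and the helper names album_title and songs_by_album are made up for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup: one regular album plus an "other songs:" section
html = """
<div id="listAlbum">
<div class="album">album: <b>"Definitely Maybe"</b> (1994)</div>
<a href="../lyrics/oasis/supersonic.html" target="_blank">Supersonic</a><br>
<a id="6910"></a>
<div class="album">other songs:</div>
<a href="../lyrics/oasis/acquiesce.html" target="_blank">Acquiesce</a><br>
</div>
"""


def album_title(tag):
    """Return the album name for a <div class="album"> tag."""
    if tag.b is not None:
        return tag.b.text.strip('"')
    # No <b> child (e.g. "other songs:"): fall back to the div's own text
    return tag.get_text(strip=True).rstrip(":")


def songs_by_album(html):
    """Map each album heading to the song titles that follow it."""
    soup = BeautifulSoup(html, "html.parser")
    albums = {}
    current = None
    for tag in soup.find("div", id="listAlbum").find_all(["div", "a"]):
        if tag.name == "div":
            current = albums.setdefault(album_title(tag), [])
        else:
            title = tag.get_text(strip=True)
            # Skip empty marker anchors such as <a id="6910"></a>
            if title and current is not None:
                current.append(title)
    return albums


albums = songs_by_album(html)
```

Checking tag.b before reading .text avoids the AttributeError, and the truthiness test on title drops the empty anchors without relying on their id values.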
Final output: