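So, what I want to do is create a Python function that lets me pass in the year, month, and day of the podcast I want to download. It would then parse the HTML and return the links to that day's podcast. For example: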
>>> get_download_links(year, month, day)
['https://www.tytnetwork.com/?tytpm=44279&type=audio', # Hr 1 (audio)
'https://www.tytnetwork.com/?tytpm=44277&type=audio'] # Hr 2 (audio)
The page I am trying to parse is http://www.tytnetwork.com/annual-archives/2014-main-show-archives/
Here is a sample covering the first week of a month (including the weekday labels):
<tr>
<th class="tytca-mosname" colspan="5">
<h3>
June 2014
</h3>
</th>
</tr>
<tr>
<th class="tytca-dayname">
<h3>
Mon
</h3>
</th>
<th class="tytca-dayname">
<h3>
Tue
</h3>
</th>
<th class="tytca-dayname">
<h3>
Wed
</h3>
</th>
<th class="tytca-dayname">
<h3>
Thu
</h3>
</th>
<th class="tytca-dayname">
<h3>
Fri
</h3>
</th>
</tr>
<tr>
<td class="tytca-td">
<div class="tytca-daynum">
2
</div>
<p>
<a class="tytca-audio" href="https://www.tytnetwork.com/?tytpm=42848&type=audio" title="Click to download audio file">
Hr 1
</a>
<br/>
<a class="tytca-audio" href="https://www.tytnetwork.com/?tytpm=42851&type=audio" title="Click to download audio file">
Hr 2
</a>
<br/>
<a class="tytca-video" href="https://www.tytnetwork.com/?tytpm=42848&type=video" title="Click to download video file">
Hr 1
</a>
<br/>
<a class="tytca-video" href="https://www.tytnetwork.com/?tytpm=42851&type=video" title="Click to download video file">
Hr 2
</a>
<br/>
<a class="tytca-video-watch" href="https://www.tytnetwork.com/2014/06/02/tyt-june-2-2014-hour-1/" title="Click to watch the video">
Hr 1
</a>
<br/>
<a class="tytca-video-watch" href="https://www.tytnetwork.com/2014/06/02/tyt-june-2-2014-hour-2/" title="Click to watch the video">
Hr 2
</a>
</p>
</td>
<td class="tytca-td">
<div class="tytca-daynum">
3
</div>
<p>
<a class="tytca-audio" href="https://www.tytnetwork.com/?tytpm=43325&type=audio" title="Click to download audio file">
Hr 1
</a>
<br/>
<a class="tytca-audio" href="https://www.tytnetwork.com/?tytpm=43324&type=audio" title="Click to download audio file">
Hr 2
</a>
<br/>
<a class="tytca-video" href="https://www.tytnetwork.com/?tytpm=43325&type=video" title="Click to download video file">
Hr 1
</a>
<br/>
<a class="tytca-video" href="https://www.tytnetwork.com/?tytpm=43324&type=video" title="Click to download video file">
Hr 2
</a>
<br/>
<a class="tytca-video-watch" href="https://www.tytnetwork.com/2014/06/03/tyt-june-3-2014-hour-1/" title="Click to watch the video">
Hr 1
</a>
<br/>
<a class="tytca-video-watch" href="https://www.tytnetwork.com/2014/06/03/tyt-june-3-2014-hour-2/" title="Click to watch the video">
Hr 2
</a>
</p>
</td>
<td class="tytca-td">
<div class="tytca-daynum">
4
</div>
<p>
<a class="tytca-audio" href="https://www.tytnetwork.com/?tytpm=43635&type=audio" title="Click to download audio file">
Hr 1
</a>
<br/>
<a class="tytca-audio" href="https://www.tytnetwork.com/?tytpm=43633&type=audio" title="Click to download audio file">
Hr 2
</a>
<br/>
<a class="tytca-video" href="https://www.tytnetwork.com/?tytpm=43635&type=video" title="Click to download video file">
Hr 1
</a>
<br/>
<a class="tytca-video" href="https://www.tytnetwork.com/?tytpm=43633&type=video" title="Click to download video file">
Hr 2
</a>
<br/>
<a class="tytca-video-watch" href="https://www.tytnetwork.com/2014/06/04/tyt-june-4-2014-hour-1/" title="Click to watch the video">
Hr 1
</a>
<br/>
<a class="tytca-video-watch" href="https://www.tytnetwork.com/2014/06/04/tyt-june-4-2014-hour-2/" title="Click to watch the video">
Hr 2
</a>
</p>
</td>
<td class="tytca-td">
<div class="tytca-daynum">
5
</div>
<p>
<a class="tytca-audio" href="https://www.tytnetwork.com/?tytpm=44046&type=audio" title="Click to download audio file">
Hr 1
</a>
<br/>
<a class="tytca-audio" href="https://www.tytnetwork.com/?tytpm=44044&type=audio" title="Click to download audio file">
Hr 2
</a>
<br/>
<a class="tytca-video" href="https://www.tytnetwork.com/?tytpm=44046&type=video" title="Click to download video file">
Hr 1
</a>
<br/>
<a class="tytca-video" href="https://www.tytnetwork.com/?tytpm=44044&type=video" title="Click to download video file">
Hr 2
</a>
<br/>
<a class="tytca-video-watch" href="https://www.tytnetwork.com/2014/06/05/tyt-june-5-2014-hour-1/" title="Click to watch the video">
Hr 1
</a>
<br/>
<a class="tytca-video-watch" href="https://www.tytnetwork.com/2014/06/05/tyt-june-5-2014-hour-2/" title="Click to watch the video">
Hr 2
</a>
</p>
</td>
<td class="tytca-td">
<div class="tytca-daynum">
6
</div>
<p>
<a class="tytca-audio" href="https://www.tytnetwork.com/?tytpm=44279&type=audio" title="Click to download audio file">
Hr 1
</a>
<br/>
<a class="tytca-audio" href="https://www.tytnetwork.com/?tytpm=44277&type=audio" title="Click to download audio file">
Hr 2
</a>
<br/>
<a class="tytca-video" href="https://www.tytnetwork.com/?tytpm=44279&type=video" title="Click to download video file">
Hr 1
</a>
<br/>
<a class="tytca-video" href="https://www.tytnetwork.com/?tytpm=44277&type=video" title="Click to download video file">
Hr 2
</a>
<br/>
<a class="tytca-video-watch" href="https://www.tytnetwork.com/2014/06/06/tyt-june-6-2014-hour-1/" title="Click to watch the video">
Hr 1
</a>
<br/>
<a class="tytca-video-watch" href="https://www.tytnetwork.com/2014/06/06/tyt-june-6-2014-hour-2/" title="Click to watch the video">
Hr 2
</a>
</p>
</td>
</tr>
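I have tried using Beautiful Soup, but the page is structured so poorly that there seems to be no way to get what I want.
At this point, I'm handing this over to the Python gurus here to help me out.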
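For reference, here is a rough sketch of the kind of thing I have in mind, based only on the sample markup above. It uses the standard library's `html.parser` rather than Beautiful Soup, and it assumes that every month starts with a `th.tytca-mosname` header, that every day cell contains a `div.tytca-daynum`, and that the whole page follows the same pattern as the snippet (I have not verified that). The `_ArchiveParser` class and the extra `html` parameter are my own hypothetical additions; downloading the page is left out.

```python
import calendar
from html.parser import HTMLParser


class _ArchiveParser(HTMLParser):
    """Collect audio links for one day (hypothetical helper, not a site API)."""

    def __init__(self, month_label, day):
        super().__init__()
        self.month_label = month_label  # e.g. "June 2014"
        self.day = day
        self.links = []
        self._month = None    # text of the most recent month header
        self._day = None      # day number of the cell being read
        self._capture = None  # "month", "day", or None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "th" and "tytca-mosname" in classes:
            # A new month begins: start collecting its header text.
            self._capture, self._text, self._day = "month", [], None
        elif tag == "td":
            self._day = None  # a new cell has no day number yet
        elif tag == "div" and "tytca-daynum" in classes:
            self._capture, self._text = "day", []
        elif tag == "a" and "tytca-audio" in classes:
            # Keep the link only if we are inside the requested month and day.
            if self._month == self.month_label and self._day == self.day:
                if attrs.get("href"):
                    self.links.append(attrs["href"])

    def handle_data(self, data):
        if self._capture:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Collapse the newlines and indentation around the captured text.
        text = " ".join("".join(self._text).split())
        if self._capture == "month" and tag == "th":
            self._month, self._capture = text, None
        elif self._capture == "day" and tag == "div":
            self._day = int(text) if text.isdigit() else None
            self._capture = None


def get_download_links(html, year, month, day):
    """Return the audio download links for the given date, in page order."""
    # Assumes English month names, as in the "June 2014" header above.
    label = f"{calendar.month_name[month]} {year}"
    parser = _ArchiveParser(label, day)
    parser.feed(html)
    parser.close()
    return parser.links
```

With the archive page's HTML in a string called `html`, `get_download_links(html, 2014, 6, 6)` should pick up the two `tytca-audio` links from the cell for the 6th in the sample, if my assumptions about the markup hold.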