<p><a href="https://i.stack.imgur.com/fAu5x.png" rel="nofollow noreferrer">HTML website</a></p>
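<p>I have an HTML table, and from it I only need the <code><tr></code> elements that have <code>class=""</code>. I want to download these files later, so all I need is the <code>href</code> of the <code><a></code> element inside the third <code><td></code> element. How can I read those values directly as strings?</p>
<p>I want every <code><tr></code> element that has <code>class=""</code>.</p>
<p>For example:</p>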
<pre><code><tr class="">
<td>29 September, 2021</td>
<td>Antwerp</td>
<td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/listings.csv.gz" onclick="var that=this;ga('send','event', 'download','listings',this.href);setTimeout(function(){location.href=that.href;},200);return false;">listings.csv.gz</a></td>
<td>Detailed Listings data for Antwerp</td>
</tr>
</code></pre>
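<p>Inside this <code><tr></code> element there are several <code><td></code> elements. I want the <code>href</code> of the <code><a></code> element contained in the third <code><td></code>. So what I want is the URL <code>http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/listings.csv.gz</code> (not just this one :D, I want all of the URLs).</p>
<p>Code:</p>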
<pre class="lang-py prettyprint-override"><code>import requests
from bs4 import BeautifulSoup
from datetime import datetime
DATASET_URL = "http://insideairbnb.com/get-the-data.html"
DATASET_CITY = "Antwerp"
r = requests.get(DATASET_URL)
content = r.content
soup = BeautifulSoup(content, "html.parser")
# Each city's table carries the lowercased city name as one of its CSS classes.
antwerp_table = soup.find(class_=DATASET_CITY.lower())
print(antwerp_table)
# antwerp_table is my HTML table
</code></pre>
<p>HTML sample (for the full page, see <a href="http://insideairbnb.com/get-the-data.html" rel="nofollow noreferrer">http://insideairbnb.com/get-the-data.html</a>):</p>
<pre class="lang-html prettyprint-override"><code><table class="table table-hover table-striped antwerp">
<thead>
<tr>
<th class="col-md-3" data-field="host_id">Date Compiled</th>
<th class="col-md-3" data-field="host_id">Country/City</th>
<th class="col-md-3" data-field="host_id">File Name</th>
<th class="col-md-3" data-align="right" data-field="count">
Description
</th>
</tr>
</thead>
<tbody>
<tr class="">
<td>29 September, 2021</td>
<td>Antwerp</td>
<td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/listings.csv.gz" onclick="var that=this;ga('send','event', 'download','listings',this.href);setTimeout(function(){location.href=that.href;},200);return false;">listings.csv.gz</a></td>
<td>Detailed Listings data for Antwerp</td>
</tr>
<tr class="">
<td>29 September, 2021</td>
<td>Antwerp</td>
<td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/calendar.csv.gz" onclick="var that=this;ga('send','event', 'download','calendar',this.href);setTimeout(function(){location.href=that.href;},200);return false;">calendar.csv.gz</a></td>
<td>Detailed Calendar Data for listings in Antwerp</td>
</tr>
...
<tr class="archived">
<td>17 August, 2021</td>
<td>Antwerp</td>
<td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/calendar.csv.gz" onclick="var that=this;ga('send','event', 'download','calendar',this.href);setTimeout(function(){location.href=that.href;},200);return false;">calendar.csv.gz</a></td>
<td>Detailed Calendar Data for listings in Antwerp</td>
</tr>
</code></pre>
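<p>For reference, here is a minimal sketch of one way the extraction described above could be done with BeautifulSoup. It runs against a trimmed copy of the sample table (the <code>onclick</code> attributes are omitted) rather than the live page, so it assumes the real markup matches the sample. The names <code>has_empty_class</code> and <code>extract_current_urls</code> are hypothetical helpers introduced only for this illustration.</p>

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample table from the question (onclick attributes omitted).
SAMPLE_HTML = """
<table class="table table-hover table-striped antwerp">
<thead>
<tr><th>Date Compiled</th><th>Country/City</th><th>File Name</th><th>Description</th></tr>
</thead>
<tbody>
<tr class="">
<td>29 September, 2021</td>
<td>Antwerp</td>
<td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/listings.csv.gz">listings.csv.gz</a></td>
<td>Detailed Listings data for Antwerp</td>
</tr>
<tr class="">
<td>29 September, 2021</td>
<td>Antwerp</td>
<td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/calendar.csv.gz">calendar.csv.gz</a></td>
<td>Detailed Calendar Data for listings in Antwerp</td>
</tr>
<tr class="archived">
<td>17 August, 2021</td>
<td>Antwerp</td>
<td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/calendar.csv.gz">calendar.csv.gz</a></td>
<td>Detailed Calendar Data for listings in Antwerp</td>
</tr>
</tbody>
</table>
"""


def has_empty_class(tag):
    """Hypothetical helper: True for <tr> tags whose class attribute is present but empty."""
    if tag.name != "tr" or not tag.has_attr("class"):
        return False  # skips the header row, which has no class attribute at all
    classes = tag["class"]
    # class="" may come back as an empty list or a list holding an empty
    # string, so both are treated as "no class names".
    if isinstance(classes, str):
        classes = classes.split()
    return not any(classes)


def extract_current_urls(html, city):
    """Hypothetical helper: hrefs from the third <td> of every class="" row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=city.lower())
    if table is None:
        return []
    urls = []
    for row in table.find_all(has_empty_class):
        cells = row.find_all("td")
        if len(cells) < 3:
            continue
        link = cells[2].find("a", href=True)
        if link is not None:
            urls.append(link["href"])  # already a plain str
    return urls


urls = extract_current_urls(SAMPLE_HTML, "Antwerp")
print(urls)
```

<p>Filtering with a function instead of <code>class_=""</code> keeps the check explicit: rows with <code>class="archived"</code> and the header row without any <code>class</code> attribute are both skipped. On the live page, the same function could presumably be given <code>r.content</code> in place of <code>SAMPLE_HTML</code>.</p>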