<p>This is how I would extract the values:</p>
<pre class="lang-py prettyprint-override"><code>from bs4 import BeautifulSoup
import requests

URL = "https://www.inspection.gc.ca/food-recall-warnings-and-allergy-alerts/2021-02-10/eng/1613010591343/1613010596418"

# One list per table column
brands = []
products = []
sizes = []
upcs = []
codes = []

page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

# Locate the recall table and walk the rows of its body
recall_details = soup.find("table", class_="table table-bordered table-condensed")
body = recall_details.find("tbody")
rows = body.find_all("tr")
for row in rows:
    data = row.find_all("td")
    brands.append(data[0].text)
    products.append(data[1].text)
    sizes.append(data[2].text)
    upcs.append(data[3].text)
    codes.append(data[4].text)

print(brands)
print(products)
print(sizes)
print(upcs)
print(codes)
</code></pre>
<p>This prints:</p>
<pre class="lang-shell prettyprint-override"><code>['One Ocean']
['Sliced Smoked Wild Sockeye Salmon']
['300\xa0g']
['6\xa025984\xa000005\xa03']
['11253']
</code></pre>
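<p>The <code>\xa0</code> sequences in the output above are non-breaking spaces carried over from the page's HTML. If plain spaces are preferred, a minimal sketch along these lines could be applied to each cell before appending it. Note that <code>clean_cell</code> is a hypothetical helper name introduced here for illustration, not part of BeautifulSoup:</p>
<pre class="lang-py prettyprint-override"><code>def clean_cell(text):
    """Replace non-breaking spaces with regular spaces and trim the ends."""
    return text.replace("\xa0", " ").strip()


# Values copied from the output above
print(clean_cell("300\xa0g"))                   # 300 g
print(clean_cell("6\xa025984\xa000005\xa03"))   # 6 25984 00005 3
</code></pre>
<p>In the loop this would look like <code>sizes.append(clean_cell(data[2].text))</code>, assuming the cleaned form is what you want to store.</p>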
<hr/>
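<p>I do think a dict per row is a better data structure than several parallel lists, though of course that depends on your use case.</p>
<p>If you want to go that route, you can change the code as follows:</p>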
<pre class="lang-py prettyprint-override"><code>
recalled = []
...
for row in rows:
    data = row.find_all("td")
    # One dict per table row instead of five parallel lists
    item = {
        "brand": data[0].text,
        "products": data[1].text,
        "sizes": data[2].text,
        "upcs": data[3].text,
        "codes": data[4].text,
    }
    recalled.append(item)

print(recalled)
</code></pre>
<p>This prints:</p>
<pre class="lang-shell prettyprint-override"><code>[{'brand': 'One Ocean', 'products': 'Sliced Smoked Wild Sockeye Salmon', 'sizes': '300\xa0g', 'upcs': '6\xa025984\xa000005\xa03', 'codes': '11253'}]
</code></pre>
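<p>One reason the list-of-dicts shape can be convenient is that each record carries its own field names, so it can be handed straight to tools such as the standard library's <code>csv.DictWriter</code>. The following is only a sketch under assumptions: the sample record is copied from the output above rather than scraped, and <code>to_csv</code> is a hypothetical helper name.</p>
<pre class="lang-py prettyprint-override"><code>import csv
import io

# Sample record copied from the output above; in the real script this
# would be the `recalled` list built inside the loop.
recalled = [
    {
        "brand": "One Ocean",
        "products": "Sliced Smoked Wild Sockeye Salmon",
        "sizes": "300\xa0g",
        "upcs": "6\xa025984\xa000005\xa03",
        "codes": "11253",
    }
]


def to_csv(items):
    """Serialize a non-empty list of same-keyed dicts to CSV text."""
    buffer = io.StringIO()
    # Column order follows the key order of the first record
    writer = csv.DictWriter(buffer, fieldnames=list(items[0].keys()))
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


csv_text = to_csv(recalled)
print(csv_text)
</code></pre>
<p>With parallel lists, the same export would need the five lists zipped back together first, which is where a mismatch in list lengths could silently drop data.</p>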