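<p>I've solved the problem and found a fix.</p>
<p>The culprit was this line: <code>if 'comment' not in desired_title:</code></p>
<p>It only processed rows whose first link title did not contain "comment". The real issue was how I was walking the page's HTML structure: when a torrent has comments, the comment link appears in the HTML before the link that holds the torrent name. As a result, my code skipped every torrent that had comments.</p>
<p>Here is a working solution:</p>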
<pre><code>import re, requests
from bs4 import BeautifulSoup
nyaa_link = 'https://nyaa.si/?q=test'
request = requests.get(nyaa_link)
source = request.content
soup = BeautifulSoup(source, 'lxml')
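#GETTING TORRENT NAMES
title = []
n = 0  # counts rows whose first link is the comment link
rows = soup.findAll("td", colspan="2")
for row in rows:
    if 'comment' in row.find('a')['title']:
        # The first link is the comment counter, so the name is in the second link
        desired_title = row.findAll('a', title=True)[1].text
        print(desired_title)
        title.append(desired_title)
        n = n+1
    else:
        # No comments: the first link carries the torrent name in its title
        desired_title = row.find('a')['title']
        title.append(desired_title)
        print(row.find('a')['title'])
    print('\n')
#print(title)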
#GETTING MAGNET LINKS
magnets = []
for link in soup.findAll('a', attrs={'href': re.compile("^magnet")}):
    magnets.append(link.get('href'))
#print(magnets)
#GETTING NUMBER OF MAGNET LINKS AND TITLES
print('Number of rows', len(rows))
print('Number of magnet links', len(magnets))
print('Number of titles', len(title))
print('Number of removed', n)
</code></pre>
<p>Thanks to <a href="https://stackoverflow.com/users/6023918/cannedscientist">CannedScientist</a> for some of the code needed for this solution.</p>