<p>您需要修改for循环,.attrs用于访问任何标记的属性。如果要排除href值包含特定关键字的链接,请使用<code>!=-1</code>比较</p>
<p>修改代码:</p>
<pre><code>import requests
from bs4 import BeautifulSoup
import random
import time
def scrapeWikiArticle(url):
response = requests.get(
url=url,
)
soup = BeautifulSoup(response.content, 'html.parser')
title = soup.find(id="firstHeading")
allLinks = soup.find(id="bodyContent").find_all("a")
random.shuffle(allLinks)
linkToScrape = 0
for link in allLinks:
if("href" in link.attrs):
if link.attrs['href'].find("/wiki/") == -1 or link.attrs['href'].find("Category:") != -1 or link.attrs['href'].find("File:") != -1 or link.attrs['href'].find("List") != -1:
continue
linkToScrape = link
articleTitles = open("savedArticles.txt", "a+")
articleTitles.write(title.text + ", ")
articleTitles.close()
time.sleep(6)
break
if(linkToScrape):
scrapeWikiArticle("https://en.wikipedia.org" + linkToScrape.attrs['href'])
scrapeWikiArticle("https://en.wikipedia.org/wiki/Anarchism")
</code></pre>