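<p>Use <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/:not" rel="nofollow noreferrer"><code>:not</code></a> alongside the <code>*</code> contains operator on <code>href</code> to handle the list of exclusions. This filters out any <code>hrefs</code> that contain (<code>*</code>) one of the specified substrings. Precede it with an <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors" rel="nofollow noreferrer">attribute selector</a> that requires the <code>href</code> to contain (<code>*</code>) <code>/wiki/</code>. I specified a case-insensitive match via <code>i</code> for the first two exclusions; that flag can be removed:</p>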
<pre><code>import requests
from bs4 import BeautifulSoup as bs
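r = requests.get('https://en.wikipedia.org/wiki/2018_FIFA_World_Cup#Prize_money')
soup = bs(r.content, 'lxml')  # or 'html.parser' if lxml is not installed
# Keep /wiki/ links inside #bodyContent; drop Category: and File: (case-insensitive) and List links
selector = '#bodyContent a[href*="/wiki/"]:not([href*="Category:" i], [href*="File:" i], [href*="List"])'
links = [a['href'] for a in soup.select(selector)]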
</code></pre>
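<p>To see what the selector keeps and drops without hitting the network, here is a minimal offline sketch. The HTML snippet and its link targets are made up for illustration (they are not taken from the Wikipedia page), and it assumes a BeautifulSoup version that bundles Soup Sieve (4.7 or later), which is what provides <code>:not()</code> with a selector list and the <code>i</code> flag:</p>

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration only.
html = """
<div id="bodyContent">
  <a href="/wiki/FIFA">FIFA</a>
  <a href="/wiki/category:Football">lowercase category</a>
  <a href="/wiki/File:Trophy.jpg">file</a>
  <a href="/wiki/List_of_finals">list</a>
  <a href="/wiki/list_of_goals">lowercase list</a>
  <a href="https://example.org/about">external</a>
</div>
<div id="footer"><a href="/wiki/Footer_page">outside body</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Same selector as above: require "/wiki/" in href, then exclude three substrings.
# The "i" flag makes only the Category: and File: checks case-insensitive.
selector = (
    '#bodyContent a[href*="/wiki/"]'
    ':not([href*="Category:" i], [href*="File:" i], [href*="List"])'
)

links = [a["href"] for a in soup.select(selector)]
print(links)  # ['/wiki/FIFA', '/wiki/list_of_goals']
```

<p>Note that <code>/wiki/category:Football</code> is excluded because of the <code>i</code> flag, while <code>/wiki/list_of_goals</code> survives because the <code>List</code> check is case-sensitive. Add <code>i</code> to that last exclusion if lowercase variants should be dropped too.</p>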