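<p>With a recent version of <code>BeautifulSoup</code>, you can use the CSS pseudo-selector <code>:contains</code> to find a tag that contains specific text (newer releases of its selector library, soupsieve, spell this <code>:-soup-contains</code> and deprecate the old name). From that tag you can then navigate to the next <code>p</code> tag and extract its text:</p>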
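<pre><code>from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

baseURL = "https://www.genecards.org/cgi-bin/carddisp.pl?gene="
GeneToSearch = input("Gene of Interest: ")
updatedURL = baseURL + GeneToSearch
print(updatedURL)

# Send a browser-like User-Agent header with the request.
req = Request(updatedURL, headers={'User-Agent': 'Mozilla/5.0'})
response = urlopen(req).read()
soup = BeautifulSoup(response, 'lxml')  # needs the lxml package installed

# Heading text that precedes the summary paragraph on the gene page.
text_find = 'GeneCards Summary for ' + GeneToSearch + ' Gene'
<b># Newer soupsieve releases prefer the spelling :-soup-contains().
el = soup.select_one('h3:contains("' + text_find + '")')
if el is None:
    raise SystemExit('Summary heading not found for ' + GeneToSearch)
summary = el.parent.find_next('p').text.strip()</b>
print(summary)
</code></pre>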
<p>Output (for the input <code>IL6</code>):</p>
<pre><code>IL6 (Interleukin 6) is a Protein Coding gene.
Diseases associated with IL6 include Kaposi Sarcoma and Rheumatoid Arthritis, Systemic Juvenile.
Among its related pathways are IL-1 Family Signaling Pathways and Immune response IFN alpha/beta signaling pathway.
Gene Ontology (GO) annotations related to this gene include signaling receptor binding and growth factor activity.
</code></pre>
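<p>To try the selector-and-navigate step without hitting the network, here is a minimal sketch that runs against an inline HTML string. The markup in <code>SAMPLE_HTML</code> and the helper name <code>extract_summary</code> are assumptions made for illustration, and the real GeneCards page may be structured differently. It also assumes soupsieve 2.1 or later, which provides the <code>:-soup-contains</code> spelling.</p>

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real GeneCards page may differ.
SAMPLE_HTML = """
<div>
  <div>
    <h3>GeneCards Summary for IL6 Gene</h3>
  </div>
  <p> IL6 (Interleukin 6) is a Protein Coding gene. </p>
</div>
"""


def extract_summary(html, gene):
    """Return the text of the first <p> after the summary heading, or None."""
    soup = BeautifulSoup(html, "html.parser")
    heading_text = f"GeneCards Summary for {gene} Gene"
    # :-soup-contains() matches elements whose text includes the given string.
    heading = soup.select_one(f'h3:-soup-contains("{heading_text}")')
    if heading is None:
        return None
    # Step up to the heading's wrapper, then forward to the next <p>.
    paragraph = heading.parent.find_next("p")
    return paragraph.get_text(strip=True) if paragraph else None


print(extract_summary(SAMPLE_HTML, "IL6"))
print(extract_summary(SAMPLE_HTML, "TP53"))
```

<p>Returning <code>None</code> when the heading is missing lets the caller decide how to report an unknown gene, instead of failing with an <code>AttributeError</code> on <code>el.parent</code>.</p>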