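<p>It would help to know the total number of results you expect. Below, I retrieve 25 unique results by using :contains to target the citation h2 elements and then moving to the table that follows each one.</p>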
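<pre><code>from io import StringIO

from bs4 import BeautifulSoup as bs
import requests
import pandas as pd

r = requests.get('https://patents.google.com/patent/US4458945?oq=US4458945A')
soup = bs(r.content, 'lxml')

# Select every h2 whose text contains "Citations" or "Family Cites",
# then parse the first table that follows each heading.
# Note: newer soupsieve releases spell this selector :-soup-contains.
df = pd.concat([pd.read_html(StringIO(str(t.find_next('table'))))[0]
                for t in soup.select('h2:contains("Citations", "Family Cites")')])

# Remove rows that appear under both headings, then order by priority date.
df.drop_duplicates(inplace=True)
df.sort_values(by=['Priority date'], inplace=True)
df.reset_index(drop=True, inplace=True)
print(df)
</code></pre>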
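<p>To try the heading-then-adjacent-table idea without hitting the network, here is a minimal offline sketch. The sample markup, the publication numbers in it, and the helper name rows_after_headings are all made up for illustration and are not taken from the real patent page. It assumes soupsieve 2.1 or later for the :-soup-contains spelling, and uses the built-in html.parser so neither lxml nor pandas is required.</p>

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the patent page (invented data).
SAMPLE_HTML = """
<h2>Patent Citations (2)</h2>
<table>
  <tr><th>Publication number</th><th>Priority date</th></tr>
  <tr><td>US1000001A</td><td>1970-01-01</td></tr>
  <tr><td>US1000002A</td><td>1965-05-05</td></tr>
</table>
<h2>Family Cites Families (1)</h2>
<table>
  <tr><th>Publication number</th><th>Priority date</th></tr>
  <tr><td>US1000002A</td><td>1965-05-05</td></tr>
</table>
<h2>Similar Documents</h2>
<table>
  <tr><th>Publication number</th><th>Priority date</th></tr>
  <tr><td>US9999999A</td><td>1980-01-01</td></tr>
</table>
"""


def rows_after_headings(html, keywords):
    """Collect unique data rows from the table following each matching h2."""
    soup = BeautifulSoup(html, "html.parser")
    # Build e.g. h2:-soup-contains("Citations", "Family Cites"):
    # an h2 matches if its text contains any of the listed strings.
    quoted = ", ".join('"{}"'.format(k) for k in keywords)
    selector = "h2:-soup-contains({})".format(quoted)

    rows = []
    for heading in soup.select(selector):
        # find_next returns the first table after the heading in document order.
        table = heading.find_next("table")
        for tr in table.find_all("tr"):
            cells = tuple(td.get_text(strip=True) for td in tr.find_all("td"))
            # Header rows contain only th cells, so they yield an empty tuple.
            if cells and cells not in rows:
                rows.append(cells)
    # ISO-formatted dates sort correctly as plain strings.
    return sorted(rows, key=lambda row: row[1])


result = rows_after_headings(SAMPLE_HTML, ["Citations", "Family Cites"])
print(result)
```

<p>The row listed under both matching headings is kept only once, and the table under the non-matching "Similar Documents" heading is never visited, which mirrors what drop_duplicates and the selector do in the pandas version above.</p>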