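I need to get a large number of publication numbers from Google Patents.
Examples of the numbers I need: US7863316B2, KR102121633B1.
I tried classic Python tools (such as BeautifulSoup) to scrape the data, but that approach does not work on Google. Then I turned to Google Cloud BigQuery and got some results. But before I could work out how to use the platform properly, I hit an error: Quota exceeded: Your project exceeded quota for free query bytes scanned.
The code I used to get the data: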
```python
from google.cloud import bigquery

# Assumes credentials and a default project are already configured.
client = bigquery.Client()
search_term = "epilepsy"


def create_query(search_term):
    """Build a query that samples roughly 1000 publications tagged with search_term."""
    q = r'''
    WITH
    pubs AS (
      SELECT DISTINCT
        pub.publication_number
      FROM `patents-public-data.patents.publications` pub
      INNER JOIN `patents-public-data.google_patents_research.publications` gpr ON
        pub.publication_number = gpr.publication_number
      WHERE
        "{}" IN UNNEST(gpr.top_terms)
        AND pub.grant_date < 20000101
    )
    SELECT
      publication_number, url
    FROM
      `patents-public-data.google_patents_research.publications`
    WHERE
      publication_number IN (SELECT publication_number FROM pubs)
      AND RAND() <= 1000/(SELECT COUNT(*) FROM pubs)
    '''.format(search_term)
    return q


df = client.query(create_query(search_term)).to_dataframe()
if len(df) == 0:
    raise ValueError('No results for your search term. Retry with another term.')
else:
    print('Search complete for search term: "{}". {} random assets selected.'
          .format(search_term, len(df)))

# The query selects url, not embedding_v1, so map publication numbers to URLs.
url_dict = dict(zip(df.publication_number.tolist(), df.url.tolist()))
df.head()
```
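Since the error is about bytes scanned, it may help to check how much a query would scan before running it. The sketch below is only one possible approach: it uses the dry-run mode of the `google-cloud-bigquery` client to get an estimate, and `maximum_bytes_billed` to make BigQuery reject a query that would exceed a cap. The helper names `format_bytes`, `estimate_scan_bytes` and `run_with_cap` are made up for illustration.

```python
def format_bytes(num_bytes):
    """Render a byte count in binary units, e.g. 1536 -> '1.50 KiB'."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that fits, or at the largest unit available.
        if size < 1024 or unit == units[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024


def estimate_scan_bytes(client, sql):
    """Dry-run `sql` and return the number of bytes BigQuery estimates it would process."""
    # Imported lazily so format_bytes stays usable without the client library.
    from google.cloud import bigquery

    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)  # a dry run does not execute the query
    return job.total_bytes_processed


def run_with_cap(client, sql, max_bytes):
    """Run `sql`, asking BigQuery to fail it if it would bill more than max_bytes."""
    from google.cloud import bigquery

    config = bigquery.QueryJobConfig(maximum_bytes_billed=max_bytes)
    return client.query(sql, job_config=config).to_dataframe()
```

With a configured `client`, something like `print(format_bytes(estimate_scan_bytes(client, create_query("epilepsy"))))` would show the estimate first. Because BigQuery bills by the columns a query reads, dropping columns or joins that are not needed is usually the first thing to try when the estimate is too high.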
Are there other ways to get the information I need?
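One detail worth checking on the BigQuery route: as far as I can tell, the `publication_number` column stores values with hyphens (for example `US-7863316-B2`) and not in the compact form shown at the top of the question. Treat that as an assumption and verify it against a few returned rows. If it holds, a small helper like the hypothetical `to_compact` below would convert the values:

```python
def to_compact(publication_number):
    """Drop hyphens, e.g. 'US-7863316-B2' -> 'US7863316B2' (assumed input layout)."""
    return publication_number.replace("-", "")


def compact_all(publication_numbers):
    """Convert every number and drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(to_compact(n) for n in publication_numbers))
```

Applied to the query result, this could look like `compact_all(df.publication_number.tolist())`.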