Matching paper IDs with scholarly in Python

2 Answers

User

Answer 1 · edited 2024-09-30 16:23:26

Expanding on my comment, you can achieve this with a Pandas groupby:

import pandas as pd
from scholarly import scholarly

AuthorList = ['Zoe Pikramenou', 'James H. R. Tucker', 'Alison Rodger', 'Timothy Dafforn']
frames = []

for Author in AuthorList:
    search_query = scholarly.search_author(Author)
    author = next(search_query).fill()
    # creating DataFrame with authors
    df = pd.DataFrame([x.__dict__ for x in author.publications])
    df['author'] = Author
    frames.append(df.copy())

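# joining all author DataFrames; ignore_index=True gives every row a unique
# label, so the bib columns assigned below line up row by row
df = pd.concat(frames, axis=0, ignore_index=True)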

# taking bib dict into separate columns
df[['title', 'cites', 'year']] = pd.DataFrame(df.bib.to_list())

# counting unique authors attached to each title
n_authors = df.groupby('title').author.nunique()
# locating the unique titles for all publications with n_authors >= 2
output = n_authors[n_authors >= 2].index

This finds 202 papers (out of 774) that have two or more authors from the list. Here is a sample of the output:

Index(['1, 1′-Homodisubstituted ferrocenes containing adenine and thymine nucleobases: synthesis, electrochemistry, and formation of H-bonded arrays',
       '722: Iron chelation by biopolymers for an anti-cancer therapy; binding up the'ferrotoxicity'in the colon',
       'A Luminescent One-Dimensional Copper (I) Polymer',
       'A Unidirectional Energy Transfer Cascade Process in a Ruthenium Junction Self-Assembled by r-and-Cyclodextrins',
       'A Zinc(II)-Cyclen Complex Attached to an Anthraquinone Moiety that Acts as a Redox-Active Nucleobase Receptor in Aqueous Solution',
       'A ditopic ferrocene receptor for anions and cations that functions as a chromogenic molecular switch',
       'A ferrocene nucleic acid oligomer as an organometallic structural mimic of DNA',
       'A heterodifunctionalised ferrocene derivative that self-assembles in solution through complementary hydrogen-bonding interactions',
       'A locking X-ray window shutter and collimator coupling to comply with the new Health and Safety at Work Act',
       'A luminescent europium hairpin for DNA photosensing in the visible, based on trimetallic bis-intercalators',
       ...
       'Up-Conversion Device Based on Quantum Dots With High-Conversion Efficiency Over 6%',
       'Vectorial Control of Energy‐Transfer Processes in Metallocyclodextrin Heterometallic Assemblies',
       'Verteporfin selectively kills hypoxic glioma cells through iron-binding and increased production of reactive oxygen species',
       'Vibrational Absorption from Oxygen-Hydrogen (Oi-H2) Complexes in Hydrogenated CZ Silicon',
       'Virginia review of sociology',
       'Wildlife use of log landings in the White Mountain National Forest',
       'Yttrium 1995',
       'ZUSCHRIFTEN-Redox-Switched Control of Binding Strength in Hydrogen-Bonded Metallocene Complexes Stichworter: Carbonsauren. Elektrochemie. Metallocene. Redoxchemie …',
       '[2] Rotaxanes comprising a macrocylic Hamilton receptor obtained using active template synthesis: synthesis and guest complexation',
       'pH-controlled delivery of luminescent europium coated nanoparticles into platelets'],
      dtype='object', name='title', length=202)

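Since all the data is in Pandas, you can also explore which of the listed authors are attached to each paper, along with all the other information available in the author.publications array from scholarly.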
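As a rough sketch of that kind of exploration, the snippet below lists the authors attached to each shared title. It runs on a small made-up DataFrame rather than real scholarly results, so the titles and author names are placeholders, and only the `title` and `author` columns from the code above are assumed.

```python
import pandas as pd

# Toy stand-in for the concatenated DataFrame; titles and names are made up.
df = pd.DataFrame({
    "title": ["Paper A", "Paper A", "Paper B", "Paper C", "Paper C", "Paper C"],
    "author": ["Author 1", "Author 2", "Author 1", "Author 1", "Author 2", "Author 3"],
})

# Collect the distinct listed authors attached to each title.
authors_per_title = df.groupby("title")["author"].agg(lambda s: sorted(set(s)))

# Keep only the titles shared by at least two of the listed authors.
shared = authors_per_title[authors_per_title.map(len) >= 2]
print(shared)
```

Note that grouping on the title string treats two records as the same paper only when their titles match exactly.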

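User

Answer 2 · edited 2024-09-30 16:23:26

First, let's convert this into a friendlier format. You said id_citations is unique for each paper, so we'll use it as the hash table/dict key.

Then we can map each id_citations value to the bib dicts and authors it appears with, as a list of tuples (bib, author_name):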

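from scholarly import scholarly

author_list = ['Zoe Pikramenou', 'James H. R. Tucker', 'Alison Rodger', 'Timothy Dafforn']
bibs = {}
for author_name in author_list:
    search_query = scholarly.search_author(author_name)
    # search_author yields author profiles, so take the first hit and load
    # its publication list (same access pattern as in the first answer)
    author = next(search_query).fill()
    for pub in author.publications:
        # vars(pub) is the publication as a plain dict, assumed to carry the
        # 'bib' and 'id_citations' keys described in the question
        bib = vars(pub)
        bibs.setdefault(bib['id_citations'], []).append((bib, author_name))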

After that, we can sort the keys by the number of authors attached to each one in bibs:

most_cited = sorted(bibs.items(), key=lambda k: len(k[1]))
# most_cited is now a list of tuples (key, value)
# which maps to (id_citation, [(bib1, author1), (bib2, author2), ...])
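A small sketch of the sort direction, using a made-up `bibs` dict (the IDs, titles, and names are placeholders): `sorted` orders ascending by default, so the entries shared by the most authors end up last unless `reverse=True` is passed.

```python
# Toy stand-in for the bibs dict: id_citations -> [(bib, author_name), ...].
# IDs, titles, and names are made up for illustration.
bibs = {
    "id1": [({"bib": {"title": "Paper A"}}, "Author 1")],
    "id2": [({"bib": {"title": "Paper B"}}, "Author 1"),
            ({"bib": {"title": "Paper B"}}, "Author 2"),
            ({"bib": {"title": "Paper B"}}, "Author 3")],
    "id3": [({"bib": {"title": "Paper C"}}, "Author 2"),
            ({"bib": {"title": "Paper C"}}, "Author 3")],
}

# Default order: fewest attached authors first.
ascending = sorted(bibs.items(), key=lambda kv: len(kv[1]))
# reverse=True: most attached authors first.
descending = sorted(bibs.items(), key=lambda kv: len(kv[1]), reverse=True)

print([key for key, _ in ascending])
print([key for key, _ in descending])
```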

And/or filter that list down to the entries that appear for three or more authors:

cited_enough = [tup[1][0][0] for tup in most_cited if len(tup[1]) >= 3]
# using key [0] in the middle is arbitrary. It can be anything in the 
# list, provided the bib objects are identical, but index 0 is guaranteed
# to be there.
# otherwise, the first index is to grab the list rather than the id_citation,
# and the last index is to grab the bib, rather than the author_name

Now we can retrieve the paper titles from there:

paper_titles = [bib['bib']['title'] for bib in cited_enough]
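The filtering and title lookup can also be folded into one helper. The sketch below is only an illustration on made-up data: `shared_titles` and `min_authors` are hypothetical names, not part of scholarly, and the helper counts distinct author names rather than raw list length, in case one author lists the same entry twice.

```python
def shared_titles(bibs, min_authors=2):
    """Return titles of entries listed by at least min_authors distinct authors."""
    titles = []
    for entries in bibs.values():
        # entries is a list of (bib, author_name) tuples for one id_citations key
        authors = {name for _, name in entries}
        if len(authors) >= min_authors:
            # any bib in the list will do; index 0 always exists
            titles.append(entries[0][0]["bib"]["title"])
    return titles


# Toy stand-in for the bibs dict; IDs, titles, and names are made up.
toy_bibs = {
    "id1": [({"bib": {"title": "Paper A"}}, "Author 1")],
    "id2": [({"bib": {"title": "Paper B"}}, "Author 1"),
            ({"bib": {"title": "Paper B"}}, "Author 2"),
            ({"bib": {"title": "Paper B"}}, "Author 3")],
    "id3": [({"bib": {"title": "Paper C"}}, "Author 2"),
            ({"bib": {"title": "Paper C"}}, "Author 3")],
}

print(shared_titles(toy_bibs, min_authors=2))
print(shared_titles(toy_bibs, min_authors=3))
```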
