从pubmed中搜索作者从属关系以获取pubmed id和doi的列表
pubmed-author-affiliation的Python项目详细描述
命令行工具获取一个或一个pubmed id或doi列表, 在PubMed中搜索相应的作者从属关系和 将信息输出到文件
命令行选项
-h, --help | Show help text |
-i PUBMEDID, --pubmedid PUBMEDID | |
Search for author affiliations for single Pubmed ID | |
-d doi, --doi doi | |
Search for author affiliations for a single DOI | |
-f file, --infile file | |
File with a list of Pubmed IDs and DOIs (they can be mixed). One entry per line. | |
-x format, --format format | |
Output format. Choices=[‘json’ (default),’text’]. ‘text’ option produces tab separated table, denormalised in the sense that the pubmed ID/DOI is repeated on multiple rows if there are multiple authors with related affiliations. |
示例运行
pubmed输入和json输出:
python pubmedAuthorAffiliation.py -i 27863242
输出:
{'articleTitle': 'Decoding Mammalian Ribosome-mRNA States by Translational GTPase Complexes.', 'journalTitle': 'Cell', 'authorList': [{'firstName': 'n/a', 'institute': 'MRC-LMB', 'lastName': 'Shao', 'affiliation': 'MRC-LMB, Francis Crick Avenue, Cambridge CB2 0QH, UK.', 'country': 'UK', 'initials': 'S'}, {'firstName': 'n/a', 'institute': 'MRC-LMB', 'lastName': 'Murray', 'affiliation': 'MRC-LMB, Francis Crick Avenue, Cambridge CB2 0QH, UK.', 'country': 'UK', 'initials': 'J'}, {'firstName': 'n/a', 'institute': 'MRC-LMB', 'lastName': 'Brown', 'affiliation': 'MRC-LMB, Francis Crick Avenue, Cambridge CB2 0QH, UK.', 'country': 'UK', 'initials': 'A'}, {'firstName': 'n/a', 'institute': 'University of California', 'lastName': 'Taunton', 'affiliation': 'Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA.', 'country': 'USA', 'initials': 'J'}, {'firstName': 'n/a', 'institute': 'MRC-LMB', 'lastName': 'Ramakrishnan', 'affiliation': 'MRC-LMB, Francis Crick Avenue, Cambridge CB2 0QH, UK. Electronic address: ramak@mrc-lmb.cam.ac.uk.', 'country': 'UK', 'initials': 'V'}, {'firstName': 'n/a', 'institute': 'MRC-LMB', 'lastName': 'Hegde', 'affiliation': 'MRC-LMB, Francis Crick Avenue, Cambridge CB2 0QH, UK. Electronic address: rhegde@mrc-lmb.cam.ac.uk.', 'country': 'UK', 'initials': 'RS'}], 'pubmedId': '27863242', 'error': False} {'articleTitle': 'Decoding Mammalian Ribosome-mRNA States by Translational GTPase Complexes.', 'journalTitle': 'Cell', 'authorList': [{'firstName': 'n/a', 'institute': 'MRC-LMB', 'lastName': 'Shao', 'affiliation': 'MRC-LMB, Francis Crick Avenue, Cambridge CB2 0QH, UK.', 'country': 'UK', 'initials': 'S'}, {'firstName': 'n/a', 'institute': 'MRC-LMB', 'lastName': 'Murray', 'affiliation': 'MRC-LMB, Francis Crick Avenue, Cambridge CB2 0QH, UK.', 'country': 'UK', 'initials': 'J'}, {'firstName': 'n/a', 'institute': 'MRC-LMB', 'lastName': 'Brown', 'affiliation': 'MRC-LMB, Francis Crick Avenue, Cambridge CB2 0QH, UK.', 'country': 'UK', 'initials': 'A'}, {'firstName': 'n/a', 'institute': 'University of California', 'lastName': 'Taunton', 'affiliation': 'Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA.', 'country': 'USA', 'initials': 'J'}, {'firstName': 'n/a', 'institute': 'MRC-LMB', 'lastName': 'Ramakrishnan', 'affiliation': 'MRC-LMB, Francis Crick Avenue, Cambridge CB2 0QH, UK. Electronic address: ramak@mrc-lmb.cam.ac.uk.', 'country': 'UK', 'initials': 'V'}, {'firstName': 'n/a', 'institute': 'MRC-LMB', 'lastName': 'Hegde', 'affiliation': 'MRC-LMB, Francis Crick Avenue, Cambridge CB2 0QH, UK. Electronic address: rhegde@mrc-lmb.cam.ac.uk.', 'country': 'UK', 'initials': 'RS'}], 'pubmedId': '27863242', 'error': False}
DOI输入和文本输出:
python pubmedAuthorAffiliation.py -d 10.1016/j.molcel.2016.11.013 -x text
输出:
27867008 Molecular cell Molecular Structures of Transcribing RNA Polymerase I. n/a L Tafur European Molecular Biology Laboratory (EMBL), Structural and Computational Biology Unit, Meyerhofstrasse 1, 69117 Heidelberg, Germany. Germany European Molecular Biology Laboratory (EMBL) 27867008 Molecular cell Molecular Structures of Transcribing RNA Polymerase I. n/a Y Sadian European Molecular Biology Laboratory (EMBL), Structural and Computational Biology Unit, Meyerhofstrasse 1, 69117 Heidelberg, Germany. Germany European Molecular Biology Laboratory (EMBL) 27867008 Molecular cell Molecular Structures of Transcribing RNA Polymerase I. n/a NA Hoffmann European Molecular Biology Laboratory (EMBL), Structural and Computational Biology Unit, Meyerhofstrasse 1, 69117 Heidelberg, Germany. Germany European Molecular Biology Laboratory (EMBL) 27867008 Molecular cell Molecular Structures of Transcribing RNA Polymerase I. n/a AJ Jakobi European Molecular Biology Laboratory (EMBL), Structural and Computational Biology Unit, Meyerhofstrasse 1, 69117 Heidelberg, Germany; European Molecular Biology Laboratory (EMBL), Hamburg Unit, Notkestrasse 85, 22607 Hamburg, Germany. Germany European Molecular Biology Laboratory (EMBL) 27867008 Molecular cell Molecular Structures of Transcribing RNA Polymerase I. n/a R Wetzel European Molecular Biology Laboratory (EMBL), Structural and Computational Biology Unit, Meyerhofstrasse 1, 69117 Heidelberg, Germany. Germany European Molecular Biology Laboratory (EMBL) 27867008 Molecular cell Molecular Structures of Transcribing RNA Polymerase I. n/a WJH Hagen European Molecular Biology Laboratory (EMBL), Structural and Computational Biology Unit, Meyerhofstrasse 1, 69117 Heidelberg, Germany. Germany European Molecular Biology Laboratory (EMBL) 27867008 Molecular cell Molecular Structures of Transcribing RNA Polymerase I. n/a C Sachse European Molecular Biology Laboratory (EMBL), Structural and Computational Biology Unit, Meyerhofstrasse 1, 69117 Heidelberg, Germany. Germany European Molecular Biology Laboratory (EMBL) 27867008 Molecular cell Molecular Structures of Transcribing RNA Polymerase I. n/a CW Müller European Molecular Biology Laboratory (EMBL), Structural and Computational Biology Unit, Meyerhofstrasse 1, 69117 Heidelberg, Germany. Electronic address: cmueller@embl.de. Germany European Molecular Biology Laboratory (EMBL)
混合了doi和pubmed的输入文件。写入文件的文本输出:
python pubmedAuthorAffiliation.py -f emdb-2010.txt -x text > /tmp/out.txt
在这种情况下,将忽略无法识别的行,例如:
WARNING:root:processList: id not recognized: id
代码测试
这将遍历选定的pubmed和已知工作的doi列表:
python test_pubmedAuthorAffiliation.py