Parsing PubMed API XML with lxml and collecting child elements into a dictionary

Desired output:

{'29150897': {'title': 'Determining best outcomes from community-acquired pneumonia and how to achieve them.'}, '29149862': {'title': 'Telemedicine as an effective intervention to improve antibiotic appropriateness prescription and to reduce costs in pediatrics.'}}

Actual output (the IDs and the titles run together into a single entry):

{'2725403628806902': {'title': 'Handshake Stewardship: A Highly Effective Rounding-based Antimicrobial Optimization Service.Monitoring, documenting and reporting the quality of antibiotic use in the Netherlands: a pilot study to establish a national antimicrobial stewardship registry.'}}

2 Answers

User

#1 · edited 2024-09-28 20:53:12

code.py:

#!/usr/bin/env python3

import sys
import requests
from lxml import etree
from pprint import pprint as pp

ARTICLE_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&tool=PMA&id=29150897,29149862"


def main():
    response = requests.get(ARTICLE_URL)
    tree = etree.fromstring(response.content)
    ids = tree.xpath("//MedlineCitation/PMID[@Version='1']")
    titles = tree.xpath("//Article/ArticleTitle")
    if len(ids) != len(titles):
        print("ID count doesn't match Title count...")
        return
    result = {_id.text: {"title": title.text} for _id, title in zip(ids, titles)}
    pp(result)


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()

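Notes:

For clarity, I restructured the code a bit and renamed some variables.
ids holds the list of PMID nodes, while titles holds the list of (corresponding) ArticleTitle nodes (mind the paths!).
They are joined into the desired format with a dict comprehension ([Python]: PEP 274 Dict Comprehensions); to iterate over the two lists at the same time, [Python 3]: zip(*iterables) is used.

Output: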

(py35x64_test) c:\Work\Dev\StackOverflow\q47433632>"c:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

{'29149862': {'title': 'Telemedicine as an effective intervention to improve '
                       'antibiotic appropriateness prescription and to reduce '
                       'costs in pediatrics.'},
 '29150897': {'title': 'Determining best outcomes from community-acquired '
                       'pneumonia and how to achieve them.'}}
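
The zip-based code above depends on the ID list and the title list staying aligned. As a minimal sketch of an alternative, assuming each record in the response is wrapped in a PubmedArticle element, you can iterate per article and look up the PMID and the title relative to it. The sample XML is a trimmed, hand-written stand-in for the real response, and the helper name titles_by_pmid is hypothetical. Only calls that lxml shares with the standard library's ElementTree are used, so the import falls back to the latter if lxml is missing.

```python
try:
    from lxml import etree
except ImportError:  # the calls below also exist in the stdlib module
    import xml.etree.ElementTree as etree

# Hand-written sample that mimics the shape of an efetch response (assumption).
SAMPLE_XML = b"""<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">29150897</PMID>
      <Article>
        <ArticleTitle>Determining best outcomes from community-acquired pneumonia and how to achieve them.</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">29149862</PMID>
      <Article>
        <ArticleTitle>Telemedicine as an effective intervention to improve antibiotic appropriateness prescription and to reduce costs in pediatrics.</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def titles_by_pmid(xml_bytes):
    """Return {pmid: {"title": ...}}, reading both values from the same article."""
    root = etree.fromstring(xml_bytes)
    result = {}
    for article in root.iter("PubmedArticle"):
        # Paths are relative to the current article, so values cannot mix.
        pmid = article.findtext("MedlineCitation/PMID")
        title = article.find("MedlineCitation/Article/ArticleTitle")
        if pmid is None or title is None:
            continue  # skip incomplete records instead of misaligning them
        # itertext() also collects text inside inline markup such as <i> or <sup>.
        result[pmid] = {"title": "".join(title.itertext()).strip()}
    return result


if __name__ == "__main__":
    from pprint import pprint
    pprint(titles_by_pmid(SAMPLE_XML))
```

Because the ID and the title are read from the same article element, a record with a missing title is skipped rather than shifting every later title onto the wrong ID.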

User

#2 · edited 2024-09-28 20:53:12

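First, XML is case-sensitive, and you are using lowercase tags in your XPath, so they will not match elements such as MedlineCitation.

Also, I think the pmid should be a single number (or a string representing a number); in your example it seems to be something different:

In my test,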

`pmid = ''.join([x.text for x in x.xpath('//MedlineCitation/PMID[@Version="1"]')])`

produces a string of concatenated numbers, which is not what you are looking for.
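
A likely cause, sketched below on hand-written sample data: in lxml, a path that starts with // is evaluated against the whole document even when xpath() is called on a sub-element, whereas .// only searches below that element. The sketch assumes x in the line above is a single article element; the names first, absolute and relative are illustrative.

```python
from lxml import etree

# Hand-written sample with two records (assumption about the response shape).
SAMPLE_XML = b"""<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation><PMID Version="1">29150897</PMID></MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation><PMID Version="1">29149862</PMID></MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

root = etree.fromstring(SAMPLE_XML)
first = root.xpath("//PubmedArticle")[0]  # first article only

# "//" is absolute: it matches every PMID in the document, not just those in `first`.
absolute = "".join(
    n.text for n in first.xpath('//MedlineCitation/PMID[@Version="1"]')
)

# ".//" is relative: it matches only PMIDs below `first`.
relative = "".join(
    n.text for n in first.xpath('.//MedlineCitation/PMID[@Version="1"]')
)

print(absolute)  # 2915089729149862
print(relative)  # 29150897
```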
