Getting data from specific fields of ClinicalTrials.gov

from bs4 import BeautifulSoup
import requests

def clinicalTrialsGov(nctid):
    data = BeautifulSoup(requests.get("https://clinicaltrials.gov/ct2/show/" + nctid + "?displayxml=true").text, "xml")
    subset = ['intervention_type', 'study_type', 'allocation', 'intervention_model', 'primary_purpose', 'masking', 'enrollment', 'official_title', 'condition', 'minimum_age', 'maximum_age', 'gender', 'healthy_volunteers', 'phase', 'primary_outcome', 'secondary_outcome', 'number_of_arms']
    tag_matches = data.find_all(subset)

Current output:

ctOfficial_title: Aerosolized Beta-Agonist Isomers in Asthma
ctPhase: Phase 4
ctStudy_type: Interventional
ctAllocation: Non-Randomized
ctIntervention_model: Crossover Assignment
ctPrimary_purpose: Treatment
ctMasking: None (Open Label)
ctPrimary_outcome: Change in Maximum Forced Expiratory Volume at One Second (FEV1) Baseline (before treatment), 30 minutes, 1, 2, 4, 6, and 8 hours post treatment
ctSecondary_outcome: Change in Dyspnea Response as Measured by the University of California, San Diego (UCSD) Dyspnea Scale Baseline (before treatment), 30 minutes, 1, 2, 4, 6, and 8 hours post treatment
ctNumber_of_arms: 5
ctEnrollment: 10
ctCondition: Asthma
ctIntervention_type: Drug
ctGender: All
ctMinimum_age: 18 Years
ctMaximum_age: N/A
ctHealthy_volunteers: No

Desired output (every intervention_type value, not just the last one):

ctOfficial_title: Aerosolized Beta-Agonist Isomers in Asthma
ctPhase: Phase 4
ctStudy_type: Interventional
ctAllocation: Non-Randomized
ctIntervention_model: Crossover Assignment
ctPrimary_purpose: Treatment
ctMasking: None (Open Label)
ctPrimary_outcome: Change in Maximum Forced Expiratory Volume at One Second (FEV1) Baseline (before treatment), 30 minutes, 1, 2, 4, 6, and 8 hours post treatment
ctSecondary_outcome: Change in Dyspnea Response as Measured by the University of California, San Diego (UCSD) Dyspnea Scale Baseline (before treatment), 30 minutes, 1, 2, 4, 6, and 8 hours post treatment
ctNumber_of_arms: 5
ctEnrollment: 10
ctCondition: Asthma
ctIntervention_type: Drug, Drug, Other, Device, Device, Drug
ctGender: All
ctMinimum_age: 18 Years
ctMaximum_age: N/A
ctHealthy_volunteers: No
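The loop that fills the dictionary is not shown in the question, but both answers infer that it stores each match with plain assignment (`tag_dict[key] = text`). Assuming that, this minimal reproduction uses hypothetical `(name, text)` pairs in place of real BeautifulSoup tags to show why only the last `intervention_type` survives:

```python
# Hypothetical (tag name, text) pairs in document order; the six
# intervention_type values are copied from the desired output above.
tags = [("official_title", "Aerosolized Beta-Agonist Isomers in Asthma")]
tags += [("intervention_type", t) for t in ["Drug", "Drug", "Other", "Device", "Device", "Drug"]]

tag_dict = {}
for name, text in tags:
    # Assumption: plain assignment, as the answers infer. Each repeated
    # key replaces whatever value was stored under it before.
    tag_dict["ct" + name.capitalize()] = text

print(tag_dict["ctIntervention_type"])  # prints "Drug" (only the last value)
```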

2 answers

User

Answer 1 · edited 2024-09-29 23:19:25

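Your code fails because it overwrites the previous value stored under a given dictionary key. Instead, you need to append to the existing entry.

In Python you can use defaultdict(), which automatically creates a list for each new key. When a tag occurs more than once, each value is appended to that key's list. When printing, you can then join the list back together with a ", " separator if needed: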

from collections import defaultdict

from bs4 import BeautifulSoup
import requests

def clinicalTrialsGov(nctid):
    # Every new key automatically starts out as an empty list
    data = defaultdict(list)
    soup = BeautifulSoup(requests.get("https://clinicaltrials.gov/ct2/show/" + nctid + "?displayxml=true").text, "xml")
    subset = ['intervention_type', 'study_type', 'allocation', 'intervention_model', 'primary_purpose', 'masking', 'enrollment', 'official_title', 'condition', 'minimum_age', 'maximum_age', 'gender', 'healthy_volunteers', 'phase', 'primary_outcome', 'secondary_outcome', 'number_of_arms']

    for tag in soup.find_all(subset):
        # Append instead of assigning, so repeated tags such as intervention_type are all kept
        data['ct{}'.format(tag.name.capitalize())].append(tag.get_text(strip=True))

    for key in data:
        # Join multiple values for the same key into one comma-separated line
        print('{}: {}'.format(key, ', '.join(data[key])))

clinicalTrialsGov('NCT02170532')

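This prints each field once, with repeated tags such as intervention_type combined into a single comma-separated line.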
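The code above needs network access and BeautifulSoup's "xml" parser, which relies on lxml being installed. To check the defaultdict logic offline, here is a sketch that swaps in the standard library's `xml.etree.ElementTree` and a small, hypothetical XML snippet shaped loosely like a study record. `SAMPLE_XML`, `SUBSET` and `collect_fields` are illustrative names, not part of the original answer.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily trimmed XML; a real study record has many more elements.
SAMPLE_XML = """
<clinical_study>
  <official_title>Aerosolized Beta-Agonist Isomers in Asthma</official_title>
  <phase>Phase 4</phase>
  <intervention><intervention_type>Drug</intervention_type></intervention>
  <intervention><intervention_type>Other</intervention_type></intervention>
  <intervention><intervention_type>Device</intervention_type></intervention>
</clinical_study>
"""

SUBSET = {"official_title", "phase", "intervention_type"}

def collect_fields(xml_text, subset):
    """Group the text of every element whose tag is in `subset`, keeping duplicates."""
    root = ET.fromstring(xml_text.strip())
    data = defaultdict(list)  # a missing key starts as an empty list
    for elem in root.iter():  # walks the tree in document order
        if elem.tag in subset:
            key = "ct" + elem.tag.capitalize()
            data[key].append("".join(elem.itertext()).strip())
    return data

fields = collect_fields(SAMPLE_XML, SUBSET)
for key, values in fields.items():
    print("{}: {}".format(key, ", ".join(values)))
# ctOfficial_title: Aerosolized Beta-Agonist Isomers in Asthma
# ctPhase: Phase 4
# ctIntervention_type: Drug, Other, Device
```

Because dictionaries keep insertion order, the keys come out in the order their first tag appears in the document.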

User

Answer 2 · edited 2024-09-29 23:19:25

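You only see the last tag value because each earlier value is overwritten by the next one. You need to check whether the key already exists in the dictionary and, if it does, handle it accordingly.
Like this: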

# tag_matches is the result of data.find_all(subset) from the question
tag_dict = {}
for tag in tag_matches:
    key = 'ct' + tag.name.capitalize()
    if key in tag_dict:
        # Key already seen: add this value instead of overwriting the old one
        tag_dict[key] += ', ' + tag.text
    else:
        tag_dict[key] = tag.text
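The same check-then-handle idea can also be written with the built-in `dict.setdefault`, which returns the list already stored for a key or inserts an empty one first. This is a sketch of that alternative, not the answerer's code; `group_tags` and the `(name, text)` pairs are hypothetical stand-ins for iterating over `tag_matches`.

```python
def group_tags(pairs):
    """Collect repeated tag texts under ct<Name> keys, joined with ', '."""
    grouped = {}
    for name, text in pairs:
        key = "ct" + name.capitalize()
        # setdefault inserts [] the first time a key is seen and returns the
        # stored list, so later values are appended rather than overwritten.
        grouped.setdefault(key, []).append(text)
    return {key: ", ".join(values) for key, values in grouped.items()}

# Hypothetical stand-ins for (tag.name, tag.text) taken from tag_matches.
pairs = [
    ("condition", "Asthma"),
    ("intervention_type", "Drug"),
    ("intervention_type", "Device"),
]
tag_dict = group_tags(pairs)
print(tag_dict)  # {'ctCondition': 'Asthma', 'ctIntervention_type': 'Drug, Device'}
```

Collecting lists first and joining once at the end avoids repeated string concatenation and keeps the raw values available if you need them separately.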
