Handling key errors when scraping

import requests
from bs4 import BeautifulSoup

def clinicalTrialsGov(id):
    # Fetch the trial record as XML and parse it
    url = "https://clinicaltrials.gov/ct2/show/" + id + "?displayxml=true"
    data = BeautifulSoup(requests.get(url).text, "lxml")
    studyType = data.study_type.text
    if studyType == 'Interventional':
        allocation = data.allocation.text
        interventionModel = data.intervention_model.text
        primaryPurpose = data.primary_purpose.text
        masking = data.masking.text
        enrollment = data.enrollment.text
        officialTitle = data.official_title.text
        condition = data.condition.text
        minAge = data.eligibility.minimum_age.text
        maxAge = data.eligibility.maximum_age.text
        gender = data.eligibility.gender.text
        healthyVolunteers = data.eligibility.healthy_volunteers.text
        armType = []
        intType = []
        for each in data.find_all('intervention'):
            intType.append(each.intervention_type.text)
        for each in data.find_all('arm_group'):
            armType.append(each.arm_group_type.text)
        # tryExceptCT is the asker's own helper; its definition is not shown
        citedPMID = tryExceptCT(data, '.results_reference.PMID')
        # Direct access raises AttributeError when <results_reference> is absent
        citedPMID = data.results_reference.PMID
        print(citedPMID)
        return officialTitle, studyType, allocation, interventionModel, primaryPurpose, masking, enrollment, condition, minAge, maxAge, gender, healthyVolunteers, armType, intType

1 Answer

User

#1 · Posted 2024-07-04 06:57:29

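That's a good question. Before I address it, let me suggest changing the second argument of the BeautifulSoup (BS) constructor from lxml to xml. Otherwise, BS does not flag the parsed markup as XML (to verify this yourself, check the is_xml attribute of the data variable in your code).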
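To make the difference concrete, here is a minimal sketch that parses the same string with both parser names and prints is_xml for each. The SAMPLE fragment is made up for illustration and is not a real ClinicalTrials.gov record; both parser names require the lxml package to be installed.

```python
from bs4 import BeautifulSoup

# Made-up XML fragment, for illustration only (not a real trial record).
SAMPLE = "<clinical_study><study_type>Interventional</study_type></clinical_study>"

html_soup = BeautifulSoup(SAMPLE, "lxml")  # lxml's HTML parser
xml_soup = BeautifulSoup(SAMPLE, "xml")    # lxml's XML parser

# is_xml reports whether the markup was parsed as XML.
print(html_soup.is_xml)
print(xml_soup.is_xml)

# Tag access by attribute works the same way on the XML tree.
print(xml_soup.study_type.text)
```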

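You can avoid raising an error when trying to access elements that don't exist by passing a list of the element names you want to the find_all() method: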

subset = ['results_reference','allocation','intervention_model','primary_purpose','masking','enrollment','eligibility','official_title','arm_group','condition']

tag_matches = data.find_all(subset)

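Then, if you want to get a specific element from the list of tags without iterating, you can convert it to a dict keyed by tag name. Note that for tags that occur more than once, such as arm_group, only the last match is kept: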

tag_dict = {tag.name: tag for tag in tag_matches}
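Putting the two steps together, a lookup that tolerates missing elements might look like the sketch below. The helper name first_texts, the WANTED list, and the SAMPLE fragment are all hypothetical and only for illustration. Unlike the dict above, this version keeps the first match for each tag name and returns None for names that were not found.

```python
from bs4 import BeautifulSoup

# Made-up XML fragment, for illustration only (not a real trial record).
SAMPLE = """<clinical_study>
  <official_title>Example trial</official_title>
  <enrollment>120</enrollment>
  <arm_group><arm_group_type>Experimental</arm_group_type></arm_group>
</clinical_study>"""

# Hypothetical list of wanted tags; two of them are absent from SAMPLE.
WANTED = ["official_title", "enrollment", "allocation", "results_reference"]


def first_texts(soup, names):
    """Map each requested tag name to the text of its first match, or None."""
    found = {}
    for tag in soup.find_all(names):
        # setdefault keeps the first occurrence when a tag repeats.
        found.setdefault(tag.name, tag.get_text(strip=True))
    # dict.get returns None for names that never matched, so nothing raises.
    return {name: found.get(name) for name in names}


data = BeautifulSoup(SAMPLE, "xml")
fields = first_texts(data, WANTED)
print(fields)
```

Because every requested name ends up as a key, the caller can test for None instead of wrapping each access in try/except.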
