How to avoid scraped information being lumped into a single item

Posted 2024-10-01 19:29:46


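I'm having a problem with the data I scrape with Scrapy. When I run this code in the terminal, all of the scraped information is appended to a single item, like this: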

{"fax": ["Fax: 617-638-4905", "Fax: 925-969-1795", "Fax: 913-327-1491", "Fax: 507-281-0291", "Fax: 509-547-1265", "Fax: 310-437-0585"], 
"title": ["Challenges in Musculoskeletal Rehabilitation", "17th Annual Spring Conference on Pediatric Emergencies", "19th Annual Association of Professors of Human & Medical Genetics (APHMG) Workshop & Special Interest Groups Meetings", "2013 AMSSM 22nd Annual Meeting", "61st Annual Meeting of Pacific Coast Reproductive Society (PCRS)", "Contraceptive Technology Conference 25th Anniversary", "Mid-America Orthopaedic Association 2013 Meeting", "Pain Management", "Peripheral Vascular Access Ultrasound", "SAGES 2013 / ISLCRS 8th International Congress"],  ... ...

... and so on.

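The problem is that all of the scraped values for each field end up in one item. I need the information to come out as separate items. In other words, I need each title to be associated with one fax number (if there is one), one location, and so on.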
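To see why the per-field lists cannot simply be paired up afterwards, here is a minimal sketch that reuses the `fax` and `title` values from the output above. The variable names are only for illustration. The six fax numbers cannot be matched to the ten titles by position, because nothing in the lists says which meetings have no fax.

```python
# Per-field lists copied from the output shown above.
scraped = {
    "fax": [
        "Fax: 617-638-4905", "Fax: 925-969-1795", "Fax: 913-327-1491",
        "Fax: 507-281-0291", "Fax: 509-547-1265", "Fax: 310-437-0585",
    ],
    "title": [
        "Challenges in Musculoskeletal Rehabilitation",
        "17th Annual Spring Conference on Pediatric Emergencies",
        "19th Annual Association of Professors of Human & Medical Genetics (APHMG) Workshop & Special Interest Groups Meetings",
        "2013 AMSSM 22nd Annual Meeting",
        "61st Annual Meeting of Pacific Coast Reproductive Society (PCRS)",
        "Contraceptive Technology Conference 25th Anniversary",
        "Mid-America Orthopaedic Association 2013 Meeting",
        "Pain Management",
        "Peripheral Vascular Access Ultrasound",
        "SAGES 2013 / ISLCRS 8th International Congress",
    ],
}

# zip() stops at the shortest list, so four titles are silently dropped,
# and the remaining pairs are not guaranteed to belong together.
paired = list(zip(scraped["title"], scraped["fax"]))

print(len(scraped["title"]), len(scraped["fax"]), len(paired))  # 10 6 6
```

The relationship between a title and its fax number therefore has to be captured while scraping, one meeting at a time, rather than reconstructed from the lists later.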

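I don't want all of the information lumped together, because each piece of scraped information is related to the others. This is how I eventually want it to go into the database:

"MedEconItem" 1: [title: "insert title 1 here", fax: "insert fax 1 here", location: "location 1" ...]

"MedEconItem" 2: [title: "title 2", fax: "fax 2", location: "location 2" ...]

"MedEconItem" 3: [... and so on

Any ideas on how to solve this? Does anyone know an easy way to separate this information? This is my first time working with Scrapy, so any advice is welcome. I have searched everywhere and can't seem to find an answer.

Here is my current code: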

^{pr2}$

1 answer
User
#1 · Posted 2024-10-01 19:29:46

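Well, the code below seems to work, but unfortunately it involves some obvious hacks because my understanding of XPath is poor. Someone more proficient in XPath may provide a better solution later.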

    # Requires: from scrapy.selector import HtmlXPathSelector (legacy Scrapy API).
    # MedEconItem is the Item class defined in the question's project.
    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # One <a> node per meeting; every field below is looked up relative to it.
        sites = hxs.select('//html/body/div[@id="c"]/div[@id="meeting_results"]//a[contains(@href,"meetings")]')
        items = []
        # Skip the first and last matched links.
        for site in sites[1:-1]:
            # Build a separate item for each meeting instead of one item for the page.
            item = MedEconItem()
            item['title'] = site.select('./text()').extract()
            # following:: takes the first matching node after this link, and
            # extract()[0] raises IndexError if nothing matches at all.
            item['date'] = site.select('./following::p[@class = "dls"]/span[@class="date"]/text()').extract()[0]
            item['location'] = site.select('./following::p[@class = "dls"]/span[@class = "location"]/a/text()').extract()[0]
            item['specialty'] = site.select('./following::p[@class = "dls"]/span[@class = "specialties"]/text()').extract()[0]
            item['contact'] = site.select('./following::p[@class = "contact"]/text()').extract()[0]
            item['phone'] = site.select('./following::p[@class = "phone"]/text()').extract()[0]
            item['fax'] = site.select('./following::p[@class = "fax"]/text()').extract()[0]
            item['email'] = site.select('./following::p[@class = "email"]/text()').extract()[0]
            item['url'] = site.select('./following::p[@class = "website"]/a/@href').extract()[0]
            items.append(item)
        return items
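
One caveat with the `following::` axis: it matches the first such node anywhere after the link, so for a meeting without a fax it would pick up the fax of a later meeting. If each meeting happens to sit in its own container element, scoping every lookup to that container avoids this. The sketch below shows the idea with the standard library's `xml.etree.ElementTree` rather than Scrapy, so that it runs on its own. The markup, the `meeting` class name, the `parse_meetings` helper and the "Title 1" style values are all assumptions made up for illustration; the real page may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: each meeting is wrapped in its own <div>.
# The second meeting deliberately has no fax paragraph.
PAGE = """
<div id="meeting_results">
  <div class="meeting">
    <a href="/meetings/1">Title 1</a>
    <p class="fax">Fax 1</p>
    <p class="location">Location 1</p>
  </div>
  <div class="meeting">
    <a href="/meetings/2">Title 2</a>
    <p class="location">Location 2</p>
  </div>
</div>
"""


def parse_meetings(markup):
    """Return one dict per meeting, with None for fields that are absent."""
    root = ET.fromstring(markup)
    items = []
    # Iterate over the per-meeting containers, not over the whole page.
    for block in root.findall("./div[@class='meeting']"):
        items.append({
            # Each lookup is relative to this block, so it cannot
            # leak into a neighbouring meeting.
            "title": block.findtext("a"),
            "fax": block.findtext("p[@class='fax']"),
            "location": block.findtext("p[@class='location']"),
        })
    return items


items = parse_meetings(PAGE)
for item in items:
    print(item)
```

The second record comes out with `fax` set to `None` instead of borrowing a value from another meeting or raising an error. The same pattern should carry over to Scrapy selectors: select one node per record, then run relative queries on that node.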
