Scrape a specific table by div and save the text with Scrapy

<tr> <td><div>2018/2058</div></td> <td class="address"><div>Land North of 37 and 39 Hare Lane Claygate Esher Surrey KT10 9BT</div></td> <td class="proposal"><div>Confirmation of Compliance with Conditions: 5 (Tree Protection and Pre-Commencement Inspection) and 6 (Tree Protection) of planning permission 2017/0451.</div></td> <td><div style="min-width:90px">Claygate Ward</div></td> </tr>

3 answers

User

Answer 1 · edited 2024-10-03 21:31:35

Using the XPath from gangabass's answer:

from scrapy.http import TextResponse

# The row from the question, wrapped in <table> so the parser sees an ordinary table
txt = (
    '<table><tr>'
    '<td><div>2018/2058</div></td>'
    '<td class="address"><div>Land North of 37 and 39 Hare Lane Claygate Esher Surrey KT10 9BT</div></td>'
    '<td class="proposal"><div>Confirmation of Compliance with Conditions: 5 (Tree Protection and Pre-Commencement Inspection) and 6 (Tree Protection) of planning permission 2017/0451.</div></td>'
    '<td><div style="min-width:90px">Claygate Ward</div></td>'
    '</tr></table>'
)

# Build a response offline so the XPath can be tried without running a spider
resp = TextResponse(url='http://example.com', body=txt, encoding='utf-8')

# td without an index matches every cell in the first row
print(resp.xpath('//tr[1]/td/div/text()').getall())

I only removed the [1] after td, so the XPath returns the text of every cell in the row instead of just the first one.

User

Answer 2 · edited 2024-10-03 21:31:35

You can do this easily with pandas:

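import pandas as pd
table = pd.read_html(url)[0]

read_html returns a list with one DataFrame per <table> on the page, so table is now a DataFrame holding the first table.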
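Here is a sketch of the pandas approach applied to the HTML from the question. It assumes lxml (or BeautifulSoup with html5lib) is installed, because pandas needs one of them to parse HTML. The column names are hypothetical, since the row has no header cells, and planning.csv is only an example path.

```python
from io import StringIO

import pandas as pd

# The question's row wrapped in a <table>; read_html only looks inside <table> elements
ROW_HTML = (
    "<table><tr>"
    "<td><div>2018/2058</div></td>"
    '<td class="address"><div>Land North of 37 and 39 Hare Lane Claygate Esher Surrey KT10 9BT</div></td>'
    '<td class="proposal"><div>Confirmation of Compliance with Conditions: 5 (Tree Protection and '
    "Pre-Commencement Inspection) and 6 (Tree Protection) of planning permission 2017/0451.</div></td>"
    '<td><div style="min-width:90px">Claygate Ward</div></td>'
    "</tr></table>"
)

# read_html returns a list of DataFrames; StringIO avoids the warning newer
# pandas versions give when a literal HTML string is passed
tables = pd.read_html(StringIO(ROW_HTML))
table = tables[0]

# Hypothetical column names, because the snippet has no header row
table.columns = ["reference", "address", "proposal", "ward"]
table.to_csv("planning.csv", index=False)
```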

User

Answer 3 · edited 2024-10-03 21:31:35

first_td_text = response.xpath('//tr[1]/td[1]/div/text()').extract_first()

Update

