Scraping a table with Scrapy when the website has unusual HTML

import scrapy


class CfpspiderSpider(scrapy.Spider):
    name = 'cfpspider'
    allowed_domains = ['http://www.wikicfp.com']
    start_urls = ['http://www.wikicfp.com/cfp/call?conference=machine%20learning']

    def parse(self, response):
        div = response.css("div.contsec")
        for table in div:
            print(table.css("table")[3].css.extract_first())

1 answer

User

#1 · Posted on 2024-05-18 13:56:36

Judging from the page source, the structure looks like this:

div class="contsec"
| center
| | form
| | | table
| | | | tr
| | | | tr
| | | | tr
| | | | | td
| | | | | | table id="the droids you are looking for"
| | | | tr
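
The tree above can be read as a path that walks straight to the nested table instead of counting tables by index. Below is a minimal sketch using only the standard library, run against a hypothetical, simplified and well-formed stand-in for the page. The sample markup, the `id="target"` value, and the cell contents are all made up for illustration; the real page is messier and would normally be handled with Scrapy's own selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page; NOT the real markup.
SAMPLE = """
<html><body>
<div class="contsec">
  <center><form>
    <table>
      <tr><td>search box</td></tr>
      <tr><td>paging</td></tr>
      <tr><td>
        <table id="target">
          <tr><td>Event</td><td>When</td></tr>
          <tr><td>ExampleConf</td><td>Jan 1</td></tr>
        </table>
      </td></tr>
      <tr><td>footer</td></tr>
    </table>
  </form></center>
</div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Mirror the tree: div.contsec > center > form > table > tr > td > table
PATH = ".//div[@class='contsec']/center/form/table/tr/td/table"
nested = root.findall(PATH)

# Collect the direct cell text of every row in the nested table.
rows = [[td.text for td in tr.findall("td")] for tr in nested[0].findall("tr")]
print(rows)
```

In Scrapy the same idea would be an XPath or a chained CSS selector passed to `response.xpath(...)` or `response.css(...)`. Whether a given path matches depends on the real markup, which may differ from this sample.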

Edit: try this

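def parse(self, response):
    divs = response.css("div.contsec")
    for div in divs:
        # Take the fourth <table> found under div.contsec (index 3).
        table = div.css("table")[3]
        # The first row holds the column headers.
        headers = table.css("tr")[0].css("td::text").extract()
        print("\t".join(headers))
        # Every remaining row is data. Note that td::text only returns
        # text sitting directly inside each <td>, not inside child tags.
        for row in table.css("tr")[1:]:
            row_data = row.css("td::text").extract()
            print("\t".join(row_data))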
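
One thing to watch with `td::text`: it selects only the text nodes that are direct children of the cell, so a cell whose content is wrapped in a link or other tag comes back empty. The sketch below shows the difference using only the standard library on a hypothetical row (the markup, URL path, and values are invented for illustration, and `cell_text` is a helper name made up here, not part of Scrapy).

```python
import xml.etree.ElementTree as ET

# Hypothetical row markup: the first cell wraps its text in a link.
ROW = (
    '<tr><td><a href="/cfp/example">ExampleConf 2025</a></td>'
    '<td> Jan 1, 2025 </td></tr>'
)


def cell_text(td):
    """Join every text node under a cell and collapse whitespace."""
    return " ".join("".join(td.itertext()).split())


tr = ET.fromstring(ROW)

# Direct text only, comparable in spirit to what td::text selects.
direct_only = [(td.text or "").strip() for td in tr.findall("td")]

# All descendant text, including text inside the <a> tag.
full_text = [cell_text(td) for td in tr.findall("td")]

print(direct_only)
print(full_text)
```

With Scrapy selectors, a comparable result can probably be had by calling `xpath("string(.)")` on each cell instead of using `td::text`, though that is worth checking against the actual page.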
