Extracting data from a website (Python)

Posted 2024-06-13 18:56:04


I wrote a program that extracts information from a website. It works like this:

for row in table.findAll('td'):
    topas = row.find('p')
    pastoo = row.find('ul')
    if topas:
        continue
    elif pastoo:
        continue
    else:
        input = row.get_text()
        input.strip()
        file.write(input)
        file.write("~") #adding separator

It works perfectly when the .html file is well formatted, like this:

<table class="responsiveTable">
    <tbody>
        <tr><td>Country:</td><td>Belgium</td></tr>
        <tr><td>Year:</td><td>various years</td></tr>
    </tbody>
</table>

In some .html files, however, the markup is very messy, like this:

<table class="responsiveTable">
<tbody><tr><td>Country:</td><td>Indonesia</td></tr>
<tr><td>Year:</td><td>2017 (Jan 27th)             
</td></tr>
</tbody></table>

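As you can see, there is an unwanted line break on the fourth line of the snippet. I tried to remove it with .strip(), but that did not work. Is there a robust way to get rid of the line break? Thank you!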
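One likely cause, judging only from the loop posted above: Python strings are immutable, so `input.strip()` returns a new string and leaves `input` untouched, and the code then writes the unstripped value. The sketch below is one possible fix, not a tested answer for the real pages. The helper names `clean_cell` and `write_cells` are hypothetical, and `write_cells` assumes `table` is a BeautifulSoup tag and `out` is an open text file, as in the question.

```python
def clean_cell(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # split() with no argument splits on any whitespace and drops empty pieces,
    # so leading, trailing and embedded line breaks all disappear.
    return " ".join(text.split())


def write_cells(table, out, sep="~"):
    """Write the cleaned text of each <td> to `out`, followed by a separator.

    Assumes `table` is a BeautifulSoup tag (hypothetical helper, mirrors the
    loop in the question).
    """
    for cell in table.find_all("td"):
        # Same filter as the original loop: skip cells containing <p> or <ul>.
        if cell.find("p") or cell.find("ul"):
            continue
        # The cleaned value is used directly instead of being discarded.
        out.write(clean_cell(cell.get_text()))
        out.write(sep)


raw = "2017 (Jan 27th)             \n"
raw.strip()                    # result is thrown away; raw is unchanged
assert raw.endswith("\n")      # the line break is still there

cleaned = clean_cell(raw)
print(repr(cleaned))           # '2017 (Jan 27th)'
```

If only the leading and trailing whitespace matters, assigning the result (`text = text.strip()`) would be enough. Collapsing with `split()` and `join()` also handles line breaks in the middle of a cell.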

Tags: input html table find tr class file write

0 answers

No answers yet
