Removing CSS style blocks from a DataFrame

Posted 2024-09-30 22:24:59


I have a df with some records that look like this:

Untitledp { margin-top: 0px;margin-bottom: 0px;line-height: 1.15; } body { font-family: 'Times New Roman';font-style: Normal;font-weight: normal;font-size: 13.3333333333333px; } .Normal { telerik-style-type: paragraph;telerik-style-name: Normal;border-collapse: collapse; } .TableNormal { telerik-style-type: table;telerik-style-name: TableNormal;border-collapse: collapse; } .s_F0039783 { telerik-style-type: local;font-size: 13.34px; } .s_45EBF2E0 { telerik-style-type: local;font-family: 'Times New Roman';font-size: 13.3333333333333px;color: #000000; } A sentence that I actually want.

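I want to remove the CSS style blocks and return only the sentence at the end. The number of CSS blocks can differ from record to record. Every record starts with "Untitledp" and ends with the text I want (no style block follows the text).

How can I clean out these blocks? I use BeautifulSoup to strip HTML tags, but it does not work on these blocks.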


1 answer

Anonymous user

#1 · Posted 2024-09-30 22:24:59

A regular expression with sub() can be used for this:

import re

regex = re.compile(r'.+\s*\{.*\}')  # greedy: matches from the start through the last '}'
regex.sub('', s)  # s is a copy-paste of your sample
# returns ' A sentence that I actually want.'

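This works at least for this example. Be careful, though: the pattern is greedy and removes everything up to the last closing brace, so it will fail if the sentence you want contains {}. Sentences usually do not contain these characters, so this is rarely a problem. Note also that the result keeps the leading space that followed the last block.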
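If braces in the wanted sentence are a concern, a stricter pattern may help. The following is only a sketch, not part of the original answer: it assumes that every CSS block sits at the start of the record and contains at least one colon between its braces, and the name strip_css_blocks is made up for illustration. It removes only the leading run of such blocks, so a later "{x}" without a colon is left alone. A sentence containing something like "{a: b}" would still be stripped.

```python
import re

# Hypothetical helper, not from the original answer.
# Assumption: each CSS block looks like "selector { prop: value; ... }",
# i.e. the text between the braces contains at least one colon.
LEADING_CSS = re.compile(r'^(?:[^{}]*\{[^{}]*:[^{}]*\})+\s*')


def strip_css_blocks(text):
    """Remove the run of CSS blocks at the start of text, if there is one."""
    return LEADING_CSS.sub('', text)


sample = (
    "Untitledp { margin-top: 0px;margin-bottom: 0px;line-height: 1.15; } "
    "body { font-family: 'Times New Roman';font-size: 13.3333333333333px; } "
    ".s_F0039783 { telerik-style-type: local;font-size: 13.34px; } "
    "A sentence that I actually want."
)

print(strip_css_blocks(sample))
print(strip_css_blocks("Untitledp { margin-top: 0px; } Keep {this} please."))
```

To apply it to a DataFrame, something like df["text"].map(strip_css_blocks) should work, where "text" is a placeholder for whatever the column is actually called.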
