How do I save changes made to an HTML file using BeautifulSoup in Python?

import os
import re
from bs4 import BeautifulSoup

htmlDoc = open('adding_computer_c.html', "r+")
soup = BeautifulSoup(htmlDoc)

replacements = [
    ('_', '-'),
    ('../tasks/', prefixUrl),
    ('../concepts/', prefixUrl)
]

for link in soup.findAll('a', attrs={'href': re.compile("../")}):
    newlink = str(link)
    for k, v in replacements:
        newlink = newlink.replace(k, v)
    extrachars = newlink[newlink.find("."):newlink.find(">")]
    newlink = newlink.replace(extrachars, '')
    link = newlink
    print(link)
    ## How do I save the link I have modified back to the HTML file?

print(soup)  ## prints the original html tree
htmlDoc.close()

1 Answer

User

#1 · Posted on 2024-05-19 08:35:46

newlink = link['href']
# .. make replacements
link['href'] = newlink # store it back
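
Applied to the loop from the question, the idea might look like the sketch below. It parses an inline HTML string instead of the file, and `prefix_url` and the sample links are placeholder assumptions, since the question never shows the value of `prefixUrl`. The pattern is also anchored and escaped here, because an unescaped `"../"` matches any two characters followed by a slash.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical value: the question does not define prefixUrl.
prefix_url = "https://example.com/docs/"

# Sample markup standing in for adding_computer_c.html (an assumption).
html = """<html><body>
<a href="../tasks/adding_computer.html">Add</a>
<a href="../concepts/about_nodes.html">About</a>
<a href="https://example.org/page_one">External</a>
</body></html>"""

replacements = [
    ("_", "-"),
    ("../tasks/", prefix_url),
    ("../concepts/", prefix_url),
]

soup = BeautifulSoup(html, "html.parser")

# Select only anchors whose href starts with a literal "../".
for link in soup.find_all("a", href=re.compile(r"^\.\./")):
    new_href = link["href"]
    for old, new in replacements:
        new_href = new_href.replace(old, new)
    # Assigning to the Tag's attribute updates the parse tree itself,
    # unlike rebinding the loop variable to a new string.
    link["href"] = new_href

hrefs = [a["href"] for a in soup.find_all("a")]
print(hrefs)
```

The key difference from the question's code is that `str(link)` creates a detached copy, so editing it never touches `soup`; writing to `link["href"]` does.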

Now print(soup.prettify()) will show the changed links. To save the changes to a file:

htmlDoc.close()

html = soup.prettify("utf-8")
with open("output.html", "wb") as file:
    file.write(html)

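To preserve the document's original character encoding, you can use soup.original_encoding instead of "utf-8". See the Encodings section of the Beautiful Soup documentation.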
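
One caveat worth sketching: `soup.original_encoding` is `None` when the markup was passed in as already-decoded text (for example, from a file opened in text mode in Python 3), so a fallback is useful. The example below is a minimal sketch under that assumption; the sample bytes and the temporary output path are illustrative, not part of the original question.

```python
import os
import tempfile
from bs4 import BeautifulSoup

# Sample bytes standing in for a file read with open(path, "rb") (an assumption).
raw = (
    '<html><head><meta charset="utf-8"></head>'
    '<body><a href="a.html">caf\u00e9</a></body></html>'
).encode("utf-8")

soup = BeautifulSoup(raw, "html.parser")
soup.a["href"] = "b.html"  # some change to save

# Fall back to UTF-8 if no encoding was detected (e.g. str input).
encoding = soup.original_encoding or "utf-8"

# prettify(encoding) returns bytes, so the file is opened in binary mode.
html_bytes = soup.prettify(encoding)

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "output.html")
with open(out_path, "wb") as f:
    f.write(html_bytes)
```

Reading the source file in binary mode is what gives Beautiful Soup the chance to detect the encoding in the first place.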
