Python BS4: remove the class and style attributes from all divs

<div class="applinks"> <div class="appbuttons"> <a href="https://geo.itunes.apple.com/ru/app/cloud-hub-file-manager-document/id972238010?mt=8&at=11l3Ss" rel="nofollow" target="_blank" title="Cloud Hub - File Manager, Document Reader, Clouds Browser and Download Manager">Загрузить</a> <span onmouseout="jQuery('.wpappbox-8429dd98d1602dec9a9fc989204dbf7c .qrcode').hide();" onmouseover="jQuery('.wpappbox-8429dd98d1602dec9a9fc989204dbf7c .qrcode').show();">QR-Code</span> </div> </div>

# coding: utf-8
import requests
from bs4 import BeautifulSoup

url = "https://lifehacker.ru/2016/08/29/app-store-29-august-2016/"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")  # name the parser explicitly
post_content = soup.find("div", {"class": "post-content"})  # attrs must be a dict
print(post_content)
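A minimal sketch, run on a hypothetical inline snippet rather than the live page, showing that an attrs dict such as `{"class": "post-content"}` and the `class_` keyword argument select the same container. Note that `{"class", "post-content"}` (comma instead of colon) builds a Python set, not a dict, so the colon form is the spelling to use.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; nothing is fetched here.
html = """
<div class="post-content">
  <div class="applinks"><div class="appbuttons"><a href="#">Download</a></div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Two equivalent ways to match <div class="post-content">.
by_dict = soup.find("div", {"class": "post-content"})
by_keyword = soup.find("div", class_="post-content")

print(by_dict is by_keyword)  # True: both return the same Tag object
```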

2 answers

User

Answer 1 · edited 2024-07-04 05:55:23

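import requests
from bs4 import BeautifulSoup


url = "https://lifehacker.ru/2016/08/29/app-store-29-august-2016/"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
# soup() is shorthand for soup.find_all(): it visits every tag, not only <div>
for tag in soup():
    for attribute in ["class"]:  # You can also add "id", "style", etc. to this list
        # Deleting an attribute the tag does not have is a no-op in bs4
        del tag[attribute]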
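The question asks specifically about `<div>` tags and about both `class` and `style`. A minimal sketch of that narrower variant of the loop above, run on a shortened copy of the markup from the question: the `title` attribute and jQuery handlers are omitted, and the `style` and `id` values on the inner div are hypothetical additions so there is something to strip. `Tag.attrs` is a plain dict, so `pop(name, None)` quietly skips attributes a tag does not have.

```python
from bs4 import BeautifulSoup

# Shortened markup from the question; style/id on the inner div are hypothetical.
html = (
    '<div class="applinks"> <div class="appbuttons" style="color: red" id="btns"> '
    '<a href="https://geo.itunes.apple.com/ru/app/cloud-hub-file-manager-document/id972238010?mt=8&at=11l3Ss" '
    'rel="nofollow" target="_blank">Загрузить</a> <span>QR-Code</span> </div> </div>'
)

soup = BeautifulSoup(html, "html.parser")

# Only <div> tags are visited; <a> and <span> keep their attributes.
for div in soup.find_all("div"):
    for attribute in ["class", "style", "id"]:
        div.attrs.pop(attribute, None)  # no error if the attribute is absent

print(soup)
```

Restricting the search to `"div"` is what keeps the link's `href` and `target` intact, which the loop over every tag would also leave alone only because it removes nothing but `class`.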

User

Answer 2 · edited 2024-07-04 05:55:23

To remove all attributes from the tags in the scraped data, do the following:

import requests
from bs4 import BeautifulSoup

def CleanSoup(content):
    # find_all(True) matches every descendant tag (but not content itself)
    for tag in content.find_all(True):
        tag.attrs = {}  # drops every attribute: class, id, style, href, ...
    return content


url = "https://lifehacker.ru/2016/08/29/app-store-29-august-2016/"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
post_content = soup.find("div", {"class": "post-content"})
post_content = CleanSoup(post_content)
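Because `find_all(True)` (or `findAll(True)`) returns only the descendants of `post_content`, the container itself keeps `class="post-content"`. A minimal sketch on a hypothetical inline snippet, not the real page, showing that behaviour and one way to clear the container as well.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the <div class="post-content"> container.
html = '<div class="post-content"><p style="color: red">Text <a href="#" class="x">link</a></p></div>'

soup = BeautifulSoup(html, "html.parser")
post_content = soup.find("div", class_="post-content")

# find_all(True) yields only descendants, so the container keeps its class.
for tag in post_content.find_all(True):
    tag.attrs = {}
print(post_content)  # <div class="post-content"><p>Text <a>link</a></p></div>

# Clearing the container too removes every attribute in the subtree.
post_content.attrs = {}
print(post_content)  # <div><p>Text <a>link</a></p></div>
```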
