Weird characters when scraping with Beautiful Soup

2 answers

User

Answer 1 · edited 2024-05-19 18:19:15

So it turned out that Excel was the cause. The weird characters appeared when I saved the data to CSV and opened the file in Excel.

To prevent this, I used df.to_csv('df.csv', index=False, encoding='utf-8-sig'). Specifying that encoding got rid of the weird characters.
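To make it a bit clearer why 'utf-8-sig' helps, here is a small sketch using only the standard library (no pandas, so it is not the exact call above). The sample rows, file names, and helper functions are made up for illustration. The 'utf-8-sig' codec writes a three-byte byte order mark (BOM) at the start of the file, which Excel appears to use as a hint that the file is UTF-8. The last line assumes Excel falls back to Windows-1252 when no BOM is present, which depends on the machine's locale.

```python
import codecs
import csv
import os
import tempfile

# Hypothetical sample data containing non-ASCII characters.
rows = [["name", "city"], ["José", "München"]]


def write_csv(path, encoding):
    # newline="" is what the csv module docs recommend for csv.writer files.
    with open(path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)


def starts_with_bom(path):
    # Check whether the file begins with the UTF-8 byte order mark.
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "plain.csv")
bom_path = os.path.join(tmp_dir, "with_bom.csv")

write_csv(plain_path, "utf-8")
write_csv(bom_path, "utf-8-sig")

print(starts_with_bom(plain_path))  # False
print(starts_with_bom(bom_path))    # True

# What UTF-8 bytes look like when misread as Windows-1252 (assumed fallback).
print("José".encode("utf-8").decode("cp1252"))  # JosÃ©
```

Both files hold the same text. Only the one with the BOM tells the reader up front how to decode it.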

"Python Writing Weird Unicode to CSV" has some more information about encodings and how Excel interprets CSV files.

User

Answer 2 · edited 2024-05-19 18:19:15

Use this code to download the visible content of a web page. Just put the page's URL in page_url.

from urllib.request import urlopen

from bs4 import BeautifulSoup
from bs4.element import Comment

page_url = "URL Here"


def tag_visible(element):
    # Skip text that a browser does not render: scripts, styles, metadata, comments.
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    if isinstance(element, Comment):
        return False
    return True


def text_from_html(body):
    soup = BeautifulSoup(body, 'html.parser')
    texts = soup.find_all(string=True)
    visible_texts = filter(tag_visible, texts)
    return " ".join(t.strip() for t in visible_texts)


def Extract_Text(html, url):
    # Write the URL followed by the visible text. The explicit UTF-8 encoding
    # avoids depending on the platform's default encoding.
    text_data = text_from_html(html)
    with open("DOC.txt", "w", encoding="utf-8") as f:
        f.write(str(url) + "\n" + text_data)


response = urlopen(page_url)
# getheader() returns None when the header is missing, so fall back to ''.
if 'text/html' in (response.getheader('Content-Type') or ''):
    html_string = response.read().decode("utf-8")
    Extract_Text(html_string, page_url)
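
The code above assumes every page is UTF-8. If the server declares a different charset in its Content-Type header, decoding as UTF-8 can itself produce weird characters. Below is a rough sketch of one way to honor the declared charset. The helper name decode_body and its default parameter are my own, not part of any library, and it works on a header string so it can be tried without a network connection.

```python
from email.message import Message


def decode_body(raw_bytes, content_type, default="utf-8"):
    """Decode raw_bytes using the charset named in a Content-Type header value."""
    # Message is used here only as a parser for the header parameters.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default
    try:
        return raw_bytes.decode(charset, errors="replace")
    except LookupError:
        # The server named a codec Python does not know; use the default.
        return raw_bytes.decode(default, errors="replace")


latin1_bytes = "café".encode("latin-1")
print(decode_body(latin1_bytes, "text/html; charset=ISO-8859-1"))  # café
print(decode_body("café".encode("utf-8"), "text/html"))            # café
```

With a real response from urlopen, response.headers.get_content_charset() should give the same charset lookup directly. Another option is to pass the undecoded bytes to BeautifulSoup and let it try to detect the encoding itself.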
