BS4 cannot find the text

import urllib2
from bs4 import BeautifulSoup

def info(novelname):
    user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
    url = "https://m.wuxiaworld.co/"+novelname+"/"
    headers={'User-Agent':user_agent,
             'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
             'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
             'Accept-Encoding': 'none',
             'Accept-Language': 'en-US,en;q=0.8',
             'Connection': 'keep-alive'}
    request = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(request)
    soup = BeautifulSoup(response, features="html.parser")
    for textp in soup.find_all("p", class_="review"):
        print textp.contents
        print textp
        print textp.getText()

2 Answers

User

Answer 1 · edited 2024-09-27 19:26:57

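When you print your soup, you will see only some of the HTML tags in the terminal, not the full page source. I think the site hides part of the data, so I suggest using Selenium. If you have not downloaded ChromeDriver yet, you can get it from: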

https://chromedriver.storage.googleapis.com/index.html?path=2.35/

Full code:

from selenium import webdriver

driver_path = r'your driver path'
browser = webdriver.Chrome(executable_path=driver_path)


browser.get("https://m.wuxiaworld.co/Castle-of-Black-Iron/")

x = browser.find_elements_by_css_selector("p[class='review']") ## Declare which class
for text1 in x:
    print text1.text
browser.close()

Output:

Description After the Catastrophe, every rule in the world was rewritten. In the Age of Black Iron, steel, iron, steam engines and fighting force became the crux in which human beings depended on to survive. A commoner boy by the name Zhang Tie was selected by the gods of fortune and was gifted a small tree which could constantly produce various marvelous fruits. At the same time, Zhang Tie was thrown into the flames of war, a three-hundred-year war between the humans and monsters on the vacant continent. Using crystals to tap into the potentials of the human body, one must cultivate to become stronger. The thrilling legends of mysterious clans, secrets of Oriental fantasies, numerous treasures and legacies in the underground world — All in the Castle of Black Iron! Citadel of Black Iron 黑铁之堡

User

Answer 2 · edited 2024-09-27 19:26:57

import requests
from bs4 import BeautifulSoup
from collections import OrderedDict

def info(novelname):        
    response = requests.get(
        'https://m.wuxiaworld.co/{}/'.format(novelname.replace(' ', '-')),
        headers=OrderedDict(
            (
                ("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7"),
                ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
                ("Accept-Language", "en-US,en;q=0.5"),
                ("Accept-Encoding", "gzip, deflate"),
                ("Connection", "keep-alive"), 
                ("Upgrade-Insecure-Requests", "1")
            )
        )
    )

    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html5lib')

        for textp in soup.find_all('p', attrs={'class': 'review'}):
            print textp.text.strip()

info('Castle of Black Iron')

The problem is your HTML parser. Use html5lib instead; with it, the code above prints:

Description

After the Catastrophe, every rule in the world was rewritten.

In the Age of Black Iron, steel, iron, steam engines and fighting force became the crux in which human beings depended on to survive.

A commoner boy by the name Zhang Tie was selected by the gods of fortune and was gifted a small tree which could constantly produce various marvelous fruits. At the same time, Zhang Tie was thrown into the flames of war, a three-hundred-year war between the humans and monsters on the vacant continent. Using crystals to tap into the potentials of the human body, one must cultivate to become stronger.

The thrilling legends of mysterious clans, secrets of Oriental fantasies, numerous treasures and legacies in the underground world — All in the Castle of Black Iron!

Citadel of Black Iron
黑铁之堡
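
To see whether the parser really is what makes the difference, it can help to run the same markup through each installed parser and compare how many matching tags each one finds. The following is only a minimal sketch in Python 3 (the answers above use Python 2): the helper name count_matches_by_parser and the sample markup are hypothetical, and it assumes bs4 is installed; lxml and html5lib are optional and are reported as None when missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_matches_by_parser(html, tag="p", css_class="review",
                            parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: number of matching tags}.

    A parser that is not installed maps to None instead of raising.
    This helper is illustrative only; it is not part of bs4.
    """
    counts = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # bs4 raises FeatureNotFound when the requested parser is missing.
            counts[parser] = None
            continue
        counts[parser] = len(soup.find_all(tag, class_=css_class))
    return counts


if __name__ == "__main__":
    # Hypothetical sample markup; in practice pass response.content here.
    sample = '<div><p class="review">Description</p><p class="review">Summary</p></div>'
    print(count_matches_by_parser(sample))
```

If html.parser reports 0 matches for the real page while html5lib reports more, the parser is the likely culprit; if every parser reports 0, the content is probably filled in by JavaScript, and a browser-driven approach such as the Selenium answer is the better fit.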
