How to parse JSON/HTML with BeautifulSoup

Posted 2024-10-01 00:21:15


I currently have the following:

<script>window.__NUXT__=(function(a,b,c,d,e,f,g,h,i){i.date="2020-11-05 09:22:56.000000";i.timezone_type=d;i.timezone="UTC";return {layout:"default",data:[{}],error:a,state:{languages:{text:{},javascript:{mime:"text\u002Fjavascript"},css:{mime:"text\u002Fcss"},html:{directory:"htmlmixed",mime:"text\u002Fhtml"},vue:{directory:"vue",mime:"text\u002Fx-vue"},php:{directory:"php",mime:"application\u002Fx-httpd-php"},c:{directory:b,mime:"text\u002Fx-c++src"},csharp:{directory:b,mime:"text\u002Fx-csharp"},java:{directory:b,mime:"text\u002Fx-java"},lua:{directory:"lua",mime:"text\u002Fx-lua"},golang:{directory:"go",mime:"text\u002Fx-go"},dockerfile:{directory:"dockerfile",mime:"text\u002Fx-dockerfile"}},pastes:{liWq2S3:{status:c,id:e,title:f,paste:g,views:d,syntax:a,size:h,created_at:i}},paste:{title:f,status:c,id:e,paste:g,views:d,syntax:a,size:h,created_at:i}},serverRendered:c}}(null,"clike",true,3,"liWq2S3","Soup","Apple\nOrange\nCake\nPizza",23,{}));</script><script src="/assets/2135194c1f343036c318.js" defer></script><script src="/assets/fbb38f3d2f4d64c9376c.js" defer></script><script src="/assets/ad6678a738ac39c210fc.js" defer></script><script src="/assets/a93fe408ab9aa8104b17.js" defer></script><script src="/assets/f78a487814d7850007a6.js" defer></script>

I want to extract the title (Soup) and the description (Apple\nOrange\nCake\nPizza), but I can't find any resources that help, and I have been stuck for a while. I stumbled upon this but still can't find the solution for myself.

My code:

import requests
from bs4 import BeautifulSoup
import json

ExampleSite = "https://throwbin.io/liWq2S3"

r1 = requests.get(ExampleSite)
r1text = r1.text

soup = BeautifulSoup(r1text,features="html.parser")

ParsedSoup = soup.find_all('script')[1]

print(ParsedSoup)

1 answer
User
#1 · Posted 2024-10-01 00:21:15

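This doesn't look like JSON to me, but I may be wrong: the script assigns the result of a self-invoking JavaScript function to window.__NUXT__, and the values you want are passed in as that function's arguments. You can still pull them out with a regex: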

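import re

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://throwbin.io/liWq2S3").text, features="html.parser")
# Capture the argument list of the self-invoking function: everything between "}(" and the final "));".
bins_contents = re.search(r"}\((.*)\)\);", str(soup), re.S).group(1)
# Strip the quotes, split on commas, and keep arguments 5 and 6 (parameters f and g: title and paste).
# Note: this breaks if any argument itself contains a comma.
print(bins_contents.replace('"', "").split(",")[5:7])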

Output:

['Soup', 'Apple\\nOrange\\nCake\\nPizza']
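The comma split above fails as soon as the title or paste contains a comma, and it leaves `\n` as a literal backslash sequence. In this particular page the function's arguments happen to be valid JSON literals, so a possible alternative is to wrap them in brackets and hand them to `json.loads`. The following is only a sketch under that assumption: `extract_nuxt_args` is a hypothetical helper name, and `SAMPLE_HTML` is a trimmed-down stand-in for the page source rather than the real response. It would not work for arguments such as `undefined` or single-quoted strings.

```python
import json
import re

from bs4 import BeautifulSoup

# Trimmed-down stand-in for the page source (assumption: the real page has the same shape).
SAMPLE_HTML = (
    '<script>window.__NUXT__=(function(a,b,c,d,e,f,g,h,i){'
    'return {state:{paste:{title:f,paste:g}}}}'
    '(null,"clike",true,3,"liWq2S3","Soup","Apple\\nOrange\\nCake\\nPizza",23,{}));'
    '</script><script src="/assets/2135194c1f343036c318.js" defer></script>'
)


def extract_nuxt_args(html):
    """Return the self-invoking function's call arguments as a Python list (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Pick the script by its content instead of by its position in the page.
    script = soup.find("script", string=re.compile(r"window\.__NUXT__"))
    if script is None:
        raise ValueError("no window.__NUXT__ script found")
    # Capture everything between the function body's closing "}(" and the final "));".
    match = re.search(r"\}\((.*)\)\);", script.string, re.S)
    if match is None:
        raise ValueError("could not locate the argument list")
    # Wrapping the comma-separated arguments in brackets turns them into a JSON array.
    return json.loads("[" + match.group(1) + "]")


args = extract_nuxt_args(SAMPLE_HTML)
title, paste = args[5], args[6]  # parameters f and g in the page source
print(title)
print(paste.splitlines())
```

Because `json.loads` decodes the escape sequences, `paste` holds real newlines here, so `splitlines()` yields the four items separately. To run this against the live page, pass `requests.get(...).text` in place of `SAMPLE_HTML`.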
