Parsing the contents of a script tag with BeautifulSoup in Python

Posted 2024-09-28 21:50:05


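Hello everyone. While parsing a page, I ran into a problem extracting data from a script tag. The tag's contents are not a plain JSON object: the JSON is embedded in JavaScript code. Using a web driver did not give any result either. Has anyone dealt with this? I would appreciate your help.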

Sample of the markup:

<script>window.ShopifyAnalytics = window.ShopifyAnalytics || {}; window.ShopifyAnalytics.meta = window.ShopifyAnalytics.meta || {}; window.ShopifyAnalytics.meta.currency = 'AUD'; var meta = {"product":{"id":8993669708,"vendor":"Womanizer","type":"Vibrators","variants":[{"id":31066737740,"price":14999,"name":"Womanizer - Black","public_title":"Black","sku":"172145678"},{"id":31066737804,"price":14999,"name":"Womanizer - Purple","public_title":"Purple","sku":"172146924"},{"id":31066737868,"price":14999,"name":"Womanizer - Pink","public_title":"Pink","sku":"172150324"},{"id":31066737996,"price":14999,"name":"Womanizer - Tattoo","public_title":"Tattoo","sku":"172205168"},{"id":1509908217881,"price":14999,"name":"Womanizer - Blue","public_title":"Blue","sku":"1725205076"}]},"page":{"pageType":"product","resourceType":"product","resourceId":8993669708}}; for (var attr in meta) { window.ShopifyAnalytics.meta[attr] = meta[attr]; }</script>


1 answer
User
#1 · Posted 2024-09-28 21:50:05

Use a regular expression to pull the JSON literal out of the script text, then parse it with json.loads.

Demo:

from bs4 import BeautifulSoup
import json
import re


s = """<script>window.ShopifyAnalytics = window.ShopifyAnalytics || {};
window.ShopifyAnalytics.meta = window.ShopifyAnalytics.meta || {};
window.ShopifyAnalytics.meta.currency = 'AUD';
var meta = {"product":{"id":8993669708,"vendor":"Womanizer","type":"Vibrators","variants":[{"id":31066737740,"price":14999,"name":"Womanizer - Black","public_title":"Black","sku":"172145678"},{"id":31066737804,"price":14999,"name":"Womanizer - Purple","public_title":"Purple","sku":"172146924"},{"id":31066737868,"price":14999,"name":"Womanizer - Pink","public_title":"Pink","sku":"172150324"},{"id":31066737996,"price":14999,"name":"Womanizer - Tattoo","public_title":"Tattoo","sku":"172205168"},{"id":1509908217881,"price":14999,"name":"Womanizer - Blue","public_title":"Blue","sku":"1725205076"}]},"page":{"pageType":"product","resourceType":"product","resourceId":8993669708}};
for (var attr in meta) {
  window.ShopifyAnalytics.meta[attr] = meta[attr];
}</script>"""

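soup = BeautifulSoup(s, "html.parser")
scr = soup.find("script")  # first <script> tag in the document
# Capture everything between "var meta = " and the next semicolon.
# DOTALL is not needed here because the object literal sits on one line.
m = re.search(r"var meta = (.*?);", scr.string)
if m:
    data = json.loads(m.group(1))  # the captured text is valid JSON
    for variant in data["product"]["variants"]:
        print(variant["sku"])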

Output:

172145678
172146924
172150324
172205168
1725205076
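One caveat about the demo: the pattern `var meta = (.*?);` stops at the first semicolon, which works here only because the JSON contains none. If a value such as a product name contained a `;`, the capture would be cut short and json.loads would fail. A possible alternative, sketched below, is to locate the start of the literal with a regex and let the standard library's `json.JSONDecoder.raw_decode` find where it ends. The helper name `extract_js_object` and the shortened sample string are made up for illustration and are not part of the original answer; the sketch also assumes the object literal is valid JSON, as it is in the question's markup.

```python
import json
import re


def extract_js_object(script_text, var_name="meta"):
    """Hypothetical helper: return the object assigned in `var <var_name> = {...}`.

    Returns None if no such assignment is found. Assumes the literal is valid JSON.
    """
    # Match up to (but not including) the opening brace of the literal.
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*(?=\{)"
    m = re.search(pattern, script_text)
    if m is None:
        return None
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores whatever JavaScript follows it.
    obj, _end = json.JSONDecoder().raw_decode(script_text, m.end())
    return obj


# Illustrative, shortened sample (not real page data); note the ';' inside a value.
script_text = (
    "window.ShopifyAnalytics.meta.currency = 'AUD'; "
    'var meta = {"product":{"id":1,"variants":[{"sku":"A;1"},{"sku":"B2"}]}}; '
    "for (var attr in meta) { window.ShopifyAnalytics.meta[attr] = meta[attr]; }"
)

data = extract_js_object(script_text)
skus = [variant["sku"] for variant in data["product"]["variants"]]
print(skus)
```

With BeautifulSoup, the `script_text` argument would be the `scr.string` value from the demo above.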
